Portions of this tip were excerpted from my book Database administration: The complete guide to DBA practices and procedures (2nd edition).
Today’s topic is reorganization.
Relational technology and SQL make data modification easy. Just issue an INSERT, UPDATE, or DELETE statement with the appropriate WHERE clause and the DBMS takes care of the actual data navigation and modification. In order to provide this level of abstraction, the DBMS handles the physical placement and movement of data on disk. Theoretically, this makes everyone happy. The programmer’s interface is simplified, and the DBMS takes care of the hard part—manipulating the actual placement of data. However, things are not quite that simple. The manner in which the DBMS physically manages data can cause subsequent performance issues.
Every DBA has encountered the situation where a query or application that used to perform well slows down after it has been in production for a while. These slowdowns have many potential causes—perhaps the number of transactions issued has increased, or the volume of data has expanded. However, the performance problem might be due to database disorganization. Database disorganization occurs when a database’s logical and physical storage allocations contain many scattered areas of storage that are too small, not physically contiguous, or too disorganized to be used productively. Let’s review the primary culprits. Query or application slowdowns have many potential causes:
- The first possibility is unclustered data. If the DBMS does not strictly enforce clustering, a clustered table or index can become unclustered as data is added and changed. If the data becomes significantly unclustered, the DBMS cannot rely on the clustering sequence. Because the data is no longer clustered, queries that were optimized to access data cannot take advantage of the clustering sequence. In this case, the performance of queries run against the unclustered table will suffer.
- Fragmentation is a condition in which there are many scattered areas of storage in a database that are too small to be used productively. It results in wasted space, which can hinder performance because additional I/Os are required to retrieve the same data.
- Row chaining or row migration occurs when updated data does not fit in the space it currently occupies, and the DBMS must find space for the row. With row chaining, the DBMS moves a part of the new, larger row to a location within the tablespace where free space exists. With row migrations, the full row is placed elsewhere in the tablespace. In each case, a pointer is used to locate either the rest of the row or the full row. Both row chaining and row migration will result in the issuance of multiple I/Os to read a single row. Performance will suffer because multiple I/Os are more expensive than a single I/O.
- Page splits can cause disorganized databases, too. If the DBMS performs monotonic page splits when it should perform normal page splits, or vice versa, space may be wasted. When space is wasted, fewer rows exist on each page, causing the DBMS to issue more I/O requests to retrieve data. Therefore, once again, performance suffers.
- File extents can negatively impact performance. An extent is an additional file that is “connected” to the original file and can be used only in conjunction with the original file. When the file used by a tablespace runs out of space, an extent is added for the file to expand. However, file extents are not stored contiguously with the original file. As additional extents are added, data requests will need to track the data from extent to extent, and the additional code this requires is unneeded overhead. Resetting the database space requirements and reorganizing can clean up file extents.
Depending on the DBMS, there may be additional causes of disorganization. For example, if multiple tables are defined within a single tablespace, and one of the tables is dropped, the tablespace may need to be reorganized to reclaim the space.
The need to access a table in a tablespace having any of these issues can cause a previously efficient application to perform worse.
Reorganizaing Disorganized Tablespaces
To correct disorganized database structures, the DBA can run a database or tablespace reorganization utility, or REORG, to force the DBMS to restructure the database object, thus removing problems such as unclustered data, fragmentation, row chaining, etc. The primary benefit of reorganization is the resulting speed and efficiency of database functions because the data is organized in a more optimal fashion on disk. In short, reorganization maximizes availability and reliability for databases.
Both tablespaces and indexes can be reorganized. How the DBA runs a REORG utility depends on the specific DBMS in use. Some DBMS products ship with a built-in reorganization utility; others require the customer to purchase separate software or “roll their own.” Still others claim that the customer will not need the utility at all when using their DBMS. I have found the last claim to be untrue. Every DBMS incurs some degree of disorganization as data is added and modified.
Figure 1. Typical steps for a manual reorganization.
Of course, depending upon the DBMS and the specific options chosen for the object additional steps (such as decompression/recompression and/or sorting) may be necessary.
If a utility is available for reorganizing, from either the DBMS vendor or a third-party vendor, the process is greatly simplified. Sometimes the utility is as simple as issuing a command such as:
REORG TABLESPACE DB.TSNAME
A traditional reorganization requires the database (or at least the database object) to be offline. The high cost of downtime creates pressure both to perform and to delay preventive maintenance—a no-win situation familiar to most DBAs. Most modern REORG utilities are available that perform the reorganization while the database structures are online. Such a reorganization is accomplished by making a copy of the data. The online REORG utility reorganizes the copy while the original data remains online. When the copied data has been reorganized, an online REORG uses the database log to “catch up” by applying to the copy any data changes that occurred during the process. When the copy has caught up to the original, the online REORG switches the production tablespace from the original to the copy. Performing an online reorganization requires additional disk storage and a slow transaction window. If a large number of transactions occur during the online reorganization, REORG may have a hard time catching up.
Today, there are some next generation utilities being introduced that eliminate steps and adjust the manner in which the reorganization occurs based on the database object being reorged. Such modern software may be required for high availability, large volume applications.
Determining When to Reorganize
System catalog statistics can help to determine when to reorganize a database object. Each DBMS provides a method of reading through the contents of the database and recording statistical information about each database object. Depending on the DBMS, this statistical information is stored either in the system catalog or in special pages within the database object itself.
One statistic that can help a DBA determine when to reorganize is cluster ratio. Cluster ratio is the percentage of rows in a table that are actually stored in clustering sequence. The closer the cluster ratio is to 100 percent, the more closely the actual ordering of the rows on the data pages matches the clustering sequence. A low cluster ratio indicates bad clustering, and a reorganization may be required. A low cluster ratio, however, may not be a performance hindrance if the majority of queries access data randomly instead of sequentially.
Tracking down the other causes of disorganization can sometimes be difficult. Some DBMSs gather statistics on fragmentation, row chaining, row migration, space dedicated to dropped objects, and page splits; others do not.
One example: Oracle provides a plethora of statistics in dynamic performance tables that can be queried in a special group of tables called the Oracle Dynamic Performance Tables.
Tablespaces are not the only database objects that can be reorganized. Indexes, too, can benefit from reorganization. As table data is added and modified, the index too must be changed. Such changes can cause the index to become disorganized.
A vital index statistic to monitor is the number of levels. Most relational indexes are b-tree structures. As data is added to the index, the number of levels of the b-tree will grow. When more levels exist in the b-tree, more I/O requests are required to move from the top of the index structure to the actual data that must be accessed. Reorganizing an index can cause the index to be better structured and require fewer levels.
Another index statistic to analyze to determine wheher reorganization is required is the distance between the index leaf pages, or leaf distance. Leaf distance is an estimate of the average number of pages between successive leaf pages in the index. Gaps between leaf pages can develop as data is deleted from an index or as a result of page splitting. Of course, the best value for leaf distance is zero, but achieving a leaf distance of zero in practice is not realistic. In general, the lower this value, the better. Review the value over time to determine a high-water mark for leaf distance that will indicate when indexes should be reorganized.
Database reorganization is a very important component in the DBA’s aresenal of performance tools. Taking the time to understand how your DBMS supports reorganization will be an important part of the overall job of maintaining optimal database/application performance.
This series of blog posts is excerpted from Craig Mullins’ classic book Database administration: The complete guide to DBA practices and procedures. Consult the book for more in-depth information on any of these topics – and much, much more.