Portions of this tip were excerpted from my book Database administration: The complete guide to DBA practices and procedures (2nd edition).
Today we start by discussing file placement and data set allocation.
File Placement and Allocation
The location of the files containing the data for the database can have an impact on performance. A database is very I/O intensive, and the DBA must make every effort to minimize the cost of physical disk reading and writing.
This discipline entails
- Understanding the access patterns associated with each piece of data in the system
- Placing the data on physical disk devices in such a way as to optimize performance
The first consideration for file placement on disk is to separate the indexes from the data, if possible. Database queries are frequently required to access data from both the table and an index on that table. If both of these files reside on the same disk device, performance degradation is likely. To retrieve data from disk, an arm moves over the surface of the disk to read physical blocks of data on the disk. If a single operation is accessing data from files on the same disk device, latency will occur; reads from one file will have to wait until reads from the other file are processed. Of course, if the DBMS combines the index with the data in the same file, this technique cannot be used.
Another rule for file placement is to analyze the access patterns of your applications and separate the files for tables that are frequently accessed together. DBAs should do this for the same reason they should separate index files from table files.
A final consideration for placing files on separate disk devices occurs when a single table is stored in multiple files (partitioning). It is wise in this case to place each file on a separate disk device to encourage and optimize parallel database operations. If the DBMS can break apart a query to run it in parallel, placing multiple files for partitioned tables on separate disk devices will minimize disk latency.
With modern disk systems such as RAID devices, precise file placement is often difficult, if not impossible, to achieve. RAID storage consists of arrays of disk drives, and the files stored on the device are split across multiple physical disk drives. Be sure to read the documentation for your specific DBMS and follow the guidance for the placement of critical system files (such as log and configuration files). And keep in mind that a hard disk drive with two partitions, one holding log files and the other partition holding data files, is not the same as two disks, each dedicated to either log or data files.
Database Transaction Log Placement
Placing the transaction log on a separate disk device from the actual data allows the DBA to back up the transaction log independently from the database. It also minimizes dual writes to the same disk. Writing data to two files on the same disk drive at the same time will degrade performance even more than reading data from two files on the same disk drive at the same time. Remember, too, that every database modification (write) is recorded on the database transaction log.
Distributed Data Placement
The goal of data placement is to optimize access by reducing contention on physical devices. Within a client/server environment, this goal can be expanded to encompass the optimization of application performance by reducing network transmission costs.
Data should reside at the database server where it is most likely, or most often, to be accessed. For example, Chicago data should reside at the Chicago database server, Los Angeles–specific data should reside at the Los Angeles database server, and so on. If the decision is not so clear-cut (e.g., San Francisco data if there is no database server in San Francisco), place the data on the database server that is geographically closest to where it will be most frequently accessed (in the case of San Francisco, L.A., not Chicago).
Be sure to take fragmentation, replication, and snapshot tables into account when deciding upon the placement of data in your distributed net-work.
The DBMS may require disk devices to be allocated for database usage. If this is the case, the DBMS will provide commands to initialize physical disk devices. The disk initialization command will associate a logical name for a physical disk partition or OS file. After the disk has been initialized, it is stored in the system catalog and can be used for storing table data.
Before initializing a disk, verify that sufficient space is available on the physical disk device. Likewise, make sure that the device is not already initialized.
Use meaningful device names to facilitate more efficient usage and management of disk devices. For example, it is difficult to misinterpret the usage of a device named DUMP_DEV1 or TEST_DEV7. However, names such as XYZ or A193 are not particularly useful. Additionally, maintain documentation on initialized devices by saving script files containing the actual initialization commands and diagrams indicating the space allocated by device.
Page Size (Block Size)
Sizing database blocks (or pages) is another important method that can be used to control and manage database and application performance.
Most DBMSs provide the ability to specify a page, or block, size. The page size is used to store table rows (or, more accurately, records that contain the row contents plus any overhead) on disk. For example, consider a table requiring rows that are 125 bytes in length with 6 additional bytes of overhead. This makes each record 131 bytes long. To store 25 records on a page, the page size would have to be at least 3,275 bytes. However, each DBMS requires some amount of page overhead as well, so the practical size will be larger. If page overhead is 20 bytes, the page size would be 3,295—that is, 3,275 + 20 bytes of overhead.
This discussion, however, is simplistic. In general practice, most table spaces will require some amount of free space to accommodate new data. Therefore, some percentage of free space will need to be factored into the equation.
To complicate matters, many DBMSs limit the page sizes that can be chosen. For example, DB2 for z/OS limits page size to 4K, 8K, 16K, or 32K; Microsoft SQL Server supports only an 8K page size. In the case of DB2, the DBA will need to calculate the best page size based on row size, the number of rows per page, and free space requirements.
Consider this question: “In DB2 for z/OS, what page size should be chosen if 0 percent free space is required and the record size is 2,500 bytes?”
The simplistic answer is 4K, but it might not be the best answer. A 4K page would hold one 2,500-byte record per page, but an 8K page would hold three 2,500-byte records. The 8K page could provide for more efficient sequential I/O processing, because reading 8K of data would return three rows, whereas reading 8K of data using two 4K pages would return only two rows.
Choosing the proper page size is an important DBA task for optimizing database I/O performance.
Page/block size and file/data set allocation can be important criteria for DBAs as they optimize and manage their database systems. Understanding the options available for your specific DBMS will allow you to become a better DBA and improve the performance of your databases, applications, and systems.
This series of blog posts is excerpted from Craig Mullins’ classic book Database administration: The complete guide to DBA practices and procedures. Consult the book for more in-depth information on any of these topics – and much, much more