Optimizing Database Performance, part 3: Free Space and Compression

This tip is excerpted from my book Database Administration: The Complete Guide to DBA Practices and Procedures.

The first and second parts of this series are also available.

Today we start by discussing free space.

Free Space

Free space, sometimes called fill factor, can be used to leave a portion of a table space or index empty and available to store newly added data. Specifying free space for a table space or index can reduce the frequency of reorganization, reduce contention, and increase the efficiency of access and insertion. Each DBMS provides a method of specifying free space for a database object in the CREATE and ALTER statements. A typical parameter is PCTFREE, with which the DBA specifies the percentage of each data page that should remain available for future inserts. Another common parameter is FREEPAGE, with which the DBA indicates how frequently a completely empty page should be left within the object (for example, one empty page after every n pages of data).
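The exact syntax varies by DBMS. As a minimal sketch, here is DB2-style DDL that leaves 10 percent of each page free and one empty page after every 31 pages (the object names are purely illustrative):

    -- Leave 10% of each data page free, plus one completely
    -- empty page after every 31 pages.
    CREATE TABLESPACE CUST_TS IN CUST_DB
        PCTFREE 10
        FREEPAGE 31;

    -- Free space can be specified for indexes, too.
    CREATE INDEX CUST_IX1 ON CUSTOMER (CUST_NO)
        PCTFREE 10;

The same parameters can usually be changed later with ALTER, although the new values typically take effect only when the data is reloaded or reorganized.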

Ensuring a proper amount of free space for each database object provides the following benefits:

  • Inserts are faster when free space is available.
  • As new rows are inserted, they can be properly clustered.
  • Variable-length rows and altered rows have room to expand, potentially reducing the number of relocated rows.
  • Fewer rows on a page result in better concurrency because less data is unavailable to other users when a page is locked.

However, free space also has several disadvantages.

  • Disk storage requirements are greater.
  • Scans can take longer.
  • Fewer rows on a page can require more I/O operations to access the requested information (when sequentially accessing many rows).
  • Because the number of rows per page decreases, fewer rows are retrieved per I/O, which can reduce the efficiency of data caching.

The DBA should monitor free space and ensure that the appropriate amount is defined for each database object. The correct amount of free space must be based on:

  • Frequency of inserts and modifications
  • Amount of sequential versus random access
  • Impact of accessing unclustered data
  • Type of processing
  • Likelihood of row chaining, row migration, and page splits

Don’t define free space for a static table, that is, one that is not going to grow in size. Static tables do not need room in which to expand. A classic example is a table containing the 50 states of the USA: its contents will change very infrequently, and if and when a new state is added, you can be sure there will be downtime available to fit that new row in.

Compression

Another useful database performance tuning tactic is to deploy data compression.

Compression can be used to shrink the size of a database. By compressing data, the database requires less disk storage. Some DBMSs provide internal DDL options to compress database files; third-party software is available for those that do not provide such features.

When compression is specified, data is algorithmically compressed upon insertion into the database and decompressed when it is read. Reading and writing compressed data consumes more CPU resources than reading and writing uncompressed data: The DBMS must execute code to compress and decompress the data as users insert, update, and read the data.
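The DDL for enabling compression also varies by DBMS. As an illustrative sketch, DB2 for z/OS enables compression at the table space level with a COMPRESS clause (the object names here are hypothetical):

    -- Enable compression when the table space is created.
    CREATE TABLESPACE CUST_TS IN CUST_DB
        COMPRESS YES;

    -- Or turn it on for an existing table space.
    ALTER TABLESPACE CUST_DB.CUST_TS
        COMPRESS YES;

Note that after an ALTER, existing rows are generally not compressed until a subsequent LOAD or REORG builds the compression dictionary and rewrites the data.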

So why compress data? Consider an uncompressed table with a row size of 800 bytes. Five of this table’s rows would fit in a 4K data page (or block). Now what happens if the data is compressed? Assume that the compression routine achieves 30% compression on average (a very conservative estimate). In that case, each 800-byte row will consume only 560 bytes (800 x 0.70 = 560). After compressing the data, seven rows will fit on a 4K page. Because I/O occurs at the page level, a single I/O will retrieve more data, which will optimize the performance of sequential data scans and increase the likelihood of data residing in the cache (or buffer pool) because more rows fit on a physical page.
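The arithmetic, assuming a 4,096-byte page and ignoring page header overhead (real pages reserve some bytes for headers, so actual row counts may be slightly lower):

    Uncompressed:  floor(4096 / 800) = 5 rows per page
    Compressed:    800 x 0.70        = 560 bytes per row
                   floor(4096 / 560) = 7 rows per page

That is two additional rows, a 40 percent increase in the data returned by each page-level I/O.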

Of course, compression requires a trade-off that the DBA must analyze. On the positive side, we have disk savings and the potential for reducing I/O cost. On the negative side, we have the additional CPU cost required to compress and decompress the data. Depending on the DBMS and hardware being used, though, the CPU consumption may be nominal or not worth considering. For example, DB2 for z/OS takes advantage of an on-board chip for database compression operations, which avoids taxing the general CPU and is very efficient.

However, compression is not an option for every database index or table. For smaller amounts of data, it is possible that a compressed file will be larger than an uncompressed file. This is because some DBMSs and compression algorithms require an internal dictionary to manage the compression. The dictionary contains statistics about the composition of the data being compressed. For a trivial amount of data, the size of the dictionary may be greater than the storage saved by compression.

Summary

Free space and compression are two additional tools in the DBA’s arsenal for optimizing and controlling database performance. Understanding how each is handled for your specific DBMS will allow you to become a better DBA and improve the performance of your databases, applications, and systems.

This series of blog posts is excerpted from Craig Mullins’ classic book Database Administration: The Complete Guide to DBA Practices and Procedures. Consult the book for more in-depth information on any of these topics, and much, much more.
