I get involved in Q&As in many ways, including online discussion boards, blogs, e-mail, and so on. Sometimes I like to share some of those questions (and answers) here.
One question came from an individual who was having trouble understanding locking. The question was this:
We have an issue with our 24/7 OLTP and data mart design (both DB2 UDB on AIX). While running ETL against the OLTP database, if the ETL process uses “Cursor Stability” isolation, it can lock out other users. So we changed the isolation to “Uncommitted Read.” That introduced a new problem: we started seeing duplicate records because some uncommitted records showed up in the result set.
We want to read all committed rows, but we also do not want the ETL transaction to obtain any locks whatsoever, so that other users are not affected.
And here is my answer:
You are asking for the impossible. Basically, what you are asking for is the equivalent of this: “I want a lock on my shed so my neighbors don’t steal my lawnmower, but I don’t want to have to open the lock myself when I want to get in.”
Let’s review the concept of the isolation level. The following discussion is culled from my book Database Administration: The Complete Guide to Practices and Procedures.
The isolation level specifies the locking behavior for a transaction or statement. Standard SQL defines four isolation levels that can be set using the SET TRANSACTION ISOLATION LEVEL statement. (You can control the isolation level in DB2 using the BIND command, too.)
- Read uncommitted
- Read committed
- Repeatable read
- Serializable
The preceding list is ordered from lowest to highest isolation. The higher the isolation level, the stricter the locking protocol becomes; the lower the isolation level, the shorter lock durations will be. Additionally, each higher isolation level is a superset of the lower levels. Let’s briefly examine each of the standard isolation levels.
Specifying read uncommitted isolation implements read-through locks and is sometimes referred to as a dirty read. (This is the UR isolation level in DB2.) It applies to read operations only. With this isolation level, data may be read that never actually exists in the database, because the transaction can read data that has been changed by another process but is not yet committed. Read uncommitted isolation provides the highest level of availability and concurrency of the isolation levels, but the worst degree of data integrity. It should be used only when data integrity problems can be tolerated. Certain types of applications, such as those using analytical queries, estimates, and averages, are likely candidates for read uncommitted locking. But be careful. A dirty read can cause duplicate rows to be returned where none exist, or no rows may be returned when one (or more) actually exists. When choosing read uncommitted isolation, the programmer and DBA must ensure that these types of problems are acceptable for the application.
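The dirty-read anomaly is easy to reproduce outside of DB2. Here is a minimal sketch using Python’s built-in sqlite3 module, with SQLite’s shared-cache mode and its `read_uncommitted` pragma standing in for UR isolation (this is not DB2 behavior, just an illustration of the anomaly itself):

```python
import sqlite3

# Two connections to the same shared-cache in-memory database.
uri = "file:urdemo?mode=memory&cache=shared"
writer = sqlite3.connect(uri, uri=True, isolation_level=None)  # manage txns manually
reader = sqlite3.connect(uri, uri=True)

writer.execute("CREATE TABLE acct (id INTEGER PRIMARY KEY, bal INTEGER)")

# The writer starts a transaction and inserts a row but does NOT commit.
writer.execute("BEGIN")
writer.execute("INSERT INTO acct VALUES (1, 100)")

# A reader running with read-uncommitted semantics sees the in-flight row...
reader.execute("PRAGMA read_uncommitted = 1")
dirty = reader.execute("SELECT id, bal FROM acct").fetchall()
print(dirty)  # [(1, 100)] -- data that was never committed

# ...and once the writer rolls back, that row never actually existed.
writer.execute("ROLLBACK")
after = reader.execute("SELECT id, bal FROM acct").fetchall()
print(after)  # []
```

The reader returned a row that, after the rollback, never existed in the database at all: exactly the kind of “duplicate” or ghost data the questioner was seeing in his ETL result sets.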
The read committed isolation level, also called cursor stability (CS in DB2), provides more integrity than read uncommitted. When read committed is specified, the transaction will never read data that is not yet committed; only committed data can be read.
With an isolation level of repeatable read, a further restriction is placed on reads, namely the assurance that the same data can be accessed multiple times during the course of the transaction without its value changing. The previous two isolation levels (read uncommitted and read committed) permit the underlying data to change if it is accessed more than once. Use repeatable read only when data must be read multiple times during the course of the transaction and the data values must remain consistent.
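The anomaly that repeatable read prevents, the non-repeatable read, looks like this in the same sqlite3 stand-in. Both connections run in autocommit mode, so each read is its own short transaction and holds no lock between statements, which is how the value is able to drift between two reads of the same row:

```python
import sqlite3

uri = "file:rrdemo?mode=memory&cache=shared"
writer = sqlite3.connect(uri, uri=True, isolation_level=None)
reader = sqlite3.connect(uri, uri=True, isolation_level=None)

writer.execute("CREATE TABLE acct (id INTEGER PRIMARY KEY, bal INTEGER)")
writer.execute("INSERT INTO acct VALUES (1, 100)")

# First read of the row.
first = reader.execute("SELECT bal FROM acct WHERE id = 1").fetchone()[0]

# Another transaction changes the row and commits in between.
writer.execute("UPDATE acct SET bal = 250 WHERE id = 1")

# A second read of the *same* row returns a different value.
second = reader.execute("SELECT bal FROM acct WHERE id = 1").fetchone()[0]
print(first, second)  # 100 250 -- a non-repeatable read
```

Under repeatable read isolation, the reader’s lock on the row would persist for the life of its transaction, so the second read would be guaranteed to return the same value as the first.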
Finally, the greatest integrity is provided by the serializable isolation level. Serializable isolation removes the possibility of phantoms. A phantom occurs when the transaction opens a cursor that retrieves data and subsequently another process inserts a value that would satisfy the request and should be in the result set. For example, consider the following situation:
- Transaction1 opens a cursor and reads account information, keeping a running sum of the total balance for the selected accounts.
- Transaction2 inserts a new account that falls within the range of accounts being processed by Transaction1, but the insert occurs after Transaction1 has passed the new account.
- Transaction2 commits the insert.
- Transaction1 runs a query to sum the values to check the accuracy of the running total. But the totals will not match.
Serializable isolation eliminates this problem. Phantoms can occur for lower isolation levels, but not when the isolation level is serializable.
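The four-step account-summing scenario above can be simulated with the same sqlite3 stand-in. Each statement here runs as its own autocommit transaction, which mimics a transaction whose locks do not cover rows it has already passed; a truly serializable transaction spanning the whole scan would make the mismatch impossible:

```python
import sqlite3

uri = "file:phantomdemo?mode=memory&cache=shared"
t1 = sqlite3.connect(uri, uri=True, isolation_level=None)  # Transaction1 (reader)
t2 = sqlite3.connect(uri, uri=True, isolation_level=None)  # Transaction2 (writer)

t1.execute("CREATE TABLE acct (id INTEGER PRIMARY KEY, bal INTEGER)")
t1.executemany("INSERT INTO acct VALUES (?, ?)", [(1, 100), (2, 200)])

# Transaction1 scans the accounts it can currently see, keeping a running total.
ids = [row[0] for row in t1.execute("SELECT id FROM acct ORDER BY id")]
running = 0
for acct_id in ids:
    bal = t1.execute("SELECT bal FROM acct WHERE id = ?", (acct_id,)).fetchone()[0]
    running += bal
    if acct_id == 1:
        # Midway through the scan, Transaction2 inserts and commits a new
        # account that Transaction1's scan will never visit.
        t2.execute("INSERT INTO acct VALUES (3, 50)")

# The control query now includes the phantom row, so the totals disagree.
total = t1.execute("SELECT SUM(bal) FROM acct").fetchone()[0]
print(running, total)  # 300 350
```

The running total (300) misses the phantom account, while the checking query (350) includes it: exactly the mismatch described in the four steps above.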