Applications that access databases are only as good as the performance they achieve. And every user wants their software to run as fast as possible. As such, performance tuning and management is one of the biggest demands on the DBA’s time. When asked what is the single most important or stressful aspect of their job, DBAs typically respond “assuring optimal performance.” Indeed, analyst surveys regularly indicate that “performance and troubleshooting” tops the list of most challenging DBA tasks.
But when you are dealing with data in a database management system there are multiple interacting components that must be managed and tuned to achieve optimal performance. That is, every database application, at its core, requires three components to operate:
- the system (that is, the DBMS itself, the network, and the O/S),
- the database (that is, the DDL and database schema), and
- the application (that is, the SQL and program logic).
Each of these components requires care and attention, but today I want to focus on the high-level aspects of performance management from the perspective of the application.
So where do we begin? A logical starting point is with program parameters, such as the package bind parms. There are many parameters and values that must be chosen from and specified when you bind an application program. The vast array of options at our disposal can render the whole process extremely confusing – especially if you don’t bind on a daily basis. And even if you do, some of the options still might be confusing if you rarely have to change them. I’m talking about parameters like ACQUIRE, RELEASE, VALIDATE, ISOLATION and DEGREE.
I will not delve into the myriad bind options and give you advice on which to use when. There are many articles and books, as well as DBMS manuals that you can use to guide you along that path. Suffice it to say, that there are some standard parameters and values that should be chosen “most of the time.” As such, a wise DBA group will set up canned routines for the programmers to use for compiling and binding their applications. Choices such as: “transaction”, “batch”, or “analytical query” can be presented to the developer and then, based on which of the various types of programs and environments that are available, the canned script can choose the proper bind options. Doing so can greatly diminish the problems that can be encountered when the “wrong” parameters or values are chosen when preparing programs for production.
Before concluding this short section on parameters consider this one important piece of advice: In production, always Bind your programs specifying EXPLAIN YES. This option will externalize the access path choices made by the optimizer at bind time. Failing to do so means that access paths will be generated, but you will not know what they are. This is akin to blinding yourself to what is going on and it makes application performance tuning much more difficult.
Access Path Management
There are several common approaches to access path management. The best approach is to Rebind your applications over time as the data changes. This approach involves some form of regular maintenance that keeps database statistics up to date and formulates new access paths as data volumes and patterns change.
Other approaches include Rebinding only when a new version of the DBMS is installed, or perhaps more ambitious, whenever new maintenance or fixpacks are applied to the DBMS. Yet another approach is to Rebind automatically after a regular period of time, whether it is days, weeks, months, or whatever period of time you deem significant. This approach can work if the period of time is wisely chosen based on the application data – but it still can pose significant administrative issues.
The final approach – the worst of the bunch – comes from the “if it ain’t broke don’t fix it” school of thought. Basically, it boils down to (almost) never Rebinding your programs. This approach penalizes every program for fear that a single program (or two) might experience a degraded access path. Oh, the possibility of degraded performance is real and that is why this approach has been adopted by some. And it can be difficult to find which statements may have degraded after a Rebind. The ideal situation would allow us to review the access path changes before hand to determine if they are better or worse.
Anyway, let’s go back to the best approach again, and that is to Rebind regularly as your data changes. This involves what is known as the three Rs: REORG, RUNSTATS, and Rebind. Your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that the DBMS is accessing data based upon false assumptions.
By Rebinding you will generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. And as you apply changes to the DBMS, optimizer improvements and new access techniques can be used. If you never Rebind, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to the DBMS itself.
To adopt the Three R’s you need to determine when to REORG. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps we need 4 R’s:
1) RUNSTATS or RTS
But is this enough? Probably not because we need to review the access paths after rebinding to make sure that there are no rogue access paths. So, let’s add another “r” – to Review the access paths generated by the REBIND. As we mentioned, the optimizer can make mistakes. And, of course, so can you. Users don’t call you when performance is better (or the same). But if performance gets worse, you can bet on getting a call from irate users.
So we need to put in place best practices whereby we test Rebind results to compare the before and after impact of the optimizer’s choices. Only then can we assure that we are achieving optimal application performance.
Tuning the Code
Of course, everything we’ve discussed so far assumes that the code is written efficiently to begin with – and that is a big assumption. We also need to make sure that we are implementing efficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded.
SQL is simple to learn and easy to start using. But SQL tuning and optimization is an art that takes years to master. Some general rules of thumb for creating efficient SQL statements include:
- Let SQL do the work instead of the application program. For example, code an SQL join instead of two cursors and a programmatic join.
- Simpler is generally better, but complex SQL can be very efficient.
- Retrieve only the columns required, never more.
- Retrieve the absolute minimum number of rows by specifying every WHERE clause that is appropriate.
- When joining tables, always provide join predicates. In other words, avoid Cartesian products.
- Favor using Stage 1 and Indexable predicates.
- Avoid sorting if possible by creating indexes for ORDER BY, GROUP BY, and DISTINCT operations.
- Avoid black boxes – that is, avoid I/O routines that are called by programs instead of using embedded SQL.
- Avoid deadlocks by updating tables in the same sequence in every program.
- Issue data modification statements (INSERT, UPDATE, DELETE) as close as possible to the COMMIT statement as possible.
- Be sure to build a COMMIT strategy into every batch program that changes data. Failing to COMMIT can cause locking problems.
To tune SQL the DBA must be able to interpret the output of the access paths produced by EXPLAIN. This information is encoded in the explain tables, with the primary one usually called PLAN_TABLE. There are vendor tools to simplify this process that render access paths visually and give recommendations on tuning approaches – and these tools are well worth considering.
Finally, some attention must be paid to the host language code. Host language code refers to the application programs written in C, COBOL, Java, or the programming language du jour. SQL statements are usually embedded into host language code and it is quite possible to have finely tuned SQL inside of inefficient host language code. And, of course, that would cause a performance problem.
The Bottom Line
Although DBAs must understand all three aspects of database performance management concentrating on the application aspects of performance will most likely provide the most bang-for-the-buck. Of course, we have only touched the tip of the application performance iceberg today. But even this high-level view into application performance can serve as a nice starting place for tuning your database applications.
Good luck and happy performance tuning!