today’s blog post we will examine some rules of thumb that apply generally to SQL development regardless of the underlying DBMS. These are the general guiding principles by which your SQL development should be guided…
Rule 1: “It Depends!”
The answer to every question about database performance is “It depends.” A successful DBA (and programmer) will know on what it depends. For example, if someone asks, “What is the best access path for my SQL query?” the best answer is “It depends.” Why? Well, that is more difficult to answer.
If every row must be returned, a table scan is likely to be more efficient than indexed access. However, if only one row is to be returned, direct index lookup will probably perform best. For queries that return between one and all rows, the performance of access paths will depend on how the data is clustered, which version of the DBMS is in use, whether parallelism can be invoked, and so forth.
Be skeptical of tuning tips that use the words “always” or “never.” Just about everything depends on other things.
Rule 2: Be Careful What You Ask For
The arrangement of elements within a query can change query performance. To what degree depends on the DBMS in use and whether rule-based optimization is used.
A good rule of thumb, regardless of DBMS, is to place the most restrictive predicate where the optimizer can read it first. In Oracle, the optimizer reads WHERE clauses from the bottom up, therefore, the most restrictive predicate should be put at the bottom of the query. It is just the opposite in DB2.
Placing the most restrictive predicate where the optimizer can read it first enables the optimizer to narrow down the first set of results before proceeding to the next predicate. The next predicate will be applied to the subset of data that was selected by the most selective condition, instead of against the entire table.
And keep in mind that these things can change from release to release of a DBMS, so keep up with each new version, release and fixpack to make sure you understand what has been changed and how it might impact your SQL.
Rule 3: KISS
A rule of thumb for all types of IT activities is to follow the KISS principle: Keep it simple, Stupid. However, in the world of SQL there is a trade-off between simplicity and performance.
Keeping SQL simple makes development and maintenance tasks easier. A simple SQL statement is easier to decipher and easier to change. With simple SQL, application developers can perform their job more easily than with complex SQL.
Nevertheless, complex SQL can outperform simple SQL. The more work that can be performed by the DBMS and the optimizer, the better performance is likely to be. Let’s look at an example: Some programmers avoid joins by coding multiple SQL SELECT statements and joining the data using program logic. The SQL is simpler because the programmer need not understand how to write SQL to join tables. However, SQL joins usually outperform program joins because less data is returned to the program.
Furthermore, the relational optimizer can change the join methodology automatically if the database or data changes. Conversely, program logic must be changed manually by a skilled programmer.
Rule 4: Retrieve Only What is Needed
As simple as this rule of thumb sounds, you might be surprised at how often it is violated. To minimize the amount of data returned by your SQL statements, be sure to specify the absolute minimum number of columns in the SELECT list. If the column is not needed to satisfy the business requirement, do not request it to be returned in the result set.
Specify the absolute minimum number of columns in the SELECT list.
Programmers frequently copy SQL statements that work well to use as templates for new statements. Sometimes the programmer will forget to trim down the number of columns requested when they only need a subset of the columns in the original query. This can adversely impact performance. The more columns that must be returned by the DBMS, the greater the processing overhead.
Another common problem is requesting unnecessary data. Consider the following SQL statement:
SELECT position, last_name, empno
WHERE last_name = 'SMITH';
There is no reason to specify the last_name column in the SELECT list of this SQL statement. We know that last_name must be ‘SMITH’ for the entire result set because of the WHERE clause.
Returning only what is needed does not apply only to columns. You should also minimize the number of rows to be returned by coding the proper WHERE clauses for every SQL statement. The more data that can be filtered out of the result set by the DBMS, the more efficient the query will be because less data must be returned to the requester.
Sometimes application programmers avoid coding appropriate WHERE clauses in a misguided attempt to simplify SQL statements. The more information the optimizer has about the data to be retrieved, the better the access paths it formulates will be. A sure sign of potential abuse is finding a SQL statement embedded in an application program that is immediately followed by a series of IF-THEN-ELSE statements. Try to tune the query by moving the IF-THEN-ELSE statements into SQL WHERE clauses.
And so concludes the first four of our 12 SQL rules of thumb… tune in next time for the next 4 rules (numbers 5 thru 8) in our dozen guidelines for developing SQL that works and performs well…