A Dozen SQL Rules of Thumb, Part 2

Catch Phrases 47

Today’s blog post picks up where we left off in our three-part series of rules of thumb (ROTs) that apply generally to SQL development regardless of the underlying DBMS.

These are the general guiding principles by which your SQL development should be guided… and we start off today’s post with rule #5…

Rule 5: Avoid Cartesian Products

Be sure to code predicates matching the columns of every table that will be joined within each SQL statement. Failure to do so will result in performance degradation and possibly incorrect results.

Whenever predicates do not exist when you are joining two tables, the RDBMS must perform a Cartesian product. This is the combination of every row of one table with every row of the other table. Non-matching rows are not eliminated because there is nothing that can be matched. The results of a Cartesian product are difficult to interpret and contain no information other than a simple list of all rows of each table munged together.

Of course, there are exceptions to this rule where you are specifically coding a Cartesian product for a business or technical reason. When coding a Cartesian product on purpose always make sure that you have investigated other methods of producing the same results and tested each method for performance and accuracy… and be sure to specifically test any query with a Cartesian product using production volumes of data!

Rule 6: Judicious Use of OR

The OR logical operator can be troublesome for performance. If you can convert a SQL statement that uses OR to one that uses IN, it is likely that performance will improve. For example, consider changing this SQL statement:

 SELECT e.position, e.last_name, e.empno, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id
 AND    position = ‘MANAGER’
 OR     position = ‘DIRECTOR’
 OR     position = ‘VICE PRESIDENT’
 ORDER BY position;

to this:

 SELECT e.position, e.last_name, e.empno, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id
 AND    position IN (‘MANAGER’, ‘DIRECTOR’, ‘VICE PRESIDENT’)
 ORDER BY position;

Of course, your results may vary depending on the DBMS in use and the nature of the data.

Rule 7: Judicious Use of LIKE

The LIKE logical operator is another troublesome beast. It is very easy to create performance problems when using LIKE in SQL. For example, consider the following SQL statement:

 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id LIKE ‘%X’
 ORDER BY position;

This query will return employee information for all employees working in any department where dept_id ends in ‘X’. However, the relational optimizer will have to scan the data in order to resolve this query—there is no way to use an index. Because the high-order portion of the column is not known, traversing a b-tree index structure is impossible.

You might be able to use your knowledge of the data to rewrite this query without a leading wild-card character (%). For example, perhaps all dept_id values start with either ‘A’ or ‘B’. In that case, you could modify the SQL as follows:

 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id LIKE ‘A%X’
 OR     dept_id LIKE ‘B%X’
 ORDER BY position;

 

In this case, the DBMS may be able to use a non-matching index scan if an index exists on the dept_id column. Scanning the index may be more efficient than scanning the entire table, but even then you will be scanning data, which is not usually very efficient.

Once again, your results will vary with the DBMS in use and the nature of the data accessed.

Rule 8: Avoid Sorts When Possible

Sorting data is an inhibitor of optimal performance in SQL queries. Your DBMS will sort data as needed to satisfy your database requests. The types of operations that usually require sorting of some form are ORDER BY, GROUP BY, DISTINCT, UNION, INTERSECT, and EXCEPT. When performance is important, remember to look for sorts and find ways to eliminate them. You might be able to create an index to avoid sorting, or to use an alternate syntax if duplicate elimination is not important (e.g. UNION ALL versus UNION).

Keep in mind that sorting is an I/O intensive operation and can degrade query performance, sometimes significantly. When performance is important, remember to look for sorts and find ways to eliminate them.

Conclusion

And so we come to the end of part 2 in our 3 part series offering up 12 SQL rules of thumb… tune in next time for the final 4 rules (numbers 9 thru 12) in our dozen guidelines for developing effective and efficient SQL!

Advertisements

About craig@craigsmullins.com

I'm a strategist, researcher, and consultant with nearly three decades of experience in all facets of database systems development.
This entry was posted in DBA, performance, SQL. Bookmark the permalink.

3 Responses to A Dozen SQL Rules of Thumb, Part 2

  1. Dave Nance says:

    Here is another option for your example in the LIKE rule. Since you are looking for the last character to be an ‘X’, you could perform a SUBSTR function on that last character and create an index on expression of that substr, then you get the matching index access that you are really looking for.

  2. Pingback: A Database Performance Tuning Primer | Data and Technology Today

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s