Today’s blog post picks up where we left off in our three-part series of rules of thumb (ROTs) that apply generally to SQL development regardless of the underlying DBMS.
These are the general guiding principles by which your SQL development should be guided… and we start off today’s post with rule #5…
Rule 5: Avoid Cartesian Products
Be sure to code predicates matching the columns of every table that will be joined within each SQL statement. Failure to do so will result in performance degradation and possibly incorrect results.
Whenever predicates do not exist when you are joining two tables, the RDBMS must perform a Cartesian product. This is the combination of every row of one table with every row of the other table. Non-matching rows are not eliminated because there is nothing that can be matched. The results of a Cartesian product are difficult to interpret and contain no information other than a simple list of all rows of each table munged together.
Of course, there are exceptions to this rule where you are specifically coding a Cartesian product for a business or technical reason. When coding a Cartesian product on purpose always make sure that you have investigated other methods of producing the same results and tested each method for performance and accuracy… and be sure to specifically test any query with a Cartesian product using production volumes of data!
Rule 6: Judicious Use of OR
The OR logical operator can be troublesome for performance. If you can convert a SQL statement that uses OR to one that uses IN, it is likely that performance will improve. For example, consider changing this SQL statement:
SELECT e.position, e.last_name, e.empno, d.manager FROM employee e, department d WHERE d.dept_id = e.dept_id AND position = ‘MANAGER’ OR position = ‘DIRECTOR’ OR position = ‘VICE PRESIDENT’ ORDER BY position;
SELECT e.position, e.last_name, e.empno, d.manager FROM employee e, department d WHERE d.dept_id = e.dept_id AND position IN (‘MANAGER’, ‘DIRECTOR’, ‘VICE PRESIDENT’) ORDER BY position;
Of course, your results may vary depending on the DBMS in use and the nature of the data.
Rule 7: Judicious Use of LIKE
The LIKE logical operator is another troublesome beast. It is very easy to create performance problems when using LIKE in SQL. For example, consider the following SQL statement:
SELECT position, last_name, empno FROM employee WHERE dept_id LIKE ‘%X’ ORDER BY position;
This query will return employee information for all employees working in any department where dept_id ends in ‘X’. However, the relational optimizer will have to scan the data in order to resolve this query—there is no way to use an index. Because the high-order portion of the column is not known, traversing a b-tree index structure is impossible.
You might be able to use your knowledge of the data to rewrite this query without a leading wild-card character (%). For example, perhaps all dept_id values start with either ‘A’ or ‘B’. In that case, you could modify the SQL as follows:
SELECT position, last_name, empno FROM employee WHERE dept_id LIKE ‘A%X’ OR dept_id LIKE ‘B%X’ ORDER BY position;
In this case, the DBMS may be able to use a non-matching index scan if an index exists on the dept_id column. Scanning the index may be more efficient than scanning the entire table, but even then you will be scanning data, which is not usually very efficient.
Once again, your results will vary with the DBMS in use and the nature of the data accessed.
Rule 8: Avoid Sorts When Possible
Sorting data is an inhibitor of optimal performance in SQL queries. Your DBMS will sort data as needed to satisfy your database requests. The types of operations that usually require sorting of some form are ORDER BY, GROUP BY, DISTINCT, UNION, INTERSECT, and EXCEPT. When performance is important, remember to look for sorts and find ways to eliminate them. You might be able to create an index to avoid sorting, or to use an alternate syntax if duplicate elimination is not important (e.g. UNION ALL versus UNION).
Keep in mind that sorting is an I/O intensive operation and can degrade query performance, sometimes significantly. When performance is important, remember to look for sorts and find ways to eliminate them.
And so we come to the end of part 2 in our 3 part series offering up 12 SQL rules of thumb… tune in next time for the final 4 rules (numbers 9 thru 12) in our dozen guidelines for developing effective and efficient SQL!