Too Many Columns!
Today’s entry, the second in a continuing series on bad database standards, attacks the misguided notion of putting too many columns in the SELECT-list of your SQL statements.
Database application performance can be impacted by many factors – but the number one cause of poor relational performance is usually poorly coded SQL. Sometimes, the mistakes are simple to correct – and that is the case with this week’s “bad standard.”
One of the simplest mistakes made by many SQL programmers is including too many columns in the SELECT-list of their SQL statements. The only columns that should be included in the SELECT-list are those that are absolutely needed to meet the business requirements of the query.
Sometimest this notion of SELECT-ing only what you need is simplified as “avoid SELECT *”. That is actually not a bad standard (excepting, of course, quick & dirty ad hoc queries). SELECT * is not a good idea for your application programs because a column can be added to the table causing your program to stop working.
But “avoid SELECT *” does not go far enough because it does not capture the true requirement, which is this: reference only exactly what is needed for the business requirement and nothing more.
The bad standard may read something like this: Every column referenced in a WHERE clause of your SQL statement(s) should also be included in the SELECT-list of that statement.
That is an extremely nasty standard. Some shops do it to support their development tools or to standardize (for some strange reason). But all that it does is build performance degradation into your applications.
Consider the following statement:
SELECT FIRSTNAME, LASTNAME, EMPNO
WHERE EMPNO = ‘700’;
You might be asking something like “What is so wrong with that statement?” Well, there is no reason for EMPNO to be in the SELECT-list. We know its value will always be 700 because of the WHERE clause.
OK, but that is a small issue, right? Maybe. What if this statement runs hundreds, or even thousands of times a day? Every column that the DBMS must pick up and return to the application requires additional resources – a small amount of additional resources, to be sure, but additional resources none-the-less.
If we remove the column from the SELECT-list we remove the requirement to use those resources. Now multiply that by the thousands of times the statement runs and we’re saving something!
The bottom line is that a standard forcing the column into the SELECT-list has no viable reason to exist — and at worst, it can cause a performance problem. So, if something like this is in your company’s standards manual, snip it right out today.