Limiting The Number of Tables in “Online” Joins
Today’s entry continues our on-going series on “bad” database standards. And the “bad” standard we’ll discuss tonight is a particularly nasty one. Although the general idea behind this standard might seem to be understandable — at least at first blush — the implementation and wording is typically wrong, wrong, wrong.
OK, so what am I talking about? Well, the way this “bad” standard typically is stated is something like this: “Code no more than a 3 table join in any program that is accessed in an online environment.”
The idea is to limit the amount of work that is done by a transaction so that it can be accomplished in a limited amount of time. Given that there is a user waiting for a response you do not want the transaction to take an inordinate amount of time. Sounds reasonable, right? So why is this wrong?
The “general idea” behind the standard is not necessarily wrong, but the standard is. Whether it says a 3 table join or a 5 table join (or some other number) is not important. The number of tables participating in a join is not the defining factor for performance. Instead, it is the number of qualifying rows to be returned. A well defined query that specifically limits the data to be returned appropriately, can be very efficient whether it is a single table access or a multiple-table join. So do not place artificial limits on the number of tables that can participate in a join. Each DBMS has its upper limit and, of course, you cannot exceed that (for example, 225 tables per SQL statement when using DB2 for z/OS).
Another misunderstanding is when standards are written that “worry” about the size of tables. Once again, this factor is not necessarily important — instead it is the number of qualifying rows again. Consider, for example, a table with 10 million rows in it. A query that uses a matching index to return 2 rows from that table can be very efficient. On the other hand, a query against a smaller table, say 100 pages, that must scan data will likely consume more resources and take longer to complete than the indexed access against the bigger table. The number of rows that qualify for the predicates is more important than the size of the table.
You should create standards and guidelines that help your application developers and SQL users to create efficient queries instead of creating arbitrary standards based on table size or number of tables in a join.