What’s In A Name?
Today’s installment in the continuing saga of bad database standards revolves around naming conventions. I’d like to thank James Koopmann for giving me the idea to blog about this topic. James writes an interesting blog of his own called An Expert’s Guide to Database Solutions that is well worth following.
The establishment and enforcing of naming conventions is often one of the first duties to be tackled when implementing new software. Adequate thought and preparation is required in order for such a task to be successful. What amount of effort should be extended in the creation of appropriate database naming standards? Are current industry standards acceptable? Shakespeare, many, many years ago, may have said it best when he wrote:
“What’s in a name? That which we call a rose
by any other name would smell as sweet.”
But, if that is true, then why do those of us in IT spend so much time and effort developing and enforcing naming standards? Maybe what something is called is not quite so trivial a matter after all!
Well, we know what Shakespeare was trying to say: the name by which we call something has no affect upon the actual object. Calling a desk a chair will not turn it into a chair. It is still a desk. Sometimes IT professionals, and database developers in particular, forget this.
Let’s start with tables. A table consists of multiple rows, each with a fixed and unchanging number of defined columns. However, most RDBMSes provide alternate means of accessing data from these tables. For example, there are temporary tables, materialized query tables, aliases, synonyms, and views.
Each of these alternate means of access are similar in one way: they all present data by means of values in rows and columns. An end user need not know whether he is querying a table, an alias, a synonym, or a view. To the user the results are the same – data represented by values in rows and columns.
Therefore, we should use the exact same naming convention for tables, views, aliases, synonyms, and all table-like objects. Yet it is common for many shops to implement different naming conventions for each of these objects. But why?
Consider your current table naming conventions. If your shop is typical, you will be assigning tables English-like names, perhaps with an application identifier. If they are different, why? This is almost an industry standard for table naming.
Some organizations enforce table names to start with a T (or have a strategically embedded T within the table name). This is a bad standards. The name of each table should accurately and succinctly convey the contents of the data it contains. For example CUSTOMER. Why would you want to name a table that contains customer data something like TCUST?
The general rule should be to avoid embedding a “T”, or any other character, into table names to indicate that the object is a table. Likewise, indicator characters should be avoided for any other table-like object (alias, synonym, view).
Although most shops avoid embedding a “T” in table name, many of these same shops do embed a character into view names. The primary reason given is that the character makes it easy to determine what type of object is being accessed just by looking at the name. There are two reasons why this is a bad idea. The first is a semantic reason; the second a flexibility issue.
In semantic terms, an object’s name need only identify the object, not the object’s type. Think about this in terms of other naming situations. For example, how are people named? Usually one can ascertain the gender of someone simply by knowing their name but would you banish all males named Chris, Pat, or Terry? Or maybe all females named Joe or Nicky? After all, men and women are different. Shouldn’t we make sure that all men’s names are differentiated from women’s names? Maybe we should start all men’s names with an M and all women’s names with a W? If we did, we’d sure have a lot of Marks and Wendys, wouldn’t we? The point here is that context enables us to differentiate men from women, when it is necessary. The same can be said of database objects.
How are program variables named? Do you name your 01, 05, and 77 level variable names differently in your COBOL programs? For example, do all 01 levels start with “O” (for one), all 05 levels start with “F”, and all 77 levels start with “S”? No? Why not? Isn’t this the same as forcing views to start with V (or having a strategically embedded V within the name)?
What about the naming of pets? Say I have a dog, a cat, and a bird. Now, I wouldn’t want to get them confused, so I’ll make sure that I start all of my dog names with a D, cat names with a C, and bird names with a B. So, I’ll feed C_FELIX in the morning, take D_ROVER for a walk after work, and make sure I cover B_TWEETY’s cage before I go to bed. Sounds ridiculous, doesn’t it?
If we don’t manufacture hokey names in the real world, why would we want to do it with our database objects? There is no reason to embed special characters into database object names to differentiate them from one another. And, after all, the system catalog or dictionary can be used to find query all of the metadata about system objects whenever we want, so there is no reason to store metadata (a T or a V) in our database object names. For example, in DB2, this query will list all table-like objects:
SELECT NAME, CREATOR, ‘TABLE’
WHERE TYPE = ‘T’
SELECT NAME, CREATOR, ‘ALIAS’
WHERE TYPE = ‘A’
SELECT NAME, CREATOR, ‘SYNONYM’
SELECT NAME, CREATOR, ‘VIEW’
ORDER BY 3, 1;
The second reason for this rule is to increase flexibility. Say, for example, that we have a table that for some reason is significantly altered, dropped, or renamed. If views are not constrained by rigid naming conventions requiring an embedded “V” in the name, then a view can be constructed that resembles the way the table used to look. Furthermore, this view can be given the same name as the old table. This increases system flexibility.
Most users don’t care whether they are using a table, view, synonym, or alias. They simply want the data. And, in a relational database, tables, views, synonyms, and aliases all logically appear to be identical to the end user: as rows and columns. It is true that there are certain operations that can not be performed on certain types of views, and users who need to know this will generally be sophisticated users. Updates, deletions, and insertions (the operations which are not available to some views) are generally coded into application programs and executed in batch or via on-line transactions.
Now you tell me, which name will your typical end user remember more readily when he needs to access his marketing contacts: MKT_CONTACT or VMKTCT01?
Embedded Meaning (Metadata)
One final troublesome naming convention is embedding specialized meaning into database object names. The name of an object should not attempt to define the object, particularly not its metadata.
So, in general, database object naming conventions should not enforce, or allow for embedding specialized meaning into database object names.
Let’s examine this revised rule by means of an example. Some shops enforce index naming conventions such that the type of index is embedded in the index name. Consider the following bad index naming standard:
Index Name Prototype: cr.aaaXtttl
Each index name is composed of the following segments:
cr. Creator Identification; creator or owner.
aaa Application System; 3 characters.
X Index Indicator; constant X.
ttt Table Name Identification; 3 characters.
abbreviation of corresponding table name.
l Index Letter; 1 character.
P for index corresponding to Primary Key
F for index corresponding to a Foreign Key
U for unique index
N for non-unique index
Note two potential problem areas with this standard:
- An embedded X identifies this object as an index.
- Embedded meaning in the form of indicators detailing the type of index.
The embedded indicator character “X”, although unnecessary, is not as evil as indicator characters embedded in table-like objects. Indexes are not explicitly accessed by users. Therefore, obscure or difficult to remember naming conventions are not as big a problem IMHO. If you must use indicator characters in database names, use them only in objects which are never explicitly accessed by end users.
The second potential problem area poses quite a bit of trouble. Consider the following cases which would cause the embedded meaning in the index name to be incorrect:
- The primary key is dropped.
- A foreign key is dropped.
- The index is altered from non-unique to unique (or vice versa) using a database alteration tool.
- What if an index is defined for a foreign key, but is also unique? Should we use an “F” or a “U”? Or do we need another character?
I have also seen naming conventions which indicate whether the index is clustering (“C”) or not (“N”). This is not a good idea either. Misconceptions can occur. For example, in DB2, if no clustering index is explicitly defined, DB2 will use the first index created as a clustering index. Should this index be named with an embedded “C” or not?
Let’s look at one final example from the real world to better illustrate why it is a bad idea to embed specialized meaning into names. Consider what would happen if we named corporations based upon what they produce. When IBM began, they produced typewriters. If we named corporation like we name database objects, the company could have been named based upon the fact that they manufactured typewriters when they began. IBM might have been called TIBM (the “T” is for typewriters). And guess what, they don’t make typewriters any longer. What would we do? Rename TIBM or live with a name that is no longer relevant?
Naming conventions can evoke a lot of heated discussion. Everybody has their opinion as to what is the best method for naming database objects. Remember, though, that it is best to keep an open mind. And to avoid metadata in database object names!