Bad Database Standards, Part 6

Posted on May 20, 2010 by craig@craigsmullins.com

What’s In A Name?

Today’s installment in the continuing saga of bad database standards revolves around naming conventions. I’d like to thank James Koopmann for giving me the idea to blog about this topic. James writes an interesting blog of his own called An Expert’s Guide to Database Solutions that is well worth following.

The establishment and enforcing of naming conventions is often one of the first duties to be tackled when implementing new software. Adequate thought and preparation is required in order for such a task to be successful. What amount of effort should be extended in the creation of appropriate database naming standards? Are current industry standards acceptable? Shakespeare, many, many years ago, may have said it best when he wrote:

“What’s in a name? That which we call a rose
by any other name would smell as sweet.”

But, if that is true, then why do those of us in IT spend so much time and effort developing and enforcing naming standards? Maybe what something is called is not quite so trivial a matter after all!

Well, we know what Shakespeare was trying to say: the name by which we call something has no affect upon the actual object. Calling a desk a chair will not turn it into a chair. It is still a desk. Sometimes IT professionals, and database developers in particular, forget this.

Let’s start with tables. A table consists of multiple rows, each with a fixed and unchanging number of defined columns. However, most RDBMSes provide alternate means of accessing data from these tables. For example, there are temporary tables, materialized query tables, aliases, synonyms, and views.

Each of these alternate means of access are similar in one way: they all present data by means of values in rows and columns. An end user need not know whether he is querying a table, an alias, a synonym, or a view. To the user the results are the same – data represented by values in rows and columns.

Therefore, we should use the exact same naming convention for tables, views, aliases, synonyms, and all table-like objects. Yet it is common for many shops to implement different naming conventions for each of these objects. But why?

Consider your current table naming conventions. If your shop is typical, you will be assigning tables English-like names, perhaps with an application identifier. If they are different, why? This is almost an industry standard for table naming.

Some organizations enforce table names to start with a T (or have a strategically embedded T within the table name). This is a bad standards. The name of each table should accurately and succinctly convey the contents of the data it contains. For example CUSTOMER. Why would you want to name a table that contains customer data something like TCUST?

The general rule should be to avoid embedding a “T”, or any other character, into table names to indicate that the object is a table. Likewise, indicator characters should be avoided for any other table-like object (alias, synonym, view).

Although most shops avoid embedding a “T” in table name, many of these same shops do embed a character into view names. The primary reason given is that the character makes it easy to determine what type of object is being accessed just by looking at the name. There are two reasons why this is a bad idea. The first is a semantic reason; the second a flexibility issue.

In semantic terms, an object’s name need only identify the object, not the object’s type. Think about this in terms of other naming situations. For example, how are people named? Usually one can ascertain the gender of someone simply by knowing their name but would you banish all males named Chris, Pat, or Terry? Or maybe all females named Joe or Nicky? After all, men and women are different. Shouldn’t we make sure that all men’s names are differentiated from women’s names? Maybe we should start all men’s names with an M and all women’s names with a W? If we did, we’d sure have a lot of Marks and Wendys, wouldn’t we? The point here is that context enables us to differentiate men from women, when it is necessary. The same can be said of database objects.

How are program variables named? Do you name your 01, 05, and 77 level variable names differently in your COBOL programs? For example, do all 01 levels start with “O” (for one), all 05 levels start with “F”, and all 77 levels start with “S”? No? Why not? Isn’t this the same as forcing views to start with V (or having a strategically embedded V within the name)?

What about the naming of pets? Say I have a dog, a cat, and a bird. Now, I wouldn’t want to get them confused, so I’ll make sure that I start all of my dog names with a D, cat names with a C, and bird names with a B. So, I’ll feed C_FELIX in the morning, take D_ROVER for a walk after work, and make sure I cover B_TWEETY’s cage before I go to bed. Sounds ridiculous, doesn’t it?

If we don’t manufacture hokey names in the real world, why would we want to do it with our database objects? There is no reason to embed special characters into database object names to differentiate them from one another. And, after all, the system catalog or dictionary can be used to find query all of the metadata about system objects whenever we want, so there is no reason to store metadata (a T or a V) in our database object names. For example, in DB2, this query will list all table-like objects:

SELECT NAME, CREATOR, ‘TABLE’
FROM SYSIBM.SYSTABLES
WHERE TYPE = ‘T’
UNION ALL
SELECT NAME, CREATOR, ‘ALIAS’
FROM SYSIBM.SYSTABLES
WHERE TYPE = ‘A’
UNION ALL
SELECT NAME, CREATOR, ‘SYNONYM’
FROM SYSIBM.SYSSYNONYMS
UNION ALL
SELECT NAME, CREATOR, ‘VIEW’
FROM SYSIBM.SYSVTREE
ORDER BY 3, 1;

The second reason for this rule is to increase flexibility. Say, for example, that we have a table that for some reason is significantly altered, dropped, or renamed. If views are not constrained by rigid naming conventions requiring an embedded “V” in the name, then a view can be constructed that resembles the way the table used to look. Furthermore, this view can be given the same name as the old table. This increases system flexibility.

Most users don’t care whether they are using a table, view, synonym, or alias. They simply want the data. And, in a relational database, tables, views, synonyms, and aliases all logically appear to be identical to the end user: as rows and columns. It is true that there are certain operations that can not be performed on certain types of views, and users who need to know this will generally be sophisticated users. Updates, deletions, and insertions (the operations which are not available to some views) are generally coded into application programs and executed in batch or via on-line transactions.

Now you tell me, which name will your typical end user remember more readily when he needs to access his marketing contacts: MKT_CONTACT or VMKTCT01?

Embedded Meaning (Metadata)

One final troublesome naming convention is embedding specialized meaning into database object names. The name of an object should not attempt to define the object, particularly not its metadata.

So, in general, database object naming conventions should not enforce, or allow for embedding specialized meaning into database object names.

Let’s examine this revised rule by means of an example. Some shops enforce index naming conventions such that the type of index is embedded in the index name. Consider the following bad index naming standard:

————————————————————————————-

Index Name Prototype: cr.aaaXtttl
Example: CREATOR.MKTXCONU

Each index name is composed of the following segments:

cr. Creator Identification; creator or owner.

aaa Application System; 3 characters.

X Index Indicator; constant X.

ttt Table Name Identification; 3 characters.
abbreviation of corresponding table name.

l Index Letter; 1 character.

P for index corresponding to Primary Key
F for index corresponding to a Foreign Key
U for unique index
N for non-unique index

————————————————————————————-

Note two potential problem areas with this standard:

An embedded X identifies this object as an index.

Embedded meaning in the form of indicators detailing the type of index.

The embedded indicator character “X”, although unnecessary, is not as evil as indicator characters embedded in table-like objects. Indexes are not explicitly accessed by users. Therefore, obscure or difficult to remember naming conventions are not as big a problem IMHO. If you must use indicator characters in database names, use them only in objects which are never explicitly accessed by end users.

The second potential problem area poses quite a bit of trouble. Consider the following cases which would cause the embedded meaning in the index name to be incorrect:

The primary key is dropped.

A foreign key is dropped.

The index is altered from non-unique to unique (or vice versa) using a database alteration tool.

What if an index is defined for a foreign key, but is also unique? Should we use an “F” or a “U”? Or do we need another character?

I have also seen naming conventions which indicate whether the index is clustering (“C”) or not (“N”). This is not a good idea either. Misconceptions can occur. For example, in DB2, if no clustering index is explicitly defined, DB2 will use the first index created as a clustering index. Should this index be named with an embedded “C” or not?

Let’s look at one final example from the real world to better illustrate why it is a bad idea to embed specialized meaning into names. Consider what would happen if we named corporations based upon what they produce. When IBM began, they produced typewriters. If we named corporation like we name database objects, the company could have been named based upon the fact that they manufactured typewriters when they began. IBM might have been called TIBM (the “T” is for typewriters). And guess what, they don’t make typewriters any longer. What would we do? Rename TIBM or live with a name that is no longer relevant?

Synopsis

Naming conventions can evoke a lot of heated discussion. Everybody has their opinion as to what is the best method for naming database objects. Remember, though, that it is best to keep an open mind. And to avoid metadata in database object names!

About craig@craigsmullins.com

I'm a data management strategist, researcher, and consultant with over three decades of experience in all facets of database systems development and implementation.

View all posts by craig@craigsmullins.com →

This entry was posted in Naming Conventions, standards. Bookmark the permalink.

18 Responses to Bad Database Standards, Part 6

LewisC says:

May 20, 2010 at 10:46 am

Hi.

This is an excellent post and so true. The data dictionary is the place for meta data, not object names. I have to admit that in the past, I was a proponent of appending _VW to view names. The flexibility factor changed my opinion over time.

One of my pet peeves is trigger names. Almost every place I have been puts a BIUR in a BEFORE INSERT or UPDATE RECORD trigger. AIS, BUDR, argh. Stop it. What if you add insert to the update? Or delete? Some rename the triggers; some just let the metadata get stale. Ugly.

Thanks for a great post!

LewisC

Reply
Pingback: Tweets that mention Bad Database Standards, Part 6 | Data and Technology Today -- Topsy.com
Karen Lopez says:

May 20, 2010 at 1:51 pm

Great points, Craig. I also believe that prefixing table names with something other than the nature of the data is cumbersome. I can usually spot a database designed by a junior architect when I see table names like “tbl_CST” or ‘t_PR198_DR”. Those names are indicators to me that I’m going to find real structural and semantic problems with the actual design as well.

Reply
Roland Bouman says:

May 20, 2010 at 2:15 pm

It would be nice to be able to use natural names for tables. Problem is that there are too many reserved words. You could get around that by using quoted identifiers, but this makes SQL queries unnecessarily hard to read and write.

A simple prefix that is application specific makes it very easy for both developers and dbas to have a clear reminder of what database / application they are working on. It may not live up to the ideal of mirroring the real world into the system, but in practice, it makes it a lot easier to work with many different databases.

Also, in my experience, developers and dba’s are stuck much longer with a particular database than the information analyst or data modeler that designed. It stands to reason that the people that have to spend most time with it, have a say in object names.

Reply
- Karen Lopez says:
  
  May 20, 2010 at 8:31 pm
  
  Dems fighting words about DA not liking it :). But we had that battle over on Twitter.
  
  I’m pretty sure we both won.
  
  Reply
  - Roland Bouman says:
    
    May 20, 2010 at 8:35 pm
    
    Hi Karen 🙂
    
    yup – I agree there’s merit in both points of view.
MarcTun says:

May 20, 2010 at 7:19 pm

whatever you do, be consistent.

I find it handy to have tables that belong to the same system to be prefixed by a system identifier (sales_adress, P99Adress).

There might be reasons to limit yourself in table names etc to use, to accommodate others eg. COBOL.

When you’re not bound to english, there are a lot less reserved words 😉

Reply
- datatechnologytoday says:
  
  May 20, 2010 at 7:54 pm
  
  I would caution you about putting system identifiers into table names. There are a couple of reasons why I would advise against it. First, the system identifier is not a defining component of the data that is in the table nor is it easy for users to remember. Yes, it might be something a DBA or systems person would know immediately, but the table is not there to serve the DBA or systems person, it is there to serve the business. As such, a reasonable, readable, business name is desirable. Secondly, data in a database is there to be shared across systems. For example, the PRODUCT table or the CUSTOMER table may service multiple systems. Why should there be a system identifier in such a table name? Again, as much as possible, it gets back to avoiding metadata in the table name.
  
  Of course, these are my opinions and others may have different experiences and opinions. Feel free to share them here as a comment.
  
  Reply
  - Karen Lopez says:
    
    May 20, 2010 at 8:33 pm
    
    Craig, I agree about not putting app prefixes on table names, but I would throw in that more and more end users do not access databases directly; they use services, views, or other data layers to get at their data.
    
    Having said that, I still believe that meaning names, and consistent names are the most important.
  - Roland Bouman says:
    
    May 20, 2010 at 8:53 pm
    
    I agree in principle with this point of view. But while the DBA and systems person are not the primary user of the system, they are the ones that are most likely to be confronted with these internals – because that’s what these identifiers are – internals. Or at least, it’s not the end-users that are scribbling SQL queries to do, say, ad-hoc reporting.
    
    Call me conservative, but I think business *is* well served if the people responsible for keeping the systems up and running feel comfortable and confident with them.
    
    The point about databases being shared across applications is certainly something to consider. But my point is exactly that: even though systems could be consolidated, and data be shared, it is often not the case. So if we are in a situation where multiple systems need to be managed and developed against, I’d rather be able to distinguish them from one another, and be able to tell at a glance what piece of code was meant for which system.
    
    Where I work, if some developer tells me there is something wrong with *the* product table, it makes exactly zero sense. We have multiple ERP systems, a HR systems, a payroll system, a project system, and incident tracking system and this is just not counting the legacy systems. If there is shared data, it is so because of asynchronous exchange solutions put in place to glue these systems together.
    
    You could argue that there shouldn’t be multiple systems in the first place, but mergers, acquisitions, and reorganisations are typically not driven by considerations of maintaining or creating a consistent IT application landscape. So having to deal with heterogenous environments is just a fact of life – we could as well resist against the tide, the seasons or the sunrise.
    
    I am not saying this is the case in every shop, but I’ve seen this more than once.
TechnOntologist {Alan Howlett} says:

May 21, 2010 at 1:23 pm

One observation from both information architecture and DBA perspectives is that when our design and implementation tools were primitive or even non-existent, naming conventions indicating the object type had a functional value, especially in non-stovepipe environments. Now with modern tools and extensible frameworks it has become much easier to pull up only views and identify all dependencies with a couple of mouse clicks. Our repertoire has expanded, providing tool support, freeing us to re-architect those vertical cylinders of excellence that abound.

Reply
- MarcTun says:
  
  May 30, 2010 at 5:25 pm
  
  But there you are. Your legacy systems on one site and your new possibilities on the other. The cost of renaming the old is high (and even if they were low, try convincing your management they have to invest in this).
  So adapting new standards taking advantage of new possibilities only for your new developments? But in a couple of years time there’ll be 4 standards not making it clear to either developer or user when what is (to be) used.
  It is indeed not quite so trivial a matter after all.
  
  Reply
datatechnologytoday says:

May 30, 2010 at 6:31 pm

I’ve always said that the best thing about standards is that there are so many to choose from!

Reply
MarcTun says:

June 25, 2010 at 6:23 pm

Just had too share: IBM was founded in 1896 but was named the Tabulating Machine Company at that time. They re branded to IBM in 1924.

Reply
datatechnologytoday says:

June 25, 2010 at 6:38 pm

Thanks for the clarification Marc.

For anyone interested, Wikipedia has a nice history of IBM.

Reply
William Sanville says:

May 10, 2011 at 7:40 am

What are your thoughts on using abbreviations versus fully spelled out English words? Like for example, a column name of “UPDT_STRTGY” versus “UpdateStrategy.” The first can be very ambiguous, while spelling out the entire, not so long version, can’t be misinterpreted.

Also, what are your thoughts about reiterating the table name as part of column names? For example, if I have a table called CTGRY, and a category has a title, would a better column name be CTGRY_TITLE, or just TITLE, (or preferably, cased as Title).

Reply
- datatechnologytoday says:
  
  May 10, 2011 at 10:43 am
  
  Good questions, William.
  
  Regarding the first question, both abbreviations and fully spelled out words can be useful. All being equal, I prefer full spelled out words. When the columns are frequently being used by end users who code SQL by hand, then perhaps abbreviated columns have a benefit. However, most SQL coders use some form of query or BI tool these days, negating the shorthand benefit of abbreviations. I think my stock advice would be that IF you use abbreviations, be sure to develop a list of standard abbreviations and do not deviate from them. So, in your example: if STRTGY was the abbreviation for STRATEGY, then either use STRATEGY (fully spelled out) or STRTGY (abbrv.) but not some other abbreviated form. Standardization like this is helpful.
  
  Regarding your second question, I am NOT a fan of including the table name in the column name. So, in your example, I would go with TITLE (or Title). Now in some cases you may have that same attribute in the table multiple times, in which case it needs an adjective to describe it. However, using the table name offers no help as both would be prefaced with the table name, right? My general rule of thumb is to describe the attribute as fully, but as succinctly, as possible.
  
  Reply
Pingback: Bad Database Standards | Data and Technology Today