On Design, the Internet, and Taming the Hostile Database

When DBAs are involved in designing databases that need to support Internet-enabled applications it seems that common sense and data management knowledge frequently fly out the window. Why is this? In today’s post we will investigate the impact of e-business on the design process and discuss the basics of assuring proper database design.

Living at Web Speed

On of the biggest problems that a DBA will encounter when moving from traditional development to e-business development is coping with the mad rush to “get it done NOW!” Industry pundits have coined the phrase “Internet time” to describe this phenomenon. Basically, when a business starts operating on “Internet time” things move faster. One “web month” is said to be equivalent to about three standard months. The nugget of truth in this load of nonsense is that web projects move very fast for a number of reasons:

  • Because business executives want to conduct more and more business over the web to save costs and to connect better with their clients.
  • Because someone read an article in an airline magazine saying that web projects should move fast.
  • Because everyone else is moving fast so you better move fast too or risk losing business.

Well, two of these three reasons are quite valid. I’m sure you may have heard other reasons for rapid application development (RAD). And sometimes RAD is required for certain projects. But RAD can be bad for database design. Why? Well, simply stated, applications are temporary but the data is forever. Organizations are forever coding and re-coding their applications – sometimes the next incarnation of an application is being developed before the last one even has been moved to production.

But when did you ever throw away data? Oh, sure, you may redesign a database or move from one DBMS to another. But what did you do? Chances are you saved the data and migrated it from the old database to the new one. Some changes had to be made, maybe some external data was purchased to combine with the existing data, and most likely some parts of the database were not completely populated. But data lives forever.

To better enable you to glean value from your data it is wise to take care when designing the database. A well-designed database is easy to navigate and therefore, it is easier to retrieve meaningful data from the database.

Database Design Steps

The DBA should create databases by transforming logical data models into physical implementation. It is not wise to dive directly into a physical design without first conducting an in-depth examination of the data needs of the business.

Data modeling is the process of analyzing the things of interest to your organization and how these things are related to each other. The data modeling process results in the discovery and documentation of the data resources of your business. Data modeling asks the question “What?” instead of the more common data processing question, “How?”

Before implementing databases of any sort, a sound model of the data to be stored in the database should be developed. Novice database developers frequently begin with the quick-and-dirty approach to database implementation. They approach database design from a programming perspective. That is, novices do not have experience with databases and data requirements gathering, so they attempt to design databases like the flat files they are accustomed to using. This is a mistake because problems inevitably occur after the databases and applications become operational in a production environment. At a minimum, performance will suffer and data may not be as readily available as required. At worst, data integrity problems may arise rendering the entire application unusable.

A proper database design can not be thrown together quickly by novices. What is required is a practiced and formal approach to gathering data requirements and modeling data. This modeling effort requires a formal approach to the discovery and identification of entities and data elements. Data normalization is a big part of data modeling and database design. A normalized data model reduces data redundancy and inconsistencies by ensuring that the data elements are designed appropriately.

It is actually quite simple to learn the basics of data modeling, but it can take a lifetime to master all of its nuances.

Once the logical data model has been created the DBA uses his knowledge of the DBMS to be used to transform logical entities and data elements into physical database tables and columns. To successfully create a physical database design you will need to have a good working knowledge of the features of the DBMS including:

  • In-depth knowledge of the database objects supported by the DBMS and the physical structures and files required to support those objects.
  • Details regarding the manner in which the DBMS supports indexing, referential integrity, constraints, data types, and other features that augment the functionality of database objects.
  • Detailed knowledge of new and obsolete features for particular versions or releases of the DBMS to be used.
  • Knowledge of the DBMS configuration parameters that are in place.
  • Data definition language (DDL) skills to translate the physical design into actual database objects.

Armed with the correct information, the DBA can create an effective and efficient database from a logical data model. The first step in transforming a logical data model into a physical model is to perform a simple translation from logical terms to physical objects. Of course, this simple transformation will not result in a complete and correct physical database design – it is simply the first step. The transformation consists of the following:

  • Identify and create the physical data structures to be used by the database objects (for example, table spaces, segments, partitions, and files).
  • Transform logical entities in the data model to physical tables
  • Transform logical attributes in the data model to physical columns
  • Transform domains in the data model to physical data types and constraints
  • Choose a primary key for each table from the list of logical candidate keys
  • Examine column ordering to take advantage of the processing characteristics of the DBMS
  • Build referential constraints for relationships in the data model
  • Re-examine the physical design for performance

Of course, the above discussion is a very quick introduction to and summary of data modeling and database design. Every DBA should understand these topics and make sure that all projects, even e-business projects operating on “Internet time” follow the tried and true steps to database design.

What About Big Data?

The current industry meme is all about big data. And in many cases big data gets tied to web applications. But I assure you, the two are not synonymous. You can have a web application that does not need a Big Data approach… and you can have a Big Data application that is not web-enabled.

You should not simply say something like “Well, this is a web application and it should therefore be designed for Hadoop or a column store database. And no design is required, just plop the data out there.

Big Data implementations benefit from properly designed and documented data requirements just like any other application/implementation.

Database Design Traps

Okay, so what if you do not practice data modeling and database design? Or what if you’d like to but are forced to operate on “Internet time” for certain databases?

Well, the answer, of course, is “it depends!” The best advice I can give you is to be aware of design failures that can result in a hostile database. A hostile database is difficult to understand, hard to query, and takes an enormous amount of effort to change.

Of course, it is impossible to list every type of database design flaw that could be introduced to create a hostile database. But let’s examine some common database design failures.

Assigning inappropriate table and column names is a common design error made by novices. Database names that are used to store data should be as descriptive as possible to allow the tables and columns to self-document themselves, at least to some extent. Application programmers are notorious for creating database naming problems, such as using screen variable names for columns or coded jumbles of letters and numbers for table names.

When rushed for time some DBAs resort to designing the database with output in mind. This can lead to flaws such as storing numbers in character columns because leading zeroes need to be displayed on reports. This is usually a bad idea with a relational database. It is better to let the database system perform the edit-checking to ensure that only numbers are stored in the column. If the column is created as a character column then the developer will need to program edit-checks to validate that only numeric data is stored in the column. It is better in terms of integrity and efficiency to store the data based on its domain. Users and programmers can format the data for display instead of forcing the data into display mode for storage in the database.

Another common database design problem is overstuffing columns. This actually is a normalization issue. Sometimes a single column is used for convenience to store what should be two or three columns. Such design flaws are introduced when the DBA does not analyze the data for patterns and relationships. An example of overstuffing would be storing a person’s name in a single column instead of capturing first name, middle initial, and last name as individual columns.

Poorly designed keys can wreck the usability of a database. A primary key should be non-volatile because changing the value of the primary key can be very expensive. When you change a primary key value you have to ripple through foreign keys to cascade the changes into the child table. A common design flaw is using Social Security number for the primary key of a personnel or customer table. This is a flaw for several reasons, two of which are:

  1. Social Security numbers are not necessarily unique
  2. If your business expands outside the USA no one will have a social security number to use, so then what do you store as the primary key?

Actually, failing to account for International issues can have greater repercussions. For example, when storing addresses how do you define zip code. Zip code is USA code but many countries have similar codes; though they are not necessarily numeric. And state is a USA concept, too. Of course, some other countries have states or similar concepts (Canadian provinces). So just how do you create all of the address columns to assure that you capture all of the information for every person to be stored in the table regardless of country? The answer, of course, is to conduct proper data modeling and database design.

Denormalization of the physical database is a design option but it can only be done if the design was first normalized. How do you denormalize something that was not first normalized? Actually, a more fundamental problem with database design is improper normalization. By focusing on normalization, data modeling and database design, you can avoid creating a hostile database.

Without proper up-front analysis and design the database is unlikely to be flexible enough to easily support the changing requirements of the user. With sufficient preparation flexibility can be designed into the database to support the user’s anticipated changes. Of course, if time is not taken during the design phase to ask the users about their anticipated future needs, you cannot create the database with those needs in mind.

Taming the Hostile Database

If data is the heart of today’s modern e-business, then the database design is the armor that protects that heart. Data modeling and database design is the most important part of a database application. If proper design is not a component of the database creation process, you will wind up with a confusing mess of a database that may work fine for the first application, but not for subsequent applications. And heaven help the developer or DBA who has to make changes to the database or application because of changing business requirements. That DBA will have to try to tame the hostile database!


About craig@craigsmullins.com

I'm a strategist, researcher, and consultant with nearly three decades of experience in all facets of database systems development.
This entry was posted in data, data modeling, DBA, enterprise computing. Bookmark the permalink.

2 Responses to On Design, the Internet, and Taming the Hostile Database

  1. Todd Everett says:

    Great article Craig – a really excellent summary of the battle we logical data modelers face every day on application development projects. I’m going to use this article in working with our organization over the coming weeks on an analysis of what is going well and what could be improved in our development of technology solutions for our business. Thanks for continuing to post your analysis of the DBMS world today. I’ve followed you since the late 1990’s back when I considered my DB2 Developer’s Guide my bible!

  2. farmgate says:

    wish i could write like this thanks for the post

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s