A Brief Introduction to Data Normalization

Normalization is a series of steps followed to obtain a database design that allows for efficient access and storage of data in a relational database. These steps reduce data redundancy and the chances of data becoming inconsistent. A table is said to be normalized if it satisfies certain constraints. Codd’s original work defined three such forms but there are now five generally accepted steps of normalization. The output of the first step is called First Normal Form (1NF), the output of the second step is Second Normal Form (2NF), etc.

A row is in first normal form if and only if all underlying domains contain atomic values only. 1NF eliminates repeating groups by putting each into a separate table and connecting them with a one-to-many relationship. A row is in second normal form if and only if it is in first normal form and every non-key attribute is fully dependent on the key. 2NF eliminates functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key. A row is in third normal form if and only if it is in second normal form and every non-key attribute is non-transitively dependent on the primary key. 3NF eliminates functional dependencies on non-key fields by putting them in a separate table. At this stage, all non-key fields are dependent on the key, the whole key and nothing but the key.

But normalization does not stop with 3NF. Additional normal forms have been identified and documented. However, normalization past 3NF does not occur often in normal practice because most tables in 3NF are usually also in 5NF. The additional normal forms are:

  • Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. Indeed, in his later writings Codd refers to BCNF as 3NF. A row is in Boyce Codd normal form if and only if every determinant is a candidate key. Most entities in 3NF are already in BCNF.
  • An entity is in Fourth Normal Form (4NF) if and only if it is in 3NF and has no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other.
  • Fifth Normal Form (5NF) specifies that every join dependency for the entity must be a consequence of its candidate keys.

For more information on normalization consult the following links:

A comparison of BCNF and 3NF is given here:
http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter7/node12.html

A complete definition of 4NF is given here:
http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter7/node16.html

Advertisements

About craig@craigsmullins.com

I'm a strategist, researcher, and consultant with nearly three decades of experience in all facets of database systems development.
This entry was posted in data modeling, database design, normalization. Bookmark the permalink.

5 Responses to A Brief Introduction to Data Normalization

  1. Pingback: DB2 Hub | A Brief Introduction to Data Normalization

  2. Pingback: Can Never Have To Many Login/Reg Forms Handy ~ Erika Harvey

  3. Pingback: Can Never Have To Many HTML Forms Handy ~ Erika Harvey

  4. Rolando Heitman says:

    Muchos Gracias for your blog post. Really thank you! Really Great.

  5. Jeffery Marchiony says:

    I’m not that much of a online reader to be honest but your sites really nice, keep it up! I’ll go ahead and bookmark your site to come back later. Many thanks|

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s