Ensuring data quality is a big part of regulatory compliance. Its importance cannot be overstated. Poor data quality costs the typical company at least 10 percent of revenue; 20 percent is probably a better estimate. According to software marketing and technology expert Hollis Tibbetts, “Incorrect, inconsistent, fraudulent and redundant data cost the U.S. economy over $3 Trillion a year.”
The cost of poor data quality notwithstanding, high-quality data is crucial for complying with regulations. Think about it. If the data is not accurate, how can you be sure that the proper controls are being applied to the right pieces of data to comply with the appropriate regulations?
Good data quality starts with metadata. Accurate data definitions are required in order to apply the controls for compliance to the correct data. But what is metadata?
Metadata characterizes data, providing documentation such that data can be understood and more readily consumed by an organization. Metadata answers the who, what, when, where, why, and how questions for users of the data.
Metadata is required to place the data into proper categories for determining which regulations apply. For example, SOX applies to financial data, HIPAA applies to health care data, and so on. Some data will apply to multiple regulations and some data will not be regulated at all. Without proper metadata definitions, it is impossible to apply regulatory compliance to data.
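The idea of metadata-driven categorization can be sketched in a few lines of code. The category names, data elements, and mappings below are hypothetical illustrations, not a prescribed taxonomy; the point is that once each data element carries accurate metadata tags, determining which regulations apply becomes a simple lookup.

```python
# Hypothetical mapping from metadata categories to regulations.
REGULATION_BY_CATEGORY = {
    "financial": ["SOX"],
    "health": ["HIPAA"],
}

# A toy metadata repository: each data element is tagged with
# business categories (all element names here are made up).
metadata = {
    "quarterly_revenue": {"categories": ["financial"]},
    "patient_diagnosis": {"categories": ["health"]},
    "employee_medical_expense": {"categories": ["financial", "health"]},
    "office_location": {"categories": []},  # unregulated data
}

def applicable_regulations(element):
    """Return the regulations that govern a data element, based on its tags."""
    regs = []
    for category in metadata[element]["categories"]:
        regs.extend(REGULATION_BY_CATEGORY.get(category, []))
    return sorted(set(regs))
```

Note that an element can map to multiple regulations (the hypothetical `employee_medical_expense` falls under both SOX and HIPAA) or to none at all, mirroring the point above: without the metadata tags, the lookup has nothing to work with.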
The next step is to ensure that the data, once accurately defined, is itself accurate. Imposing regulatory controls on the wrong data does no good at all. This raises the question “How good is your data quality?” Industry estimates indicate that poor data quality is a pervasive problem. According to data quality expert Thomas C. Redman, payroll record changes have a 1 percent error rate; billing records have a 2 to 7 percent error rate; and the error rate for credit records is as high as 30 percent.
But what can a DBA do about poor-quality data? Data quality is a business responsibility, but the DBA can help by instituting technology controls. Defining referential integrity and building constraints into the database can improve overall data quality. Additional constraints should be defined in the database as appropriate to enforce uniqueness, and to control data value ranges using CHECK constraints and triggers.
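The following is a minimal sketch of these database-enforced controls, using Python's built-in sqlite3 module and an in-memory database. The table and column names are invented for illustration; the same DDL concepts (primary keys, UNIQUE, CHECK, and foreign key constraints) apply in any relational DBMS, though the exact syntax varies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this to enforce FKs

# Uniqueness and value-range controls declared in the schema itself.
conn.execute("""
    CREATE TABLE customer (
        cust_id   INTEGER PRIMARY KEY,     -- uniqueness
        ssn       TEXT UNIQUE,             -- no duplicate SSNs
        cust_type TEXT CHECK (cust_type IN ('retail', 'wholesale'))
    )
""")

# Referential integrity: every invoice must reference a real customer.
conn.execute("""
    CREATE TABLE invoice (
        inv_id  INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
        amount  REAL CHECK (amount >= 0)   -- value-range control
    )
""")

conn.execute("INSERT INTO customer VALUES (1, '123-45-6789', 'retail')")
conn.execute("INSERT INTO invoice VALUES (100, 1, 59.95)")

# Each statement below violates a declared constraint, so the DBMS
# rejects the bad data before it can pollute the database.
bad_inserts = [
    "INSERT INTO customer VALUES (2, '123-45-6789', 'wholesale')",  # duplicate SSN
    "INSERT INTO customer VALUES (3, '987-65-4321', 'unknown')",    # invalid cust_type
    "INSERT INTO invoice VALUES (101, 99, 10.0)",                   # no such customer
    "INSERT INTO invoice VALUES (102, 1, -5.0)",                    # negative amount
]
rejected = 0
for stmt in bad_inserts:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError:
        rejected += 1
```

All four bad inserts are rejected with an integrity error, while the clean rows go in untouched. The key design point is that these rules live in the database, so they apply no matter which application or user attempts the insert.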
Another technology tactic that can be deployed to improve data quality is data profiling. Data profiling is the process of examining the existing data in the database and collecting statistics and other information about that data. With data profiling, you can discover the quality, characteristics, and potential problems of information. Using the statistics collected by the data profiling solution, business analysts can undertake projects to clean up problematic data in the database.
Data profiling can dramatically reduce the time and resources required to find problematic data. Furthermore, it allows business analysts and data stewards to have more control of the maintenance and management of enterprise data.
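The core of data profiling can be illustrated with a short, self-contained sketch. The sample rows and column names below are hypothetical, and real profiling tools scan the actual database rather than in-memory lists, but the statistics collected are the same kind: counts, null counts, distinct values, frequent values, and value ranges.

```python
from collections import Counter

# Hypothetical sample rows; note the quality problems a profile would
# surface: an inconsistent state code ('TX' vs 'tx'), a missing state,
# and a negative balance.
rows = [
    {"cust_id": 1, "state": "TX", "balance": 100.0},
    {"cust_id": 2, "state": "tx", "balance": -50.0},
    {"cust_id": 3, "state": None, "balance": 75.5},
]

def profile(rows, column):
    """Collect simple column-level profiling statistics."""
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    stats = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "most_common": Counter(non_null).most_common(1),
    }
    # Record the value range for numeric columns.
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        stats["min"], stats["max"] = min(non_null), max(non_null)
    return stats
```

Profiling the hypothetical `state` column reports one null and two distinct values where a clean column would have one, and profiling `balance` exposes the negative minimum. Statistics like these are what give business analysts and data stewards a concrete list of problems to clean up.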
Data governance programs are becoming more popular as corporations work to comply with more and stricter governmental regulations. A data governance program oversees the management of the availability, usability, integrity, and security of enterprise data. A sound data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures.
So an organization with a strong data governance practice will have better control over its information. When data management is instituted as an officially sanctioned mandate of an organization, data is treated as an asset. That means data elements are defined in business terms; data stewards are assigned; data is modeled and analyzed; metadata is defined, captured, and managed; and data is archived for long-term data retention.
All of this should be good news to data professionals who have wanted to better define and use data within their organizations. The laws are finally catching up with what we knew our companies should have been doing all along.
Thomas C. Redman, “Data: An Unfolding Quality Disaster,” DM Review, August 2004.
Thomas C. Redman, Data Quality: Management and Technology (New York: Bantam, 1992).