I would guess that most of the readers of this blog understand the basic concepts and fundamentals of database technology. However, many folks who think they understand the basics often do not have the knowledge and understanding they believe they have. Therefore, today’s post serves as a very brief introduction to the fundamentals of database management systems.
What Is a Database?
The answer to this question may surprise some readers. Oracle is not a database; neither are Db2, PostgreSQL, MongoDB, MySQL, or SQL Server. Each of these is a DBMS, or database management system. You can use Oracle or Db2 or SQL Server to create a database, but none of these themselves are databases. Many people, even skilled professionals, confuse the overall system – the DBMS – with the creation of the system – databases.
So, what is a database? A database is a structured set of persistent data. A phonebook is a database. However, within the world of IT, a database usually is associated with software. A simple database might be a single file containing many records, each of which contains the same set of fields where each field is a certain data type and length. In short, a database is an organized store of data where the data is accessible by named data elements.
A DBMS is a software package designed to create, store, and manage databases. The DBMS software enables end users or application programmers to share data. It provides a systematic method of creating, updating, retrieving, and storing information in a database. DBMS products are usually responsible for data integrity, data access control, automated rollback, restart and recovery.
Thinking abstractly, you might think of a database as a file folder, and a DBMS as the file cabinet holding the labeled folders. You implement and access database instances using the capabilities of the DBMS. Your payroll application uses the payroll database, which may be implemented using a DBMS such as Oracle Database 21c, Db2, MongoDB, or SQL Server.
Why is this distinction important? Using precise terms in the workplace avoids confusion. And the less confused we are the more we can avoid problems and issues that lead to over-budget projects, improperly developed systems, and lost productivity. Therefore, precision should be important to all of us.
Why Use a DBMS?
The main advantage of using a DBMS is to impose a logical, structured organization on the data. A DBMS delivers economy of scale for processing large amounts of data because it is optimized for such operations.
Historically, there are four DBMS data models: hierarchical, network, relational. and object-oriented.
A DBMS can be distinguished by the model of data upon which it is based. A data model is a collection of concepts used to describe data. A data model has two fundamental components: its structure, which is the way data is stored, and its operations, which is the way that data can be manipulated. The major DBMS products utilize four different data models:
- Network (or CODASYL)
The network data model is structured as a collection of record types and the relationships between these record types. All relationships are explicitly specified and stored as part of the structure of the DBMS. Another common name for the network model is CODASYL. CODASYL is named after the Conference on Data Systems Languages, the committee that formulated the model in the early 1970s. Data is manipulated using the location of a given record and following links to related records. IDMS is an example of a DBMS based on the network model.
The hierarchical data model arranges data into structural trees that store data at lower levels subordinate to data stored at higher levels. A hierarchical data model is based on the network model with the additional restriction that access into a record can only be accomplished in one way. IMS is an example of a DBMS based on the hierarchical model.
The relational data modelconsists of a collection of tables (more properly, relations) wherein the columns define the relationship between tables. The relational model is based on the mathematics of set theory. Contrary to popular belief, the relational model is not named after “relationships,” but after the relations of set theory. A relation is a set with no duplicate values. Data can be manipulated in many ways, but the most common way is through SQL. DB2, Oracle, and SQL Server are examples of DBMS products based on the relational model.
The object-oriented (OO) data modelconsists of a collection of entities, or objects, where each object includes the actions that can take place on that object. In other words, an object encapsulates data and process. With OO systems, data is typically manipulated using an OO programming language. Progress Software’s ObjectStore and Intersystems’ Cache are examples of DBMS products based on the OO model.
Each of these four data models is referred to as a data model for the sake of simplicity. In reality, only the relational and network models have any true, formal data model specification. Different models of data lead to different logical and structural data organizations. The relational model is the most popular data model because it is the most abstract and easiest to apply to data, while providing powerful data manipulation and access capabilities.
Other Types of DBMS
Although the four data models discussed heretofore are the predominant types of DBMS, there are other types of DBMS with varying degrees of commercial acceptance.
A column-oriented DBMS, sometimes called a column store, is a DBMS that stores its content by column rather than by row. This has advantages for data warehouses where aggregates are computed over large numbers of data items. Of course, a column-oriented DBMS is not based on any formal data model and can be thought of as a special physical implementation of a relational DBMS. Sybase IQ and Greenplum are examples of column stores.
The NoSQL database system is another type of DBMS that has gained traction in the market, usually in Big Data applications. NoSQL DBMSes are characterized by their flexible schema and non-reliance on SQL, however many NoSQL offerings have added support for SQL due to its ubiquity. With Key-Value data stores, a piece of data is associated with a key. The data is not rigidly structured and does not have to conform to a schema such as in a typical database design.
There are four types of NoSQL DBMS products:
A document store manages and stores data at the document level. A document is essentially an object and is commonly stored as XML, JSON, BSON, etc. A document database is ideally suited for high performance, high availability, and easy scalability. You might consider using a document store for web storefront applications, real-time analytical processing, or to front a blog or content management system. They are not very well-suited for complex transaction processing as typified by traditional relational applications, though. MongoDB is the most popular document database, but others include Couchbase, RavenDB and MarkLogic.
The key/value database system is useful when all access to the database is done using a primary key. There typically is no fixed data model or schema. The key is identified with an arbitrary “lump” of data. A key/value pair database is useful for shopping cart data or storing user profiles. It is not useful when there are complex relationships between data elements or when data needs to be queried by other than the primary key. Examples of key/value stores include Riak, Berkeley DB, and Aerospike.
A wide-column store uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. Examples of wide-column stores include Apache Cassandra, Amazon DynamoDB, and DataStax Enterprise.
Finally, we have the graph database, which uses graph structures with nodes, edges, and properties to represent and store data. In a graph database every element contains a direct pointer to its adjacent element and no index lookups are necessary. Social networks, routing and dispatch systems, and location aware systems are the prime use cases for graph databases. Some examples include Neo4j, GraphBase, and Meronymy.
NoSQL database systems are popular with organizations that face different data challenges than can be solved using traditional RDBMS solutions. Cassandra and MongoDB are examples of NoSQL key-value database systems.
There are other DBMS implementations as well, such as the inverted list structure of the original Adabas and even the dBase format popularized by the PC DBMS, dBase II and dBase III.
Most commercial DBMS implementations today are relational.
Advantages of Using a DBMS
Additionally, a DBMS provides a central store of data that can be accessed by multiple users, from multiple locations. Data can be shared among multiple applications, rather than having to be propagated and stored in new files for every new application. Central storage and management of data within the DBMS provides
- Data abstraction and independence.
- Data security.
- A locking mechanism for concurrent access
- An efficient handler to balance the needs of multiple applications using the same data
- The ability to swiftly recover from crashes and errors
- Robust data integrity capabilities
- Simple access using a standard API
- Uniform administration procedures for data
Levels of Data Abstraction
A DBMS can provide many views of a single database schema. A view defines what data the user sees and how that user sees the data. The DBMS provides a level of abstraction between the conceptual schema that defines the logical structure of the database and the physical schema that describes the files, indexes, and other physical mechanisms used by the database. Users function at the conceptual level—by querying columns within rows of tables, for example—instead of having to navigate through the many different types of physical structures that store the data.
A DBMS makes it much easier to modify applications when business requirements change. New categories of data can be added to the database without disruption to the existing system.
A DBMS provides a layer of independence between the data and the applications that use the data. In other words, applications are insulated from how data is structured and stored. The DBMS provides two types of data independence:
- Logical data independence—protection from changes to the logical structure of data
- Physical data independence—protection from changes to the physical structure of data
As long as the program uses the API (application programming interface) to the database as provided by the DBMS, developers can avoid changing programs because of database changes.
Note: The primary API to relational databases is SQL. In general, most application SQL statements need not change when database structures change (e.g., a new column is added to a table).
Data security prevents unauthorized users from viewing or updating the database. The DBMS uses IDs and passwords to control which users are allowed access to which portions of the database. For example, consider an employee database containing all data about individual employees. Using the DBMS security mechanisms, payroll personnel can be authorized to view payroll data, whereas managers could be permitted to view only data related to project history.
A DBMS can serve data to multiple, concurrently executing user programs. This requires a locking mechanism to deliver concurrency control because the actions of different programs running at the same time could conceivably cause data inconsistency. For example, multiple bank ATM users might be able to withdraw $100 each from a checking account containing only $150. A DBMS ensures that such problems are avoided because the locking mechanism isolates transactions competing for the same exact data.
The DBMS uses database logging to record “before” and “after” images of database objects as they are modified. It is important to note that the database log captures information about everydata modification (except in circumstances as determined by the DBA). The information on the database logs can be used to undo and redo transactions. Database logging is handled transparently by the DBMS—that is, it is done automatically.
Ensuring Atomicity and Durability
A DBMS can be used to assure the all-or-nothing quality of transactions. This is referred to as atomicity, and it means that data integrity is maintained even if the system crashes in the middle of a transaction. Furthermore, a DBMS provides recoverability. After a system failure, data can be recovered to a state that existed either immediately before the crash or at some other requisite point in time.
The DBMS provides mechanisms for defining rules that govern the type of data that can be stored in specific fields or columns. Only data that conforms to the business rules will ever be stored in the database. Furthermore, the DBMS can be set up to manage relationships between different types of data and to ensure that changes to related data elements are accurately implemented.
A DBMS provides a standard query language to enable users to interactively interrogate the database and analyze its data. For relational databases, this standard API is SQL, or Structured Query Language. However, SQL is not a requirement for a DBMS to be relational. Furthermore, many DBMS products ship with analytical tools and report writers to further simplify data access.
This section on DBMS fundamentals is necessarily brief because the focus of this book is on database administration and most readers will find this material to be familiar. If you require additional details on the basic operations and qualities of DBMSs and databases, please refer to the Bibliography for an extensive list of DBMS-related books. My favorites include:
- C. J. Date’s An Introduction to Database Systems, 8th ed., for an academic and theoretical approach to the material;
- Joe Celko’s Data & Databases: Concepts in Practice, for a good practical overview of the topic
- The latest edition of Pratt and Adamski’s The Concepts of Database Management, provides a good high-level overview of DBMS concepts
A primary benefit of a DBMS is its ability to maintain and query large amounts of data while assuring data integrity and consistency. It offers transparent recovery from failures, concurrent access, and data independence. In fact, most modern computer applications rely on DBMS and database technology to manage data. Understanding the topic is of benefit to all IT professionals.