A Useful Guide to Data Fundamentals from Fabian Pascal

Today’s blog post is a quick review of Fabian Pascal’s latest book The DBDEBUNK Guide to Misconceptions and Data Fundamentals. Those of you who know Pascal will know what to expect — a no-holds barred treatise on the fundamentals of data and database management focusing on the relational model and its benefits. But also you will get a true understanding of what relational means. You see, what is commonly called a relational DBMS more accurately should be referred to as a SQL DBMS. If that doesn’t make sense to you then you are exactly the type of reader who should buy and read this book.

The book is self-published, so it is only available at Mr. Pascal’s website, dbdebunk.com. But the material is well-written and laid out in an easy-to-consume fashion. This comes as no surprise given Pascal’s extensive history and background writing about relational theory and databases (multiple decades of books, articles, blogs, etc. to his credit).

What makes this book special? Well, the material is culled from the the DATABASE DEBUNKINGS website. There are 50 chapters each exposing a common misconception or misunderstanding about data and relational fundamentals. Pascal shows the misconception and then clearly explains the problems and implications that arise because of it. Furthermore, he goes on to provide an explanation of the correct way. This approach, once you get comfortable with it, offers a sound format for exposed fallacies and correcting people’s misunderstanding of the issues.

As the foreword  notes, there is an Index of Misconceptions rather than a a Table of Contents. You can use that index to seek out sections of the book that focus on specific misconceptions, such as on keys or unstructured data.

Also nice is the list of further reading provided in each section. If you read (and understand) not only this book, but also all of the referenced materials you will probably know more about data and database systems than most people working as data analysts and DBAs.

If your job requires you to manage, access or manipulate data, you would do well to read Pascal’s Guide. Even if you think you are an expert on relational theory and data fundamentals I am sure that the information in this powerful tome will offer up useful information. And for most people, who think they know more than they actually do, this book will deliver a wealth of knowledge that will serve you well as you progress with your IT career.

Posted in book review, DBA, relational | 1 Comment

A Database Performance Tuning Primer

In a recent series of posts to this blog I outlined a dozen SQL rules of thumb to follow as you design and code your database applications. These articles can be found at the following links:

Rules of thumb are great but let’s face it, we don’t always follow them. And then we get performance degradation. That means we have to track down the root cause of the problem, analyze the situation and apply corrective measures. This can be difficult without a performance management tool or two.

Database performance management tools can be used to identify bottlenecks and points of contention. These tools can help to show locking, I/O, CPU and other problems as your applications are running; or historically at a point in time if you have appropriate traces started and/or have implemented a performance data warehouse.

Such tools can be beneficial for determining where resources are being used. For example, do you know which programs consume the most database resources? Which consumes the most CPU, takes the most I/Os, issues the most locks, deletes the most data, etc.? Performance monitors can help you to accurately identify all of these issues. You can monitor system and database workload and throughput and manage the resources being used to satisfy your workload.

Database performance management tools can be broken up into three specific functional groupings based on what is being monitored and managed. Some of these tools offer functionality in only one of the following areas, whereas others deliver combined functionality:

  1. The DBMS. Tools that manage DBMS performance are focused more on database system resources and performance. They provide information about performance at the system level, showing things like CPU and memory consumption, buffer usage, locking issues, storage and other hardware information, configuration parameters, and other details.
  2. Database structures. The design of databases, tables and indexes can also impact database performance. Issues that must be monitored and managed include the physical design of the database, number of tables, index design and DDL, disk usage, and so on. How the data is organized must also be managed. And as data is modified in the database, its efficiency will degrade. Reorganization and defragmentation are required to periodically remedy disorganized data.
  3. SQL and application code. Coding efficient SQL statements can be complicated because there are many different ways to write SQL that return the same results. But the efficiency and performance of each formulation can vary significantly. DBAs need tools that can monitor the SQL code that’s being run, show the access paths it uses and provide guidance on how to improve the code.

I recently wrote a four part series of articles for TechTarget on database performance management offerings that delves into more details on these issues. If interested, you can check them out at the following links:

Good luck as you embark upon your database performance tuning and management efforts!


Posted in performance | Leave a comment

Managing Database Application Performance

A database application requires constant interaction between various disparate computing resources to operate efficiently and according to specifications. Realistically, the tuning of a database application can be broken down into three components: system tuning, database tuning, and application tuning. All of these are related, and database performance management requires an integrated approach to tuning across each of these areas.

The DBA needs to understand how the DBMS interacts with the server hardware, the operating system, and any other required software. Tuning and configuring these components and connections properly can have a dramatic impact on system performance. Therefore, system tuning occurs at the highest level and can have the greatest impact on the overall health of database applications. The reason for this is that every database application runs on the overall system. No amount of tuning is going to help a database application when the server it is running on is short on resources or improperly installed. Memory configuration is perhaps the single most important aspect of system tuning. Without sufficient memory, the entire DBMS system can grind to a halt. And with today’s modern, in-memory database systems, memory management becomes even more crucial!

Another significant system bottleneck can be the database log. Every database modification is logged by the DBMS; an inefficient log can impact every application that modifies data. System tuning also includes managing and connecting other systems software with which the DBMS interacts, including the operating system, networking software, message queuing systems, middleware, and transaction processors.

Database application performance can be impacted by the physical design of the database, including normalization issues, how the database is stored on disk, number of tables, index design, and proper usage of DDL and its associated parameters. The actual physical location of database files on disk systems will have an impact on the performance of applications accessing the data. As more data is stored on the same disk device, the possibility of concurrent access and performance degradation increases. As data is inserted, updated, and deleted from the database, the efficiency of the database will degrade. Additionally, the files used to hold the data of the database may need to expand as data is added. Or perhaps, additional files, or file extents, will need to be allocated. Both disorganization and file growth can degrade performance. Indexes also need to be monitored, analyzed, and tuned to optimize data access, and to ensure that they are not having a negative impact on data modification.

The application code itself must be designed appropriately and monitored for efficiency. In fact, most experts agree that as much as 70 to 80 percent of performance problems are caused by improperly coded database applications. SQL is the primary culprit. Coding efficient SQL statements can be complicated. Developers need to be taught how to properly formulate SQL statements and SQL statements must be constantly monitored and tuned.

Programmers need to be schooled in the practice of examining SQL access paths. Every relational DBMS allows the programmer to request information on how the database optimizer will satisfy each query. Will an index be used? In what order will the tables be joined? Will the query be broken up into parallel tasks or not? These and many other factors influence the efficiency of SQL. Not all application problems are due to improperly coded SQL. The host language application code in which the SQL has been embedded also can be inefficient, causing database application performance to suffer.

Managing the performance of your database applications requires in-depth monitoring. Be sure to allocate an appropriate budget to acquire performance management tools to ensure the efficiency of your database systems.

Posted in DBA, performance, SQL, tools | Leave a comment

A Dozen SQL Rules of Thumb, Part 3

Today we pick up our three-part series of SQL rules of thumb (ROTs) with the third and final installment… You can think of these rules as general guiding principles you should follow as your write SQL statements… and we start off today’s post with rule #9…

Rule 9: Know What Works Best

The flexibility of SQL allows the same process to be coded in multiple ways. However, one way of coding usually provides better performance than the others. The DBA should understand the best way to code SQL for each DBMS in use. Furthermore, the DBA should provide information on proper query formulation for performance to the application development staff.

Keep in mind that these rules are DBMS-specific. By that I mean, one way of writing SQL might perform well on SQL Server but not on Oracle, which performs better with a different form of SQL returning the same data.

Rule 10: Issue Frequent COMMITs

When coding programs to run as batch transactions, it is important to issue regular SQL COMMIT statements. The COMMIT statement hardens modifications to the database. When a COMMIT is issued, locks on the modified database objects and data can be released. If you write programs that make a lot of changes, but do not issue periodic COMMITs, then you will be locking data and negatively impacting concurrent access to the data. I call this Bachelor Programming Syndrome (you know, fear of committing).

An additional consideration for Oracle DBAs is the impact of a COMMIT on the rollback segments. Rollback segments are used by Oracle to store completed transactions before the changes are actually written to the table. When you issue a COMMIT in Oracle, not only is the data finalized to the table but the contents of the rollback segment are removed, too. Oracle rollback segments are used to store before images of the data in case transactions are rolled back before changes are committed.

As a DBA you must ensure that application developers issue enough COMMIT statements to minimize the impact of locking on availability and (for Oracle) to keep rollback segments to a manageable size.

Rule 11: Beware of Code Generators

Beware of application code generators and similar tools that automatically create SQL. Many of these tools use gateways that require each SQL statement to be recompiled and optimized each time it is requested. However, some gateways provide a caching mechanism to store compiled and optimized SQL on the server. Such a cache can be help to improve performance for frequently recurring SQL statements.

Additionally, many code generators create accurate SQL, but not necessarily efficient SQL. So you might need to tweak the SQL that is generated (if that is even possible).

Rule 12: Consider Stored Procedures

Performance degradation due to repeated network traffic can be minimized by using a stored procedure because only a single request is needed to execute it. Within the stored procedure, multiple SQL statements can be issued, and the results processed and sent to the requesting program or user. Without the stored procedure, each of the multiple SQL statements, as well as all of the results, would have to be sent across the network. Additionally, SQL in stored procedures may perform better than the same SQL outside of the stored procedure if the DBMS parses and compiles the statements before run time.


These twelve SQL rules of thumb across three blog posts provide a sound basis for SQL development. Of course, they offer high-level guidance and are not the only things that you need to be aware of, and follow, as you strive to build efficient database applications.

So start here, but expand your knowledge base and keep learning about how you can write effective and efficient SQL for your database applications!


Posted in DBA, performance, SQL | 1 Comment

A Dozen SQL Rules of Thumb, Part 2

Catch Phrases 47

Today’s blog post picks up where we left off in our three-part series of rules of thumb (ROTs) that apply generally to SQL development regardless of the underlying DBMS.

These are the general guiding principles by which your SQL development should be guided… and we start off today’s post with rule #5…

Rule 5: Avoid Cartesian Products

Be sure to code predicates matching the columns of every table that will be joined within each SQL statement. Failure to do so will result in performance degradation and possibly incorrect results.

Whenever predicates do not exist when you are joining two tables, the RDBMS must perform a Cartesian product. This is the combination of every row of one table with every row of the other table. Non-matching rows are not eliminated because there is nothing that can be matched. The results of a Cartesian product are difficult to interpret and contain no information other than a simple list of all rows of each table munged together.

Of course, there are exceptions to this rule where you are specifically coding a Cartesian product for a business or technical reason. When coding a Cartesian product on purpose always make sure that you have investigated other methods of producing the same results and tested each method for performance and accuracy… and be sure to specifically test any query with a Cartesian product using production volumes of data!

Rule 6: Judicious Use of OR

The OR logical operator can be troublesome for performance. If you can convert a SQL statement that uses OR to one that uses IN, it is likely that performance will improve. For example, consider changing this SQL statement:

 SELECT e.position, e.last_name, e.empno, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id
 AND    position = ‘MANAGER’
 OR     position = ‘DIRECTOR’
 OR     position = ‘VICE PRESIDENT’
 ORDER BY position;

to this:

 SELECT e.position, e.last_name, e.empno, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id
 ORDER BY position;

Of course, your results may vary depending on the DBMS in use and the nature of the data.

Rule 7: Judicious Use of LIKE

The LIKE logical operator is another troublesome beast. It is very easy to create performance problems when using LIKE in SQL. For example, consider the following SQL statement:

 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id LIKE ‘%X’
 ORDER BY position;

This query will return employee information for all employees working in any department where dept_id ends in ‘X’. However, the relational optimizer will have to scan the data in order to resolve this query—there is no way to use an index. Because the high-order portion of the column is not known, traversing a b-tree index structure is impossible.

You might be able to use your knowledge of the data to rewrite this query without a leading wild-card character (%). For example, perhaps all dept_id values start with either ‘A’ or ‘B’. In that case, you could modify the SQL as follows:

 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id LIKE ‘A%X’
 OR     dept_id LIKE ‘B%X’
 ORDER BY position;


In this case, the DBMS may be able to use a non-matching index scan if an index exists on the dept_id column. Scanning the index may be more efficient than scanning the entire table, but even then you will be scanning data, which is not usually very efficient.

Once again, your results will vary with the DBMS in use and the nature of the data accessed.

Rule 8: Avoid Sorts When Possible

Sorting data is an inhibitor of optimal performance in SQL queries. Your DBMS will sort data as needed to satisfy your database requests. The types of operations that usually require sorting of some form are ORDER BY, GROUP BY, DISTINCT, UNION, INTERSECT, and EXCEPT. When performance is important, remember to look for sorts and find ways to eliminate them. You might be able to create an index to avoid sorting, or to use an alternate syntax if duplicate elimination is not important (e.g. UNION ALL versus UNION).

Keep in mind that sorting is an I/O intensive operation and can degrade query performance, sometimes significantly. When performance is important, remember to look for sorts and find ways to eliminate them.


And so we come to the end of part 2 in our 3 part series offering up 12 SQL rules of thumb… tune in next time for the final 4 rules (numbers 9 thru 12) in our dozen guidelines for developing effective and efficient SQL!

Posted in DBA, performance, SQL | 3 Comments

A Dozen SQL Rules of Thumb, Part 1

today’s blog post we will examine some rules of thumb that apply generally to SQL development regardless of the underlying DBMS. These are the general guiding principles by which your SQL development should be guided…

Rule 1: “It Depends!”

The answer to every question about database performance is “It depends.” A successful DBA (and programmer) will know on what it depends. For example, if someone asks, “What is the best access path for my SQL query?” the best answer is “It depends.” Why? Well, that is more difficult to answer.

If every row must be returned, a table scan is likely to be more efficient than indexed access. However, if only one row is to be returned, direct index lookup will probably perform best. For queries that return between one and all rows, the performance of access paths will depend on how the data is clustered, which version of the DBMS is in use, whether parallelism can be invoked, and so forth.

Be skeptical of tuning tips that use the words “always” or “never.” Just about everything depends on other things.

Rule 2: Be Careful What You Ask For

The arrangement of elements within a query can change query performance. To what degree depends on the DBMS in use and whether rule-based optimization is used.
A good rule of thumb, regardless of DBMS, is to place the most restrictive predicate where the optimizer can read it first. In Oracle, the optimizer reads WHERE clauses from the bottom up, therefore, the most restrictive predicate should be put at the bottom of the query. It is just the opposite in DB2.

Placing the most restrictive predicate where the optimizer can read it first enables the optimizer to narrow down the first set of results before proceeding to the next predicate. The next predicate will be applied to the subset of data that was selected by the most selective condition, instead of against the entire table.

And keep in mind that these things can change from release to release of a DBMS, so keep up with each new version, release and fixpack to make sure you understand what has been changed and how it might impact your SQL.

Rule 3: KISS

A rule of thumb for all types of IT activities is to follow the KISS principle: Keep it simple, Stupid. However, in the world of SQL there is a trade-off between simplicity and performance.

Keeping SQL simple makes development and maintenance tasks easier. A simple SQL statement is easier to decipher and easier to change. With simple SQL, application developers can perform their job more easily than with complex SQL.

Nevertheless, complex SQL can outperform simple SQL. The more work that can be performed by the DBMS and the optimizer, the better performance is likely to be. Let’s look at an example: Some programmers avoid joins by coding multiple SQL SELECT statements and joining the data using program logic. The SQL is simpler because the programmer need not understand how to write SQL to join tables. However, SQL joins usually outperform program joins because less data is returned to the program.

Furthermore, the relational optimizer can change the join methodology automatically if the database or data changes. Conversely, program logic must be changed manually by a skilled programmer.

Rule 4: Retrieve Only What is Needed

As simple as this rule of thumb sounds, you might be surprised at how often it is violated. To minimize the amount of data returned by your SQL statements, be sure to specify the absolute minimum number of columns in the SELECT list. If the column is not needed to satisfy the business requirement, do not request it to be returned in the result set.
Specify the absolute minimum number of columns in the SELECT list.

Programmers frequently copy SQL statements that work well to use as templates for new statements. Sometimes the programmer will forget to trim down the number of columns requested when they only need a subset of the columns in the original query. This can adversely impact performance. The more columns that must be returned by the DBMS, the greater the processing overhead.

Another common problem is requesting unnecessary data. Consider the following SQL statement:

SELECT position, last_name, empno
FROM   employee
WHERE  last_name = 'SMITH';

There is no reason to specify the last_name column in the SELECT list of this SQL statement. We know that last_name must be ‘SMITH’ for the entire result set because of the WHERE clause.

Returning only what is needed does not apply only to columns. You should also minimize the number of rows to be returned by coding the proper WHERE clauses for every SQL statement. The more data that can be filtered out of the result set by the DBMS, the more efficient the query will be because less data must be returned to the requester.

Sometimes application programmers avoid coding appropriate WHERE clauses in a misguided attempt to simplify SQL statements. The more information the optimizer has about the data to be retrieved, the better the access paths it formulates will be. A sure sign of potential abuse is finding a SQL statement embedded in an application program that is immediately followed by a series of IF-THEN-ELSE statements. Try to tune the query by moving the IF-THEN-ELSE statements into SQL WHERE clauses.


And so concludes the first four of our 12 SQL rules of thumb… tune in next time for the next 4 rules (numbers 5 thru 8) in our dozen guidelines for developing SQL that works and performs well…

Posted in DBA, Rules of Thumb, SQL | 1 Comment

Designing Applications for Relational Access

Application design is an important component of assuring relational database performance and efficiency. When performance problems persist it may become necessary to revisit application design. But redesigning and re-coding your applications can be time-consuming and costly, so it is better to properly address good design from the outset.

Design issues to examine when application performance suffers include:

  • Type of SQL. Is the correct type of SQL (planned or unplanned, dynamic or static, embedded or stand-alone) being used for this particular application?
  • Programming language. Is the programming language capable of achieving the required performance, and is the language environment optimized for database access?
  • Transaction design and processing. Are the transactions within the program properly designed to assure ACID properties, and does the program use the transaction processor of choice appropriately and efficiently?
  • Locking strategy. Does the application hold the wrong type of locks, or does it hold the correct type of locks for too long?
  • COMMIT strategy. Does each application program issue SQL COMMIT statements to minimize the impact of locking?
  • Batch processing. Are batch programs designed appropriately to take advantage of the sequential processing features of the DBMS?
  • Online processing. Are online applications designed to return useful information and to minimize the amount of information returned to the user’s screen for a single invocation of the program?
Posted in DBA, performance | Leave a comment

Transforming Mainframe Data Management

Just a quick post today to let my readers — specifically those in the US Midwest — that I will be presenting at an executive briefing at the River Forest Country Club in Elmhurst, IL  (Chicago suburb) on August 23, 2016. The event will address the challenges of data management for the digital economy and will look at issues such as:

  • Simplified data management with intelligent automation that requires no downtime.
  • Increased application performance.
  • Data integrity of structured and unstructured data.
  • The ability to easily manage growing databases.

My presentation will take a look at the trends and challenges being faced by data professionals as they attempt to manage the burgeoning amount of data that modern organizations collect, manage and turn into actionable intelligence. I will be joined at the event by other great speakers including Sherly Larsen and John Barry who will talk about new technology and innovations from BMC Software to meet those challenges, including BMC’s Next Generation Technology for DB2 utility management.

So if you live in the Midwest area, be sure to register for this event and check up on the latest data trends and techniques for dealing with modern data management on the mainframe

Posted in data, DB2, mainframe, tools | Leave a comment

Evaluating Database Performance Management Tools

I just completed a four part series of articles for TechTarget on database performance management and the different categories of tools are used for managing database performance. The general goal of the series is to bolster organizations’ understanding   of the issues involved with assuring database performance and to inform potential buyers of the considerations and decision points when choosing from the wide variety of products that are available in this field.

With that in mind, I am going to share links to these articles with my blog readers today. The first article, published in May 2016, defines database performance and clarifies What You Need to Know About Database Performance Software at a high level. This article offers a good introduction for those new to the topic and creates a template for moving forward with database performance solutions. It should be read first to understand the overall theme of the series of articles.

Part 2 was published in early June 2016, and it focuses on how to determine whether you need database performance management tools. The article, titled Three indicators that could signal database performance issues, focuses on the types of performance issues that you should look for and provides a brief overview of tools that can help manage those issues.

Next I took a look at the most important features of database performance monitoring and tuning technologies. Choosing between different technologies and vendors can always be a challenge unless you know your requirements and focus on the software that can best achieve the results you desire. Part 3 of this series, Examining the functions and features of database performance tools, focuses on helping organizations do that.

And finally, in Part 4, I focused on the actual vendors and products highlighting important considerations to evaluate when choosing a database performance management tools. Check this article out here –> Tips on selecting the right database performance tools.

The series of articles is accompanied by multiple, high-level vendor and product overviews of some of the leading database performance management providers that you might consider. Only a few of these have been published, so far, but there will be a total of eight overviews published in this series focusing on BMC, CA, IBM, Oracle, IDERA, SolarWinds, Dell, and Bradmark. (Links to all overviews added August 16, 2016).

Hopefully these articles and overviews offer a nice roadmap to success for your database performance management efforts.

Note: None of the vendors covered in this series of articles paid to be included in this series. The vendors covered were chosen as a representative sampling of the leading ISVs providing database performance tools.

Posted in DBA, performance, tools | Leave a comment

Data Modeling Concepts

Oftentimes organizations assign the DBA group the responsibility for data modeling, in addition to the more conventional physical database management tasks that are well-known DBA responsibilities. Of course, that does not mean that DBAs are well-trained in data modeling, nor does it mean that DBAs are best-suited to take on the task of data modeling. An organization that is serious about data will create a data administration or data architecture group whose responsibility it is to understand the organization’s data. Data administration (DA) separates the business aspects of data resource management from the technology used to manage data. When the DA function exists in an organization it is more closely aligned with the actual business users of data. The DA group is responsible for understanding the business lexicon and translating it into a logical data model.

That said, many organizations lump DA and DBA together into a DBA group. As such, the DA tasks usually suffer. One of these tasks is data modeling. There is a popular folk story about four blind men and an elephant that helps to illustrate the purpose of data modeling:

A group of four blind men happened upon an elephant during the course of their journey. The group of blind men had never encountered an elephant before, but they were a curious group. So each blind man attempted to learn about the elephant by touching it. The first blind man grabbed the elephant by the trunk and exclaimed, “Oh! An elephant is like a snake – it is a long, slithery tube.” The second blind man reached out and touched the side of the elephant and remarked “No, no, the elephant is more like a wall – very flat and solid.” The third blind man was confused so he reached out to touch the elephant but poked his hand on a tusk, and he said “No, you’re both wrong, the elephant is more like a spear than anything else!” The fourth blind man grabbed the elephant by the leg and shouted “You’re all wrong, an elephant is very much like a tree – round and solid.”

Well, each blind man was right, but he was also wrong. The problem was not with the experience of each blind man, but with the scope of their experience. To be a successful data modeler you must learn to discover the entire truth of the data needs of your business. You cannot simply ask one user or rely upon a single expert because his or her scope of experience will not be comprehensive. The goal of a data model is to record the data requirements of a business process. The scope of the data model for each line of business must be comprehensive. If an enterprise data model exists for the organization then each individual line of business data model must be verified against the overall enterprise data model for correctness.

An enterprise data model is a single data model that comprehensively describes the data needs of the entire organization. Managing and maintaining an enterprise data model is fraught with many non-database-related distractions such as corporate politics and ROI that is hard to quantify.

But back to the basic topic at hand – data modeling. Data modeling begins as a conceptual venture. The first objective of conceptual data modeling is to understand the requirements. A data model, in and of itself, is of limited value. Of course, a data model delivers value by enhancing communication and understanding, and it can be argued that these are quite valuable. But the primary value of a data model is its ability to be used as a blueprint to build a physical database.

When databases are built from a well-designed data model the resulting structures provide increased value to the organization. The value derived from the data model exhibits itself in the form of minimized redundancy, maximized data integrity, increased stability, better data sharing, increased consistency, more timely access to data, and better usability. These qualities are achieved because the data model clearly outlines the data resource requirements and relationships in a clear, concise manner. Building databases from a data model will result in a better database implementation because you will have a better understanding of the data to be stored in your databases.

Another benefit of data modeling is the ability to discover new uses for data. A data model can clarify data patterns and potential uses for data that would remain hidden without the data blueprint provided by the data model. Discovery of such patterns can change the way your business operates and can potentially lead to a competitive advantage and increased revenue for your organization.

Data modeling requires a different mindset than requirements gathering for application development and process-oriented tasks. It is important to think “what” is of interest instead of “how” tasks are accomplished. To transition to this alternate way of thinking, follow these three “rules”:

  • Don’t think physical; think conceptual – do not concern yourself with physical storage issues and the constraints of any DBMS you may know. Instead, concern yourself with business issues and terms.
  • Don’t think process; think structure – how something is done, although important for application development, is not important for data modeling. The things that processes are being done to are what is important to data modeling.
  • Don’t think navigation; think relationship – the way that things are related to one another is important because relationships map the data model blueprint. The way in which relationships are traversed is unimportant to conceptual and logical data modeling.

With all of the current hype surrounding Big Data and NoSQL and DevOps it is not uncommon for data modeling to become an afterthought, if it is even a “thought” at all. And that is too bad. Sure, it speeds up the development process… but what happens when somebody other than the developer wants to use the data? And use it perhaps in a different way or with a different access pattern?

The answer to being able to reuse data is proper data modeling and design.

Keep in mind that as you create your data models, you are developing the lexicon of your organization’s business. Much like a dictionary functions as the lexicon of words for a given language, the data model functions as the lexicon of business terms and their usage. Of course, this article just scrapes the tip of the data modeling iceberg. If you are a DBA with data modeling responsibilities I recommend that you find your way to a class, or if you cannot afford that, at least pick up a few good books on the topic.

Posted in data, data modeling, DBA | Leave a comment