The Never-Ending To-Do List of the DBA

If you are currently a DBA, I bet you can relate to the title of this article. Doesn’t it seem like there are always more things to do at the end of any given day? The sad truth, though, is that many people do not know what a DBA is, what a DBA does, or why one is even needed! Sometimes, that includes your manager!

So today we’re going to take a little time to hash through the primary responsibilities and tasks that DBAs are typically charged with performing. Not all of these tasks are needed every day, but some are, and many are of the utmost importance when they are needed.

So let’s start at the beginning… Every organization that manages data using a database management system (DBMS) requires a database administration group to oversee and assure the proper usage and deployment of the company’s data and databases. With the growing mountain of data and the need to organize that data effectively to deliver value to the business, most modern organizations use a DBMS for their most critical data. So, the need for database administrators (DBAs) is greater today than ever before. However, the discipline of database administration is not well understood or universally practiced in a coherent and easily replicated manner.

Implementing a DBA function in your organization requires careful thought and planning. A successful DBA must acquire a large number of skills — both technological and interpersonal. Let’s examine the skills required of an effective DBA.

General database management. The DBA is the central source of database knowledge in the organization. As such, he must understand the basic rules of relational database technology and be able to accurately communicate them to others.

Data modeling and database design. The DBA must be skilled at collecting and analyzing user requirements to derive conceptual and logical data models. This is more difficult than it sounds. A conceptual data model outlines data requirements at a very high level; a logical data model provides in-depth details of data types, lengths, relationships, and cardinality. The DBA uses normalization techniques to deliver sound data models that accurately depict the data requirements of the business. (Of course, if your organization is large enough a completely separate group of data administrators may exist to handle logical database design and data modeling.)
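To make that concrete, here is a minimal sketch (using hypothetical table and column names) of what normalization does: a repeating group of phone columns becomes a dependent table, and the one-to-many cardinality becomes explicit.

-- Unnormalized: repeating phone columns violate first normal form
--   CREATE TABLE CUSTOMER
--     (CUST_NO    INTEGER     NOT NULL PRIMARY KEY,
--      CUST_NAME  VARCHAR(40) NOT NULL,
--      PHONE1     CHAR(10),
--      PHONE2     CHAR(10),
--      PHONE3     CHAR(10));

-- Normalized: the repeating group moves to its own table
CREATE TABLE CUSTOMER
  (CUST_NO    INTEGER      NOT NULL PRIMARY KEY,
   CUST_NAME  VARCHAR(40)  NOT NULL);

CREATE TABLE CUSTOMER_PHONE
  (CUST_NO    INTEGER      NOT NULL REFERENCES CUSTOMER(CUST_NO),
   PHONE_TYPE CHAR(4)      NOT NULL,   -- e.g., HOME, CELL, WORK
   PHONE_NO   CHAR(10)     NOT NULL,
   PRIMARY KEY (CUST_NO, PHONE_TYPE));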

Metadata management and repository usage. The DBA must understand the technical data requirements of the organization. But this is not a complete description of his duties. Metadata, or data about data, also must be maintained. The DBA must collect, store, manage, and provide the ability to query the organization’s metadata. Without metadata, the data stored in databases lacks true meaning. (Once again, if your company has a data administration group then this task will be handled by that group. Of course, that does not mean the DBA can ignore metadata management.)

Database schema creation and management. A DBA must be able to translate a data model or logical database design into an actual physical database implementation and to manage that database once it has been implemented. The physical database may not conform to the logical model 100 percent due to physical DBMS features, implementation factors, or performance requirements. The DBA must understand all of the physical nuances of each DBMS used by his organization in order to create efficient physical databases.
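As a simple illustration (syntax and options vary by DBMS; the names here are hypothetical), the physical DDL layers storage and access-path decisions on top of what the logical model specifies:

-- The logical model defines the entity and its attributes; the
-- physical design adds placement and indexing decisions
CREATE TABLE EMPLOYEE
  (EMP_NO     INTEGER      NOT NULL,
   LAST_NAME  VARCHAR(40)  NOT NULL,
   DEPT_NO    CHAR(3)      NOT NULL,
   SALARY     DECIMAL(9,2) NOT NULL,
   PRIMARY KEY (EMP_NO))
  IN EMP_TABLESPACE;   -- placement: a purely physical concern (Db2-style)

-- Indexes exist only for performance; they appear nowhere in the
-- logical model
CREATE INDEX XEMP_DEPT ON EMPLOYEE (DEPT_NO);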

Capacity planning. Because data consumption and usage continue to grow at an alarming pace, the DBA must be prepared to support more data, more users, and more connections. The ability to predict growth based on application and data usage patterns and to implement the necessary database changes to accommodate that growth is a core capability of the DBA.
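One low-tech way to build that predictive capability is to snapshot object sizes on a schedule and trend the results. A sketch, assuming a hypothetical history table and Db2-style date arithmetic:

-- Run daily (via the job scheduler) to accumulate growth history
INSERT INTO TABLE_GROWTH_HISTORY (SNAPSHOT_DATE, TABLE_NAME, ROW_COUNT)
SELECT CURRENT DATE, 'ORDERS', COUNT(*)
  FROM ORDERS;

-- Trend the last 90 days to project when more capacity will be needed
SELECT SNAPSHOT_DATE, ROW_COUNT
  FROM TABLE_GROWTH_HISTORY
 WHERE TABLE_NAME = 'ORDERS'
   AND SNAPSHOT_DATE >= CURRENT DATE - 90 DAYS
 ORDER BY SNAPSHOT_DATE;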

Programming and development. Although the DBA typically is not coding new application programs, s/he does need to know how to write effective programs. Additionally, the DBA is a key participant in production turnover, program optimization (BIND/REBIND) and management, and other infrastructure management to enable application programs to operate effectively and efficiently.

SQL code reviews and walk-throughs. Although application programmers usually write SQL, DBAs are likely to be blamed for poor performance. Therefore, DBAs must possess in-depth SQL knowledge so they can understand and review SQL and host language programs in order to recommend changes for optimization.
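A classic example of the kind of change such a review catches (the names are hypothetical): a predicate that applies a function to a column cannot use an index on that column, but an equivalent range predicate can.

-- Before: the function on HIRE_DATE makes the predicate non-indexable
SELECT EMP_NO, LAST_NAME
  FROM EMPLOYEE
 WHERE YEAR(HIRE_DATE) = 2021;

-- After: the same rows, expressed as a range the optimizer can
-- satisfy with an index on HIRE_DATE
SELECT EMP_NO, LAST_NAME
  FROM EMPLOYEE
 WHERE HIRE_DATE BETWEEN '2021-01-01' AND '2021-12-31';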

Performance management and tuning. Dealing with performance problems is usually the biggest post-implementation nightmare faced by DBAs. As such, the DBA must be able to proactively monitor the database environment and to make changes to data structures, SQL, application logic, and the DBMS subsystem itself in order to optimize performance.

Ensuring availability. Applications and data are increasingly required to be up and available 24 hours a day, seven days a week. Globalization and e-business are driving many organizations to implement no-downtime, around-the-clock systems. To manage in such an environment, the DBA must ensure data availability using non-disruptive administration tactics.

Data movement. Data, once stored in a database, is not static. The data may need to move from one database to another, from the DBMS into an external data set, or from the transaction processing system into the data warehouse. The DBA is responsible for efficiently and accurately moving data from place to place as dictated by organizational needs.
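At its simplest, data movement is nothing more than SQL (a sketch with hypothetical names and Db2-style date arithmetic); for large volumes, the DBMS’s bulk load and unload utilities are usually the better choice:

-- Copy yesterday's completed orders into a warehouse staging table
INSERT INTO DW_STAGE_ORDERS (ORDER_NO, CUST_NO, ORDER_DATE, TOTAL_AMT)
SELECT ORDER_NO, CUST_NO, ORDER_DATE, TOTAL_AMT
  FROM ORDERS
 WHERE ORDER_DATE = CURRENT DATE - 1 DAY
   AND ORDER_STATUS = 'COMPLETE';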

Backup and recovery. The DBA must implement an appropriate database backup and recovery strategy for each database file based on data volatility and application availability requirements. Without a backup and recovery strategy, system and user errors could render a database inoperable and useless. Furthermore, the backup strategy must be developed with recovery time objectives in mind, so that data is not unavailable for long periods when problems inevitably occur. This is probably one of the most important database administration tasks, if not the single most important one.
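The syntax varies widely by DBMS, but as a flavor of what the strategy boils down to in practice, here is a SQL Server-style sketch (the database and file names are hypothetical):

-- Periodic full backup
BACKUP DATABASE SalesDB
  TO DISK = 'E:\backups\SalesDB_full.bak';

-- Frequent log backups bound how much work can be lost and determine
-- how close to the point of failure you can recover
BACKUP LOG SalesDB
  TO DISK = 'E:\backups\SalesDB_log.trn';

-- Recovery: restore the full backup, then roll forward through the log
RESTORE DATABASE SalesDB
  FROM DISK = 'E:\backups\SalesDB_full.bak' WITH NORECOVERY;
RESTORE LOG SalesDB
  FROM DISK = 'E:\backups\SalesDB_log.trn' WITH RECOVERY;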

Ensuring data integrity. DBAs must be able to design databases so that only accurate and appropriate data is entered and maintained. To do so, the DBA can deploy multiple types of database integrity including entity integrity, referential integrity, check constraints, and database triggers. Furthermore, the DBA must ensure the structural integrity of the database. Data integrity is right up there with backup and recovery in importance.
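Most of these integrity types can be declared right in the DDL (a sketch with hypothetical names; the DEPT table is assumed to exist):

CREATE TABLE EMPLOYEE
  (EMP_NO   INTEGER      NOT NULL,
   DEPT_NO  CHAR(3)      NOT NULL,
   SALARY   DECIMAL(9,2) NOT NULL,
   PRIMARY KEY (EMP_NO),                            -- entity integrity
   FOREIGN KEY (DEPT_NO) REFERENCES DEPT (DEPT_NO)  -- referential integrity
     ON DELETE RESTRICT,
   CHECK (SALARY > 0));                             -- check constraint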

Procedural coding and debugging. Modern databases are composed of more than just data — they also contain program code. The DBA must possess procedural skills to help design, debug, implement, and maintain stored procedures, triggers, and user-defined functions that are stored in the DBMS and used by application systems.
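For example, here is a trivial stored procedure in Db2-style SQL PL (the names are hypothetical); the same procedural skills apply to triggers and user-defined functions:

CREATE PROCEDURE GIVE_RAISE
  (IN p_emp_no INTEGER,
   IN p_pct    DECIMAL(4,2))
LANGUAGE SQL
BEGIN
  -- Business logic stored in the DBMS, callable from any application
  UPDATE EMPLOYEE
     SET SALARY = SALARY * (1 + p_pct / 100)
   WHERE EMP_NO = p_emp_no;
END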

Extensible data type administration. The functionality of a modern DBMS can be extended using user-defined data types. The DBA must understand how these extended data types are implemented by the DBMS vendor and be able to implement and administer any extended data types implemented in their databases.
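For instance, a distinct type (shown here in Db2 LUW-style syntax, with hypothetical names) layers business meaning on top of a built-in type, and the DBMS then enforces that meaning:

-- Dollars and euros are both DECIMAL(15,2) underneath, but the DBMS
-- will no longer let you compare or combine them accidentally
CREATE DISTINCT TYPE USD AS DECIMAL(15,2) WITH COMPARISONS;
CREATE DISTINCT TYPE EUR AS DECIMAL(15,2) WITH COMPARISONS;

CREATE TABLE INVOICE
  (INVOICE_NO INTEGER NOT NULL PRIMARY KEY,
   AMOUNT_US  USD,
   AMOUNT_EU  EUR);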

Data security. The DBA is charged with the responsibility to ensure that only authorized users have access to data. This requires the implementation of a rigorous security infrastructure for production and test databases. Data security comprises both DBMS security (revoke/grant) and security on external resources (file structures, userids, and so on).
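On the DBMS side, that infrastructure is built largely from GRANT and REVOKE statements (a sketch; the authorization IDs and table are hypothetical):

-- Read-only access for reporting users, nothing more
GRANT SELECT ON EMPLOYEE TO REPORTING_ROLE;

-- The payroll application can modify data but cannot alter the table
GRANT SELECT, INSERT, UPDATE, DELETE ON EMPLOYEE TO PAYROLL_APP;

-- Remove access as soon as it is no longer justified
REVOKE DELETE ON EMPLOYEE FROM PAYROLL_APP;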

Database auditing. Being able to report on who did what to which data when, along with how they acted upon that data, is a requirement for many governmental and industry standards and compliance specifications. DBAs need to be involved in setting up and enabling the DBMS’s database auditing capabilities.

General systems management and networking. After a database is implemented it will be accessed throughout the organization and interact with other technologies. Therefore, the DBA has to be able to function as a jack of all trades in order to integrate database administration requirements and tasks with general systems management requirements and tasks (like job scheduling, network management, transaction processing, and so on).

Business knowledge. DBAs must understand the requirements of the application users and be able to administer their databases to avoid interruption of business. Without a firm understanding of the value provided to the business by their databases and data, the DBA is not likely to be able to implement strategies that optimize the business’s use of that data.

Data archiving. When data is no longer needed for business purposes but must be maintained for legal purposes, the data needs to be removed from the operational database but stored in such a way that it remains accessible for e-discovery and legal requirements. This is database archiving.
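In its simplest form, archiving is a move-then-delete performed as a single unit of work. A sketch with hypothetical names and Db2-style date arithmetic; a true archiving solution must also preserve the schema as it existed when the data was written:

-- Move orders past the business retention period to the archive table...
INSERT INTO ORDERS_ARCHIVE
SELECT *
  FROM ORDERS
 WHERE ORDER_DATE < CURRENT DATE - 7 YEARS;

-- ...then remove them from the operational table and commit both
-- statements together
DELETE FROM ORDERS
 WHERE ORDER_DATE < CURRENT DATE - 7 YEARS;

COMMIT;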

Enterprise resource planning (ERP). ERP software packages place additional burdens on the DBA. Most ERP applications (SAP, PeopleSoft, etc.) use databases differently than homegrown applications, requiring DBAs to know how the ERP applications impact the business and how the databases used by those packages differ from traditional relational databases.

Web-specific technology expertise. For e-businesses, DBAs are required to have knowledge of Internet and Web technologies to enable databases to participate in Web-based applications. Examples of this type of technology include HTTP, FTP, XML, CGI, Java, TCP/IP, Web servers, firewalls, and SSL. Other technologies that fall into this category include database gateways and APIs.

Storage management techniques. The data stored in every database resides on disk somewhere (unless, perhaps, it is stored using an in-memory DBMS). The DBA must understand the storage hardware and software available for use, and how it interacts with the DBMS being used. As such, DBAs must be able to allocate, monitor, and manage the storage used by databases.

Summing Up…

The bottom line is that the DBA must be a well-rounded staff member capable of understanding multiple facets of the business and technology. The DBMS is at the center of today’s IT organization — so as the one tasked with keeping the DBMS performing as desired, the DBA will be involved in most IT initiatives.

Did I forget anything?


Happy New Year 2022

Just a short post today to wish my readers a very happy 2022! I hope that you have enjoyed the holiday season and that the upcoming year will be filled with joy and prosperity.


It also looks like 2022 will see a return to in-person conferences, and I am looking forward to that. I plan on going to Dallas for SHARE in March and to Boston for IDUG in July, and perhaps others as well. Of course, a lot will depend on the COVID situation and how these events manage social distancing and masking. Personal safety is still much more important than anything else.

And if you are going out tonight to celebrate, please do so safely and courteously. Wear your masks as appropriate and don’t impinge on others’ space. And don’t drink too much, especially if you plan on driving later!

So as we bid adieu to 2021, and look forward to 2022, I hope we will get the chance to see each other again… but safely!

Happy New Year!


Heterogeneous Database Management with Navicat Premium 16

Navicat Premium is a heterogeneous database development and management tool that makes it easy to quickly build, manage, and maintain your databases. With Navicat Premium you can simultaneously connect to heterogeneous database systems. Support is available for MySQL, MariaDB, Microsoft SQL Server, MongoDB, Oracle, PostgreSQL, and SQLite databases, and it is compatible with cloud databases like Alibaba Cloud ApsaraDB, Amazon RDS, Amazon Aurora, Amazon Redshift, Microsoft Azure, Oracle Cloud, Google Cloud, and MongoDB Atlas.

Figure 1. Main screen of Navicat Premium (Windows)

The product can be used as the hub of operations for managing your disparate database environments with capabilities for database design and implementation, query building and execution, data migration and synchronization, data visualization, data generation, and even analysis capabilities for improving your databases and queries. So Navicat Premium is an ideal solution for DBAs who are required to manage multiple different DBMS products and installations. One small issue to keep in mind is that Navicat, unfortunately, does not offer support for IBM’s Db2.

One of the nice, long-standing features of Navicat Premium is its cross-platform UI, with support for Windows, macOS, and multiple Linux distributions (e.g. Debian, Fedora, Ubuntu, and others). Although Navicat has been available for some time now, the latest and greatest version was released in late November 2021. So, let’s take a look at what’s new.

What’s New

If you have used Navicat Premium before, then one of the first things you’ll probably notice is the refreshed user interface (refer to Figure 2 for a screen shot of the Linux UI). All of the buttons and icons have been enhanced and modified to improve the user experience. But importantly, the user flow has not changed, so the same sequence of actions and commands can be used. If there is one thing I hate, it is when user flow changes from release to release for no apparent reason, but Navicat has done well here.

Additionally, many existing features, such as Connection Profile, Query Summary, and Value Picker have been updated to increase the overall efficiency of your database development.

Figure 2. A screen shot of the Linux UI

But there are several additional nice, new features in Navicat Premium 16, such as the new data generation tool. As any database developer will tell you, creating and managing appropriate data for testing applications is one of the more frustrating aspects of database programming and testing. Coming up with reasonable data, especially for brand new applications where no data exists anywhere, is a chore. And copying production data is not always possible (or legal). So how is test data created? You cannot simply churn out random text; the data has to match the data types defined in the database. Furthermore, referential integrity constraints and business rules must be understood and adhered to in order to create proper test data that fully exercises the application code.

Fortunately, the new data generation tool provided in Navicat Premium offers a comprehensive range of functions to generate a large volume of quality testing data. You can rapidly create realistic data sets with referential integrity based on business rules and constraints.

Figure 3. A screen shot of data generation (macOS UI)

Navicat Premium 16 drives that data generation with a wizard that walks you through the process of choosing tables in the proper order. The test data that it generates will be displayed so that you can view it, edit it if needed, or even regenerate it. Navicat’s test data generation capability can save development teams a lot of time and effort.

But that’s not all. Working together in teams, or collaboration, has become increasingly important for both database developers and DBAs, especially with the growing importance of DevOps. Although there are many aspects of DevOps (and the purpose of this piece is not to define it), the core underlying principle of DevOps is to improve the way your team works together throughout the entire software development lifecycle.

Although past versions of Navicat have made it easier for teams to collaborate, Navicat Premium 16 improves upon things by adding support for Charts and Code Snippets to the Navicat Cloud. Using the Navicat Cloud Portal, teams can not only manage their files and projects, but also monitor cloud services, all from a single interface. These types of collaboration features help teams as they embrace DevOps practices and procedures.

The next big advance in Navicat Premium 16 is in the form of improved data visualization capabilities. Again, data visualization is not brand new to Navicat Premium, as the ability to chart data was previously available. But there are additional chart types and new functions included with the new release. Navicat Premium 16 supports more than 20 different types of charts. And you can use it to visualize live data.

Figure 4. Navicat Charts Workspace

You can connect to any data source and also extend your data with customized fields by changing field types, concatenating fields, mapping values, or sorting based on the order of another field. Furthermore, Navicat Premium 16 improves the usability and accessibility of charting with the dashboard, where you can share your charts across your organization.

There is also a new approach for resolving conflict files in the Navicat Cloud solutions. Cloud management is simplified because you can now discard the server file and keep your file, discard your copy without saving changes, or rename your copy to keep both files.

Other helpful new capabilities include:

Connection Profile, which can be used to configure multiple profiles for users who may need to switch between settings based on their location. This is especially useful with the increase in the number of people working from home (or outside of their traditional office).

Figure 5. Connection Profiles

Query Summary, which can be used to produce a detailed summary of each SQL statement. It is an easy-to-read, one-page summary of the health and performance of your queries, with a shortcut to jump to potential errors.

Figure 6. Query Summary

Field Information, which delivers a quick view of column characteristics for reviewing information between columns within the Table Viewer.

Figure 7. Field Information

Summary

The latest version of Navicat offers up some nice features that make it easier to manage, use, and administer your heterogeneous database environment. Keep in mind that this overview examines the new features of version 16 and does not provide comprehensive detail of all the features and functionality of Navicat Premium. It really does offer a bevy of useful capabilities! So, if you are looking for a feature-laden tool for managing database development and management, you should take a look at Navicat Premium to simplify your effort. You can download a free trial here.


A DBA By Any Other Name…

Every organization that manages data using a database management system (DBMS) requires a database administration (DBA) group to ensure the effective use and deployment of the company’s databases. And since most modern organizations of every size use a DBMS, most organizations have DBAs or at least people who perform the ongoing maintenance and optimization of the database infrastructure.

The DBA, traditionally, is the information technician responsible for ensuring the ongoing operational functionality and efficiency of an organization’s databases and the applications that access that data. A day in the life of a DBA is usually quite hectic. The DBA is required to maintain production and test environments while at the same time keeping an eye on active application development projects, attending strategy and design meetings, helping to select and evaluate new products, and connecting legacy systems to the web. And Joe in Accounting? He just submitted that “query from hell” again and it is bringing the system to a halt; can you do something about that? All of these things can occur within a single DBA workday.

When application problems occur, the database environment is frequently the first thing blamed. The database is “guilty until proven innocent” instead of the other way around. It is rare that a DBA is approached with a question like “I’ve got some really bad SQL here, can you help me fix it?” Instead, the DBA is charged with seeking out the true culprit when the DBMS is blamed. Because DBAs are frequently forced to prove that the database is not the source of problems, s/he must know enough about all aspects of IT to track down the root cause of problems – and help to ensure corrections are made.

For this reason (and many more), DBAs are usually relied upon to do far more than just stoke the fires to keep database systems performing. Most DBAs have years of IT experience and are asked to share their expertise on related technologies (such as application development, middleware implementation, transaction processing, and networking) with project teams. In some small to medium-sized shops, an experienced full-time developer may be tasked with performing DBA duties “on the side.” Couple all of this with the fact that the discipline of database administration is not well understood or universally practiced in a coherent and easily replicated manner, and you get where I am headed with this. A DBA is not always just a DBA.

Oftentimes, DBAs are expected to know everything about everything. From technical and business jargon to the latest management and technology fads, the DBA is expected to be “in the know.” And don’t expect any private time. A DBA must always be prepared to be interrupted at any time to answer any type of question – and not just about databases, either.

Some of this is less than desirable, but some of it is actually a good thing. At least, that is, from the perspective of DBAs. By expanding the role of the DBA to include other technologies and even business issues, the DBA can become a more well-rounded and valuable employee. Yes, it is a great thing to have experienced techies who can delve into and solve complex problems. But it is an even greater thing for these techies to be able to communicate, to understand the business, and to extend their abilities more diversely than just on database tactics and issues.

So are today’s DBAs really DBAs? Or has the job grown into something more? And, if so, just what?

Perhaps it is not possible to peg a true definition of the modern DBA role because each organization may impose different requirements on it. At some shops, DBAs need to remain very technical but embrace new technologies. At others, DBAs need to put on a data governance hat, becoming better versed in the meaning of the data. And at others, DBAs are mandated to become more active in the application development lifecycle, especially with the growth of DevOps. And who gets involved in evaluating and implementing all the new NoSQL and big data management platforms? Yup, DBAs. And then there are always those organizations that lump all of this onto the DBA.

With all of this in mind, might there be a more descriptive title for the modern DBA? Data Czar? Too imperial. Data Generalissimo? Too militaristic. Data Conductor? Too directorial (or musical). Data Analyst? There are already people with that title, but let’s not open that particular Pandora’s Box to try to define exactly what they do!

Maybe we should just stick with DBA… after all, most of us know what that means, even if it is not truly an accurate descriptor of the job anymore.

Long live the DBA!


Avoid Bachelor Programming Syndrome

Unless you plan for, and issue, regular COMMIT statements in your database programs, you will be causing locking problems. It is important for every programmer to issue COMMIT statements in all application programs where data is modified (INSERT, UPDATE, MERGE, and DELETE). A COMMIT externalizes the modifications that occurred in the program since either the beginning of the program or the last time a COMMIT was issued. A COMMIT ensures that all modifications have been physically applied to the database, thereby ensuring data integrity and recoverability.

Failing to code COMMITs in a data modification program is what I like to call “Bachelor Programming Syndrome”—in other words, fear of committing.

Bachelor Programming Syndrome: Fear of Committing


One important factor affecting the need for a COMMIT strategy is the amount of elapsed time required for the program to complete. The greater the amount of time needed, the more important it becomes to issue periodic COMMITs. A sound COMMIT strategy will reduce rollback time and reprocessing time in the event of program failure. It is a safe assumption that the elapsed time increases as the number of modifications increases.

Issuing COMMITs in your application programs is vitally important for three primary reasons:

1) If the program fails, all the modifications are backed out to the last COMMIT point. This process could take twice the time it took to perform the modifications in the first place—especially if you are near the end of a program with no COMMITs that performed hundreds or thousands of modification operations.

2) If you resubmit a failing program that issues no COMMITs, the program redoes work unnecessarily.

3) Programs using the repeatable read page locking strategy or the RELEASE(COMMIT) table space locking strategy hold their respective page and table space locks until a COMMIT is issued. If no COMMITs are issued during the program, locks are not released until the auto-COMMIT when the program completes, thereby negatively affecting concurrent access. This can cause lock timeouts and lock escalation.

If concurrent batch or online access uses uncommitted read isolation, sometimes referred to as dirty reads, COMMIT processing is irrelevant. This is because uncommitted read isolation does not take any locks. However, most processing requires accurate data and, as such, will not use uncommitted read isolation.
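In Db2, for example, uncommitted read can be requested at the statement level (the table name here is hypothetical):

-- WITH UR takes no read locks: fast and non-blocking, but the result
-- may include uncommitted (and ultimately rolled-back) changes
SELECT COUNT(*)
  FROM ORDERS
  WITH UR;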

If your program is running for a long time and has issued no COMMITs, your program will probably not suffer. However, others attempting to access the data that is locked by your program will suffer. This will manifest itself with lock timeout errors (such as -911 and -913 SQL return codes). If you see a lot of lock timeouts occurring it is probably a good bet that you have at least one long-running program (maybe more) that needs to have COMMIT logic added.

It is also good programming practice to issue periodic COMMITs in long-running read-only programs. Even though data is not being changed, some database management systems, such as DB2 for z/OS, acquire read locks whenever data is accessed. A read lock can block other programs (such as utility operations) from running successfully. Therefore, failure to COMMIT, even in a read-only program, can result in locking problems.

In some rare circumstances, you might be able to avoid issuing COMMITs in your database programs. But you should only consider doing so for programs that run very quickly, and therefore will not hold locks for a long duration.

I recommend that you plan to issue COMMITs in every batch program. You can structure the logic so that the COMMIT processing is contingent on a parameter passed to the program. This approach enables an analyst to adjust (or turn off) COMMIT processing but ensures that all batch programs are prepared if COMMIT processing is required in the future.
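Here is a minimal sketch of that approach in Db2-style SQL PL (the procedure, table, and column names are hypothetical); passing 0 as the parameter turns interim COMMIT processing off:

CREATE PROCEDURE PURGE_OLD_ORDERS (IN p_commit_freq INTEGER)
LANGUAGE SQL
BEGIN
  DECLARE v_done     INTEGER DEFAULT 0;
  DECLARE v_count    INTEGER DEFAULT 0;
  DECLARE v_order_no INTEGER;
  -- WITH HOLD keeps the cursor open across interim COMMITs
  DECLARE c1 CURSOR WITH HOLD FOR
    SELECT ORDER_NO
      FROM ORDERS
     WHERE ORDER_DATE < CURRENT DATE - 7 YEARS
     FOR UPDATE;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1;

  OPEN c1;
  FETCH c1 INTO v_order_no;
  WHILE v_done = 0 DO
    DELETE FROM ORDERS WHERE CURRENT OF c1;   -- positioned delete
    SET v_count = v_count + 1;
    -- COMMIT every p_commit_freq deletes; 0 disables interim COMMITs
    IF p_commit_freq > 0 AND MOD(v_count, p_commit_freq) = 0 THEN
      COMMIT;
    END IF;
    FETCH c1 INTO v_order_no;
  END WHILE;
  CLOSE c1;
  COMMIT;   -- externalize whatever remains
END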


Managing Databases with Autonomics

Many database management tasks that traditionally required oversight and handholding by DBAs can, over time, be turned over to intelligently automated software to manage. The key words in that sentence are “over time” and “intelligently.” Oh, sure, we have automated things for ages, whether it be generated scripts, a scheduled reorganization or backup job, or a series of scripts to make changes to database structures.

By developing computerized applications to support business processes, we automate just about every task in our organizations. But, for the most part, these automated steps still require significant manual interaction or intervention.

Autonomics is the migration from simple automation to managing your databases with intelligent software techniques.

Autonomics

So what are autonomics? Autonomic computing refers to the self-managing characteristics of distributed computing resources, adapting to unpredictable changes while masking the complexity from operators and users. Autonomics implies much more than simple automation.

The goal of autonomics is to enable a computer system, in this case the DBMS, to be capable of managing itself. Autonomics requires an awareness of the environment, knowledge of changing usage patterns and resources, and the ability to adapt to shifting requirements. Through autonomics it is possible to optimize manual tasks and reduce the growing levels of complexity. This is particularly important in the current day and age of running lean IT staffs.

At a high level, there are four aspects of autonomic computing:

  1. It is automatic, meaning it can make decisions on its own, using performance and usage metrics and high-level policies developed by administrators.
  2. It is adaptive, meaning it can automatically adjust to changing conditions.
  3. It is aware, meaning the system can monitor (or sense) its operational context as well as its current state to determine if it is reaching its specific purpose.
  4. It is self-managing, meaning it can optimize and administer itself without human interaction being required.

Of course, there are many components of the “self” management aspect of autonomics. Autonomic systems can support many different types of self-management capabilities, such as:

  • Self-configuration: Automatic configuration of the system and its components;
  • Self-healing: Automatic discovery and correction of faults;
  • Self-optimization: Automatic monitoring and control of resources to ensure the optimal functioning with respect to the defined requirements;
  • Self-protection: Proactive identification of, and protection from, arbitrary attacks;
  • Self-inspection: Understanding of itself and its interactions with other systems in order to make intelligent decisions;
  • Self-organization: Proactive modification of data structures and organization to optimize access.

As with most things, it is not reasonable to expect organizations to jump right into a full-blown implementation of autonomics. There are 5 levels of gradation for autonomic computing that can be adopted. These can be summarized as:

  1. Basic
  2. Managed
  3. Predictive
  4. Adaptive
  5. Fully autonomic

The first three levels are passive, meaning that the system advises an administrator that corrective actions are needed, but nothing is automatically performed. At the Basic level, everything is essentially done through human effort: IT professionals manage the appropriate tasks using their own abilities and experience. The Managed level augments human effort with management technologies and techniques. And the third level is where we really start to see a more autonomics-focused approach. This Predictive level introduces new techniques and methodologies for correlation among different components.

Then we move into the active levels, where corrective actions are automatically taken. The Adaptive level uses the metrics and information it has gathered to automatically take actions as needed. And the final level, the Autonomic level, actively monitors and analyzes business policies and objectives – and can even change these policies and objectives based on new observations. Basically, the difference boils down to merely identifying issues versus identifying them and also taking action to remediate them.

Trust, but Verify

An evaluation of the accuracy of the autonomic capabilities by trusted advisors is an important part of the continuum of moving to autonomics. It never fails that when a technician is first exposed to autonomics, skepticism is aroused. For example, a DBA told that some piece of software is going to make automatic adjustments to parameters, jobs or data will usually say something like “Wait-a-minute! First tell me what the problem is, along with what you suggest I should do.”

The DBA wants to look at the recommendations and develop a level of trust in the technology. That is the first phase for autonomic software. You can break problem solving down into three high-level steps:

  1. Find the problem
  2. Analyze the situation for a solution
  3. Implement the solution

DBAs readily accept automating the first two steps, but the last one usually takes some time. After a period of time reviewing the proposals raised by the software, a technician will begin to trust a well-written and well-developed autonomic solution. At that point it can be turned on as fully autonomic… not just finding the problem and performing analysis to find a solution, but also automatically fixing it.

Modern Database Administration Requires Autonomics

Today’s IT world is defined by growing amounts of data, ever-changing DBMS functionality, and heterogeneous environments. Couple that with retiring mainframe expertise and few new DBAs being hired, and it is easy to see that managing to best practices becomes untenable.

Autonomics is the key to being able to manage a modern database implementation. We see vendors incorporating more and more autonomics features into the base DBMS functionality. This can be seen in features such as performance profiles, real-time statistics, performance feedback, and even machine learning incorporated into the database optimizer.

And you can augment whatever autonomics are available in the DBMS with sophisticated tools and utilities that take advantage of the features and metrics provided within your DBMS to constantly monitor, manage, and optimize your applications and databases.


Use a Little SQL Magic to Clean Up Your Data

Data integrity sometimes can be a vexing problem to overcome. Data gets brought into our systems in many different ways: through data entry clerks, online by customers, purchased data, ETL, and on and on. And there is no guarantee, many times, that the data is stored consistently, even if we need it to be. For example, some data is stored with trailing blanks, some with extra embedded spaces, and some with strange characters that we have no idea how they got there!

Given that this is the reality of data these days, what can we do? Well, there are some data integrity issues that can be cleaned up using a touch of crafty SQL. Consider the common data entry problem of extra spaces in a name field. Not only is it annoying, but sometimes it can cause the system to ignore relationships between data elements. For example, “Craig  Mullins” is not equivalent to “Craig Mullins” — the first one has two spaces between the first and last name, but the second one only has one space. Such problems can be difficult to spot with the naked eye, but if you are relying on code to match the two, a simple equivalency test will not work. Sure, there are software tools that you can buy that help to clean up data, but what if you don’t have the budget for that?

Consider Using Trusty Old SQL

With a little SQL code you can write an UPDATE statement to fix such problems… that is, if you know how to use the REPLACE function. REPLACE does what it sounds like it would do: it reviews a source string and replaces all occurrences of one string with another. For example, to replace all occurrences of Z with A in the string RUTZBZGZ you would code:

REPLACE ('RUTZBZGZ', 'Z', 'A')

And the result would be RUTABAGA. But a simple REPLACE is not sufficient for the task at hand, so let’s create a SQL statement to get rid of any number of unwanted spaces in the NAME column of our EMPLOYEE table:

     UPDATE EMPLOYEE
            SET NAME = REPLACE(
                          REPLACE(
                              REPLACE(NAME, SPACE(1), '<>'),
                              '><', SPACE(0)),
                          '<>', SPACE(1));

What are all of those less-thans and greater-thans doing in this SQL? And why do you need them? Well, let me explain.

The innermost REPLACE function takes the NAME column and converts every occurrence of a single space into a left/right caret pair. The next REPLACE (working outward) takes the string we just created and removes every occurrence of a right/left caret combination by replacing it with a zero-length string. The final REPLACE function takes that string and replaces any left/right caret pairs with a single space. The reversal of the carets is the key to removing all spaces except one – remember, we want to retain a single space anywhere there was a single space as well as anywhere that had multiple spaces. If you’ve got this type of data problem, give it a try… it works.

Of course, you can use any two characters you like, but the left and right caret characters work well visually. Be sure that you do not choose characters that occur naturally in the string that you are acting upon, though! I have yet to meet anybody with a < or > sign in their name, so those should work well.

Finally, the SPACE function was used for clarity. You could have used strings encased in single quotes, but the SPACE function is easier to read. It simply returns a string of spaces, the length of which is specified as the integer argument.

Summary

This is just one type of data integrity problem that you can clean up using SQL and functions. With a little ingenuity and some adept coding you can probably do a lot of data cleansing using nothing but SQL. If you have a favorite SQL data cleansing method, help us out and share it as a comment here on the blog.


World Standards Day

Today, October 14, 2021, is World Standards Day, and as a data management consultant I just have to recognize it as an important one.

Even though it has been said that “the best thing about standards is that there are so many to choose from,” a concise, well-stated set of standards and procedures for using database systems is important for ensuring smooth operations and support.

Standards are common practices that ensure the consistency and effectiveness of the database environment, such as database naming conventions. Procedures are scripts that direct the processes required for handling specific events, such as a disaster recovery plan. Failure to implement database usage standards and procedures will result in a database environment that is confusing and difficult to manage.

Database standards and procedures should be developed by the DBA in partnership with application and systems teams. The standards should be stored in a central, accessible place. The database standards and procedures should be a component of corporate-wide IT standards and procedures. Although they can be stored as a printed document, it is better to make them available in an online format for easy access, typically on an intranet site. Several vendors offer “canned” standards and procedures that can be purchased for specific DBMS products.

Naming Guidelines: One of the first standards to be implemented should be a set of guidelines for the naming of database objects. Without standard database object naming conventions, it will be difficult to identify database objects correctly and to perform the proper administration tasks.

Database object naming standards should be developed in conjunction with all other IT naming standards in your organization. In all cases, database naming standards should be developed in cooperation with the corporate data administration department (if one exists). Wherever possible, the database naming conventions should peacefully coexist with other IT standards, but not at the expense of impairing the database environment.

The database object naming standard should be designed to minimize name changes across environments. In general, do not impose any unnecessary restrictions on the names of database objects. Relational databases are supposed to be user-friendly. A strict database naming convention, if not developed logically, can be antithetical to a useful and effective database environment.

Roles and Responsibilities: The successful operation of a DBMS requires the coordinated management efforts of many skilled technicians and business experts. A matrix of database management and administration functions should be developed that documents each support task and who within the organization provides the support. The matrix can be created at a departmental level, a job description level, or even by individual name.

Your organization can create whatever tasks you deem necessary in your roles and responsibilities matrix. Whatever the final composition of your roles and responsibilities matrix, be sure to maintain it by keeping it accurate and up-to-date with new DBMS features and tasks, as well as new roles (as required) and employees. An up-to-date matrix makes it easier to define roles within the organization and to effectively apportion database-related workload.

Administration Standards: If a data administration (DA) group exists within your organization, a basic DA standards guide should be developed to outline the scope of their job role. If a DA group does not exist, be sure to include DA standards in the DBA standards as appropriate.

The data administration standards should provide a clear statement of the organization’s overall policy with regard to data, including its importance to the company, as well as guidelines for establishing data ownership and stewardship. Other issues that need to be documented include rules for data creation, metadata management policy, conceptual and logical data modeling guidelines, and the organization’s goals with regard to creating an enterprise data model.

A basic set of database administration standards should be established to ensure the ongoing success of the DBA function. The standards will serve as a guide to the DBA services offered and to the specific approaches taken to support the database environment. For example, standards can be developed that outline how requests are made to create a new database or make changes to existing databases, specify which types of database objects and DBMS features are favored (and under which circumstances to deviate), establish backup and recovery procedures (including disaster recovery plans), and communicate the methods used to transform a logical data model into a physical database implementation. An additional set of DBA standards that cover database performance monitoring and tuning may be useful to document procedures for overcoming performance problems.

Although the DBA standards will be most useful for the DBA staff, the application development staff will need them to learn how best to work with the DBA staff. It is also a good idea to create a standard way to request DBA services, using a service management tool to request and track all work with tickets.

Furthermore, any performance tuning tricks that are documented in the DBA standards should be shared with programmers, too. The more the application programmers understand the nuances of the DBMS and the role of the DBA, the better the working relationship between DBA and development will be – and that should result in a more efficient database environment.

Standards for system administration or systems programming are required if your organization separates the SA function from the DBA function. System administration standards are needed for many of the same reasons that DBA standards are required. Standards for SA may include DBMS installation and testing procedures, upgrade policies, maintenance procedures, and so on.

Database Security: The DBA unit often applies and administers DBMS security. However, at some shops, the corporate data security unit handles DBMS security. You should provide a resource outlining the necessary standards and procedures for administering database security. It should contain the following information:

  • Details on what authority to grant for specific types of situations;
  • An authoritative list of who can approve what types of database authorization requests;
  • Information on any interfaces being used to connect DBMS security with operating system security products;
  • Auditing requirements and implementation guide (especially important in this day and age of regulatory compliance);
  • Procedures for notifying the requester that database security has been granted;
  • Procedures for removing security from retiring, relocating, and terminated employees.

Application Standards: The development of database applications differs from typical program development. You should take care to document the special development considerations required when writing programs that access databases. The database application development standards should function as an adjunct to any standard application development procedures within your organization. 

Additionally, application migration and turnover procedures are needed to support development and production environments – along with when, and how, programs are moved between the two. Specific guidelines are needed to accomplish migration in a manner conducive to the usage of each environment. For example, what data volume is required for each environment and how is data integrity to be assured when testing activity occurs? Should data be migrated, or just the database structures?

The migration and turnover procedures should document the information required before any database object or program can be migrated from one environment to the next. At a minimum, information will be required about the requester, why and when the objects should be migrated, and the appropriate authorization to approve the migration.

Additional standards and procedures that may be required include design review guidelines for proceduralizing the approval of applications as they are built, and operational support standards to ensure the efficient and proper operation of the database environment.

At any rate, you can use the cases above as a starting point for assembling a reasonable set of database standards and procedures. Keep in mind, though, that the above just skims the surface of what should be included in a thorough database standards and procedures guide. Additional research will be required to ensure that your standards are proper and complete.

Summary

So, take a moment today, on World Standards Day, to review and examine your shop’s database standards. Are they complete and up-to-date? Is there anything that needs to be modified (or even removed)? I think it makes sense to make it a standard to review your standards every year on World Standards Day!


Mainframes in the Movies

As an old mainframe IT guy, I always cringe when I’m watching a movie or TV show and I hear the word “mainframe.” Invariably, the word is used in the wrong context or the show displays a screen that is definitely not a mainframe screen.

So when I came across this YouTube video of “mainframe” references in movies I just had to share it here.

The Mainframe Goes to the Movies

After watching this I have to say, at least Kathy Bates gets it right!


Happy DBA Day 2021!

In case you didn’t know it, the first Friday in July is DBA Appreciation Day!

I assume that most of my readers know what a DBA is, but for those of you who do not: you probably still benefit from having DBAs around. A DBA, or database administrator, is the information technician responsible for directing or performing all activities related to maintaining a successful database environment.

If you can access your data and your systems are operating smoothly, you have your DBAs (at least in part) to thank for that. Most of the time, people don’t know what the DBAs do, until a problem occurs that requires their attention. In other words, if problems aren’t occurring, you can thank your DBAs for being vigilant and doing their job.

How can you show your appreciation? A simple “thank you” goes a long way. But if you still want some ideas, here are a few:

…Get them a cup of their favorite beverage. You know, something other than that brown swill that is available in your office!

…Donuts are appreciated by some; cut veggies by others. I have to say though, usually not by the same DBA! However, it is typically not a problem to identify which is which.

…Perhaps the best “gift” you can give your DBA is this though: the next time you are thinking about picking up the phone (or sending an email) to contact the DBA, make sure you have investigated the problem thoroughly yourself. And that you have all the details needed to help the DBA help you with your issue.

So have a great day all you DBAs out there…

I certainly appreciate you!
