DevOps and the Database

The title of this blog post was also the title of my presentation at last year’s Data Summit event. Here is a short snippet of that presentation captured at the event.

 

After watching it, let me know your thoughts on this topic!

Posted in DBA, DevOps | Leave a comment

World Backup Day and Your Databases

Today, March 31, 2020, is World Backup Day. So I thought I’d take the time to blog a bit on the importance of preparing and testing your database backup and recovery plans.

Ensuring the recoverability of your organization’s data is perhaps the most important task a DBA performs. As such, establishing a reasonable backup schedule for your databases can be a challenging project. It requires you to balance two competing demands: the need to take image copy backups frequently enough to assure reasonable recovery time, and the need to not interrupt daily business. The DBA must be capable of balancing these two objectives based on usage criteria and the capabilities of the DBMS.

Not all data is created equal. Some of your databases and tables contain data that is necessary for the core of your business. Other database objects contain data that is less critical or easily derived from other sources. Before you can set up a viable backup strategy and schedule, you will need to analyze your databases and data to determine their nature and value to the business. To do so, answer the following questions for each database object.

  • How much daily activity occurs against the data?
  • How often does the data change?
  • How critical is the data to the business?
  • Can the data be recreated easily?
  • What kind of access do the users need? Is 24/7 access required?
  • What is the cost of not having the data available during a recovery? What is the dollar value associated with each minute of downtime?

It can be helpful to grade each database object in terms of its criticality and volatility. This can be accomplished using a grid that plots the nature and type of the data along two axes, as described below.

The vertical axis represents a criticality continuum that ranges from easily replaceable data to data that cannot be easily replaced. The horizontal axis represents a volatility continuum that ranges from static data that changes infrequently to volatile data that changes frequently. Use this grid to diagram each database object by estimating its relative volatility and importance to the organization. Remember, these terms are somewhat vague; you will need to analyze your data and define it along the axes based on your knowledge of the data and your organization.

Once you have charted your database objects, you can use the diagram as a general indicator of how frequently each database object should be backed up. The DBA in charge of each application must develop the backup thresholds for each different type of data, as suggested by the grid. In general, critical data should be backed up more frequently than noncritical data, and volatile data should be backed up more frequently than static data. The key, however, is how you define the term frequently. For example, 1,000 updates per day might be frequent at some shops, whereas 50,000 updates per day might be infrequent at other shops. The DBA uses the grid to determine an appropriate backup schedule for each database object. The method of backup is also affected by user access needs.

Quadrant 1 on the grid identifies the critical/dynamic data in the organization. This data is crucial to your business and it changes rapidly. As such, you must be able to recover it quickly, so you should copy it frequently. As a rule of thumb, the data should be backed up at least on a daily basis. If more than 20% of the data changes daily, be sure to make full rather than incremental backups.

Quadrant 2 represents critical but static data. Even though the data changes little from day to day, you will need to recover the data promptly in the event of an error because it is critical to the business. Be sure to back up this data at least weekly. Consider using incremental backups that are merged immediately upon completion to minimize the work required during recovery.

Quadrant 3 represents volatile data that is not as vital to your business. You may be able to recreate the data if it becomes corrupted. Depending on the amount of data and the volume of change, you might not need to back it up at all. For small amounts of data, a printed report may suffice as a backup. If the data is lost, you could simply reenter it from the printed report. Alternatively, if data is recreated nightly in a batch job, you could simply run the batch job to refresh the data. As a DBA, you will need to ensure that the data can be recreated or copied on a regular basis. In general, more than a weekly backup for quadrant-3 data is likely to be overkill.

Quadrant 4 represents static, noncritical data. Such data does not change much and can be replaced easily. It is the least important data and should be addressed only when data in the other three quadrants have been adequately backed up. In fact, quadrant-4 data may never need to be backed up—the DBA could take a similar approach to that described for quadrant 3.
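
To make the quadrant approach concrete, here is a minimal Python sketch of how you might grade database objects and derive a suggested backup plan from the grid. Every threshold and object name in it is hypothetical; you would calibrate them to your own shop’s definitions of “critical” and “volatile”:

    # Hypothetical sketch: grade database objects on the criticality/volatility
    # grid and suggest a backup approach. All thresholds are illustrative only;
    # calibrate them to your own shop's data and SLAs.
    from dataclasses import dataclass

    @dataclass
    class DbObject:
        name: str
        criticality: float        # 0.0 (easily replaced) .. 1.0 (irreplaceable)
        pct_changed_daily: float  # percentage of rows that change per day

    def backup_plan(obj: DbObject) -> str:
        critical = obj.criticality >= 0.5
        volatile = obj.pct_changed_daily >= 5.0   # "frequent" is shop-specific

        if critical and volatile:        # Quadrant 1: critical/dynamic
            # Back up daily; use full copies once churn exceeds ~20% per day.
            style = "full" if obj.pct_changed_daily > 20.0 else "incremental"
            return f"daily {style} backup"
        if critical:                     # Quadrant 2: critical/static
            return "weekly backup; merge incrementals immediately on completion"
        if volatile:                     # Quadrant 3: noncritical/dynamic
            return "weekly backup at most, or recreate from the source data"
        return "lowest priority; address after quadrants 1-3"  # Quadrant 4

    for obj in [DbObject("ORDERS", 0.9, 35.0),
                DbObject("PRODUCT_CATALOG", 0.8, 0.5),
                DbObject("SESSION_WORK", 0.2, 80.0),
                DbObject("ZIP_CODES", 0.1, 0.0)]:
        print(f"{obj.name}: {backup_plan(obj)}")

A real program would, of course, pull change rates from the DBMS catalog or your monitoring tools rather than hard-coding them, but the quadrant logic itself is no more complicated than this.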

Posted in backup & recovery, DBA | Leave a comment

Coronavirus and Tech Conferences

As we all hunker down, social distancing to help reduce the impact of the worldwide COVID-19 pandemic, you may have noticed that many industry conferences are being canceled or postponed. I think that this may be a major inflection point in the history of the tech conference.

First of all, let me state that it is absolutely the correct and responsible thing to either cancel or postpone conferences. It is unwise to continue with events where hundreds, sometimes thousands, of people congregate.

Which events have been impacted? Well, quite a few. Let’s discuss several of the major data-related events that have been affected.

To begin with, some conferences are eliminating the in-person event and changing to an online model:

  • One of the earliest events to be modified was the IBM Think 2020 event, originally scheduled for the week of May 4-7 in San Francisco. IBM canceled the in-person conference and announced a Digital Event Experience to occur May 5-6.
  • The Microsoft Build event, which was scheduled to take place in Seattle the week of May 19-21, was modified to an online conference.
  • The Salesforce Connections conference, which was planned for April 4-6 in Chicago, is also being transitioned to an online event.

Some conferences are postponing their events, but it remains to be seen what that actually means. Will venues and speakers be available for the re-scheduled timeframe?


And others have just outright canceled their events:

  • The huge South by Southwest conference and festival, held every year in Austin, TX, which was originally scheduled for March 13-22, has been canceled.
  • Industry analyst firm Gartner has canceled or postponed all of its conferences from April through August.
  • And perhaps most interesting of all, O’Reilly Media is closing down its conference business for good. That signals the end of the annual Strata conference, which was one of the few vendor-neutral events focused on data and AI.

It is important to consider the impact of these cancellations and postponements. First of all, we can view them as an opportunity to gauge the effectiveness of online conferences. I think there is a place for them, but from the viewpoint of an attendee, nothing can replicate the learning, engagement, and networking that a live, in-person event delivers. From the perspective of the marketing teams that help to bankroll these events, however, perhaps this is a chance to see the impact of skipping their normal conferences. Of course, the impact on the bottom line will reflect more than just missed conferences, given the wide-ranging effects of the coronavirus pandemic.

Another interesting data point will be the reduced travel budgets of organizers, speakers, marketers, and attendees. Perhaps organizations will evaluate the ROI of conference travel once they have data from a period when travel was restricted.

Nevertheless, the coronavirus is changing the way we operate, and some of those changes are probably going to stick. As a regular participant (attendee and speaker) at industry events, I am eagerly looking forward to evaluating the eventual impact of the pandemic on tech conferences. Personally, not only do I hope that they survive, but I fully expect that most of them (with some notable exceptions, such as Strata) will come back as vibrant as ever.

What do you think?

Posted in conferences, data | 1 Comment

How to Measure Your DBAs

One of the most common questions I get when talking to folks about database administration is how to measure the effectiveness and quality of a DBA staff. This question is not an easy one to answer, for a number of reasons. The most important reason is that the role of the DBA is constantly changing, and measuring something that is always in flux is challenging. One example of this constant change is that DBAs need to go beyond relational, managing more than just relational/SQL database systems. But there are more examples, about which I have written and spoken extensively.

One popular post of mine, titled How Many DBAs?, discusses the difficulty of determining the appropriate staffing level for a DBA group. Basically, it boils down to the techies usually thinking that more DBAs are needed, and management saying that there are already enough (or, even worse, too many) DBAs on staff. The humorous reply from the DBA manager when asked how many DBAs they need is always the same: “one more, please!”

And that brings me to today’s entry, in which we look at what types of metrics are useful for measuring the DBA’s quality of work. So, what is a good way to measure how effective your DBA group is?

A good DBA has to be a jack of all trades. And each of these “trades” can have multiple metrics for measuring success. For example, a metric suggested by one reader was to measure the number of SQL statements that are processed successfully. But what does “successfully” mean? Does it mean that the statement is syntactically correct? That it returned the correct results? That it returned the correct results in a reasonable time?

And what is a “reasonable” time? Two seconds? One minute? Half an hour? Unless you have established service level agreements (SLAs), it is unfair to measure the DBA on response time. And the DBA must participate in establishing reasonable SLAs (in terms of cost and response time) lest s/he be handed a task that cannot be achieved.

Measuring the number of incident reports is another oft-cited potential metric. This is fine if it is limited to true problems that might have been caused by the DBA. But not all database problems are legitimately under the control of the DBA. Should the DBA be held accountable for bugs in the DBMS (caused by the DBMS vendor), for poor SQL written by developers when no time is given to the DBA for review, or for design elements forced on him or her by an overzealous development team (which happens all the time)?

I like the idea of using an availability metric, but it should be tempered against your specific environment and your organization’s uptime requirements. In other words, what is the availability required? Once again, back to SLAs. And the DBA should not be judged harshly for not achieving availability if the DBMS does not deliver the possibility of availability (e.g., online reorg and change management) or the organization does not purchase reasonable availability solutions from a third-party vendor. Many times the DBA is hired well after the DBMS has been selected. Should the DBA be held accountable for deficiencies in the DBMS itself if he or she had no input at all into the DBMS purchase decision?

And what about those DBA tools that can turn downtime into uptime and ease administrative tasks? Well, most DBAs want all of these tools they can get their hands on. But if the organization has no (or little) budget, then the tools will not be bought. And should the DBA be held responsible for downtime when he is not given the proper tools to manage the problem?

OK then, what about a metric based on response to problems? This metric would not necessarily mean that the problem was resolved, but that the DBA has responded to the “complaining” entity and is working on a resolution. Such a metric would lean toward treating database administration as a service or help desk type of function. This sounds more reasonable, at least from the perspective of the DBA, but I actually think this is much too narrow a metric for measuring DBAs.

Any fair DBA evaluation metric must be developed with an understanding of the environment in which the DBA works. This requires an in-depth analysis of things like:

  • number of applications that must be supported,
  • number of databases and size of those databases,
  • number of database servers,
  • types of database systems (pre-relational, relational, NoSQL, etc.),
  • use of the databases (OLTP, OLAP, web-enabled, analytics, ad hoc, etc.),
  • number of different DBMSs (that is, Oracle, Db2, Sybase, MySQL, IMS, etc.),
  • number of OS platforms to be supported (Windows, UNIX, Linux, z/OS, iSeries, etc.),
  • on-premises versus cloud implementations and workloads,
  • special consideration for ERP applications due to their non-standard DBMS usage,
  • number of users and number of concurrent users,
  • type of Service Level Agreements in effect or planned,
  • availability required (24/7 or something less),
  • the impact of database downtime on the business ($$$),
  • backup and recovery requirements including disaster planning and the availability of recovery time objectives (RTOs),
  • performance requirements (subsecond or longer – gets back to the SLA issue),
  • type of applications (mission-critical vs. non-mission-critical),
  • frequency of change requests.

This is probably an incomplete list, but it accurately represents the complexity and challenges faced by DBAs on a daily basis. Of course, the best way to measure DBA effectiveness is to judge the quality of all the tasks that they perform. But many aspects of such measurements will be subjective. Keep in mind that a DBA performs many tasks to ensure that the organization’s data and databases are useful, usable, available, and correct. These tasks include data modeling, logical and physical database design, database change management, performance monitoring and tuning, assuring availability, authorizing security, backup and recovery, ensuring data integrity, and, really, anything that interfaces with the company’s databases. Developing a consistent metric for measuring these tasks in a non-subjective way is challenging.
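
To illustrate just how quickly subjectivity creeps in, consider a toy Python sketch of a composite DBA score. Every factor, weight, and threshold in it is hypothetical, invented purely for illustration; notice that picking the weights is itself a subjective act:

    # Purely illustrative: a composite DBA-effectiveness score. Every factor,
    # weight, and threshold below is hypothetical; a real program would have
    # to be negotiated with the DBA team and management against actual SLAs.
    def dba_score(sla_availability_met: float,  # fraction of availability SLAs met
                  sla_response_met: float,      # fraction of response-time SLAs met
                  incidents_caused: int,        # incidents traced to DBA error only
                  changes_completed: int,       # successful change requests
                  changes_requested: int) -> float:
        change_success = (changes_completed / changes_requested
                          if changes_requested else 1.0)
        # Subjective weights: this is exactly where "non-subjective" breaks down.
        score = (0.40 * sla_availability_met
                 + 0.30 * sla_response_met
                 + 0.20 * change_success
                 + 0.10 * max(0.0, 1.0 - incidents_caused / 10.0))
        return round(100 * score, 1)

    # Example: 99% of availability SLAs met, 92% of response-time SLAs met,
    # 1 DBA-caused incident, 47 of 50 change requests completed successfully.
    print(dba_score(0.99, 0.92, 1, 47, 50))   # prints 95.0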

You’ll probably need to come up with a complex formula encompassing all of the above, and more, to do the job correctly. And that is probably why I’ve never seen a fair, non-subjective, metric-based measurement program put together for DBAs. If any of my readers have a measurement program that they think works well, I’d like to hear the details of the program — and how it has been accepted by the DBA group and management.

Posted in DBA | Leave a comment

Database Schema Change and DevOps

 

Traditionally, the DBA is the custodian of database changes. The DBA is the information technician responsible for ensuring the ongoing operational functionality and efficiency of an organization’s databases and the applications that access those databases.

However, the DBA is not usually the one to request a change; a programmer or user typically does that. There are times, though, when the DBA will request changes, for example, to address performance issues or to utilize new features or technologies. At any rate, regardless of who requests the change, the DBA must be involved in the change process to ensure that each change is performed successfully and with no impact on the rest of the database.

In an organization that has embraced DevOps, a shift occurs that places more of the responsibility for database change on the developer. However, the DBA still must be involved to oversee, analyze, and approve any changes. As with all things in the world of DevOps, it is desirable to automate as much of the process as possible to remove manual, error-prone tasks and increase the speed of delivery. But without a tool that automates complex database changes and integrates into the DevOps toolchain, incorporating database changes into application delivery and deployment remains a slow, mostly manual process.

To effectively make database changes, the DBA needs to consider multiple issues, the most important of which are the appropriateness of the change in terms of the database design and the impact of the change on all other database objects and applications. Additionally, the DBA must determine if the change conforms to standards (for your shop and the industry), how best to make the change, and the timing of the change in terms of its impact on database availability while the change is being made.

The ideal arrangement (in a DevOps shop) is for database schema changes to be incorporated into the DevOps toolchain using a tool that allows developers to request changes. Those changes should be analyzed and compared against standards and rules for conformance. Non-compliant changes should automatically be referred back to the developer for modification and resubmission. Compliant changes should be accepted and cause a script to be generated using the most appropriate mechanisms to implement the change. This is a non-trivial activity which, if done properly, can eliminate a lot of manual effort and downtime. The generated script should be presented to the DBA for review and, upon acceptance, be implemented.
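
As a thought experiment, that flow might look something like the following Python sketch. The rules, the change-request format, and the script generator here are all stand-ins of my own invention, not any real product’s API:

    # Hypothetical sketch of a DevOps schema-change gate: developer-requested
    # changes are checked against shop standards; compliant changes generate
    # a script for DBA review, non-compliant ones bounce back to the developer.
    RULES = [
        # (description, predicate) -- illustrative shop standards only
        ("table names are upper case", lambda c: c["table"].isupper()),
        ("no DROP without approval",   lambda c: c["action"] != "DROP"),
        ("new columns must be nullable or have a default",
         lambda c: c["action"] != "ADD_COLUMN"
                   or c.get("nullable") or "default" in c),
    ]

    def review_change(change: dict):
        """Return (compliant, violations) for a requested change."""
        violations = [desc for desc, ok in RULES if not ok(change)]
        return (not violations, violations)

    def generate_script(change: dict) -> str:
        # Stand-in for the real generator; a production tool would pick the
        # most efficient mechanism the target DBMS supports for the change.
        if change["action"] == "ADD_COLUMN":
            return (f'ALTER TABLE {change["table"]} '
                    f'ADD COLUMN {change["column"]} {change["type"]};')
        raise NotImplementedError(change["action"])

    request = {"action": "ADD_COLUMN", "table": "ORDERS",
               "column": "PROMO_CODE", "type": "VARCHAR(8)", "nullable": True}

    compliant, problems = review_change(request)
    if compliant:
        print("Script for DBA review:\n" + generate_script(request))
    else:
        print("Returned to developer:", problems)

In a real toolchain, the generated script would be queued for DBA review and routed into the deployment pipeline rather than simply printed.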

It is worth mentioning here that today’s major DBMS products do not support fast and efficient database structure changes for all types of change. Each DBMS provides differing levels of support for making changes to its databases, but none easily supports every type of change that might be required. One quick example: try to add a column to the middle of an existing table. To accomplish such a task, the DBA must drop the table and recreate it with the new column in the middle. But what about the data? When the table is dropped, the data is deleted unless the DBA was wise enough to first unload the data. But what about the indexes on the table? Well, they too are dropped when the table is dropped, so unless the DBA knows this and recreates the indexes too, performance will suffer. The same is true for database security: when the table is dropped, all security for the table is also dropped. And this is but one example of what seems like a simple change becoming difficult to implement and manage.
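
To see just how much choreography that one “simple” change requires, here is a sketch of the manual sequence. It uses SQLite purely for demonstration (SQLite, like most DBMSs, can only append new columns at the end of a table), and the table and column names are made up; other DBMSs would add steps such as re-granting security:

    # Sketch of the drop-and-recreate choreography, demonstrated with SQLite.
    # Steps: unload the data, drop the table (which silently drops its
    # indexes), recreate it with the new column in the middle, reload the
    # data, and recreate the indexes.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (id INTEGER, name TEXT, salary REAL)")
    con.execute("CREATE INDEX emp_name_ix ON emp (name)")
    con.execute("INSERT INTO emp VALUES (1, 'Ava', 95000.0)")

    # 1. Unload the existing data first; forget this and the data is gone.
    rows = con.execute("SELECT id, name, salary FROM emp").fetchall()

    # 2. Drop the table; its indexes (and, on other DBMSs, security
    #    authorizations) disappear with it.
    con.execute("DROP TABLE emp")

    # 3. Recreate the table with the new column (dept) in the middle.
    con.execute("CREATE TABLE emp (id INTEGER, dept TEXT, name TEXT, salary REAL)")

    # 4. Reload the unloaded data, mapping the old columns around the new one.
    con.executemany("INSERT INTO emp (id, name, salary) VALUES (?, ?, ?)", rows)

    # 5. Recreate the indexes (and re-grant security where applicable).
    con.execute("CREATE INDEX emp_name_ix ON emp (name)")

    print(con.execute("SELECT * FROM emp").fetchall())
    # [(1, None, 'Ava', 95000.0)]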

Adding to the difficulty of making schema changes is the fact that most organizations have at least two, and sometimes more, copies of each database. There may be copies of the database at different locations or for different divisions of the company. And at the very least, a test and a production version will exist. But there may be multiple testing environments—for example, to support simultaneous development, quality assurance, unit testing, and integration testing. Each database change will need to be made to each of these copies, as well as, eventually, to the production copy. So, you can see how database change can quickly monopolize a DBA’s time.
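
And each change multiplies across those copies. A simple sketch of the promotion burden, with hypothetical environment names:

    # Hypothetical promotion pipeline: the same reviewed change script must
    # be applied to every copy of the database, test environments first.
    ENVIRONMENTS = ["dev", "unit-test", "qa", "integration", "prod"]

    def promote(script: str, apply_fn) -> None:
        """Apply one change script to each environment, in order."""
        for env in ENVIRONMENTS:
            print(f"applying change to {env}")
            apply_fn(env, script)   # a failure here should halt the promotion

    # Example with a do-nothing applier standing in for a real deployment step:
    promote("ALTER TABLE ORDERS ADD COLUMN PROMO_CODE VARCHAR(8);",
            lambda env, script: None)

Multiply that loop by every change request in flight and it becomes clear why automation, rather than DBA heroics, has to carry the load.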

The bottom line is that a robust, time-tested process that is designed to automate and enable database changes — with DBA oversight — is required. Do not minimize or discount the importance of database schema change management when planning and implementing DevOps at your organization.

 

Posted in DBA, DBMS, DevOps, standards | Leave a comment

Inside the Data Reading Room – Some Books to Start the New Year (2020)

Regular readers of my blog know that I periodically post quick reviews of the technology books that I have been reading. With that in mind, here is the first set of book reviews for 2020, taking a look at three recent books on topics that are in the news: AI, blockchain, and Big Data.

First up, we have The AI Delusion by Gary Smith. This is an interesting book on AI that is easy to read quickly. It is not going to give you any in-depth AI algorithms or models, but will help you to cut through the hype. Because right now, let’s face it, everything and everybody is touting how AI will change the world. And it probably will, but not with computers that think and behave like humans. This book helps you to understand why this is so – that computers lack “understanding.”

If you are at all interested in AI and its potential impact on humanity, you will enjoy reading Smith’s The AI Delusion.

 

Blockchain is another hyped but useful technology, and Paul Tatro’s Blockchain Unchained: An Illustrated Guide to Understanding Blockchain can help you understand the ins and outs of what blockchain is and how it works. Indeed, if you are looking for a great, readable introduction to blockchain technology, then look no further. Tatro’s Blockchain Unchained is in-depth, while at the same time easy to comprehend and digest. This is important for a topic that is potentially as confusing as blockchain.

In 146 pages and just over a hundred figures, Tatro manages to instruct even beginners on the technology and capabilities of blockchain. Consider reading Blockchain Unchained to give yourself a firm blockchain foundation so you can understand its capabilities.

 

And finally, we have Building Big Data Applications by world-renowned expert on all-things-data, Krish Krishnan. Although there are many books on the market that tackle the subject area of big data, Krishnan’s new tome on the topic, Building Big Data Applications, provides readers with a succinct overview of big data infrastructure, development, and technologies for designing and creating applications.

Building Big Data Applications offers up guidance for beginners, but one of the nice features of the book is the well-curated list of resources for additional reading and discovery at the end of each chapter. The book also provides nice examples of big data use cases that can be helpful to developers with similar requirements.

 

Have any new books that you would like to have reviewed here? Drop me a note to let me know about them…

Posted in AI, analytics, Big Data, blockchain, book review, books, data | Leave a comment

The DBA Corner Column

Today’s post is to remind my readers that I write a monthly column, DBA Corner, for Database Trends & Applications magazine on database and DBA issues and trends. I have been writing this column for over 20 years now! A complete history of the column can always be found on my web site.

The DBA Corner column focuses on issues of interest to data architects, database analysts and database administrators. Issues addressed by the column include data modeling and database design, database implementation, DBA practices and procedures, performance management and tuning, application development, optimization techniques, data governance, regulatory compliance with regard to data, and industry trends.

The first DBA Corner column of 2020 was published last week on the topic of SQL Performance Testing and DevOps.

In 2019, I wrote about a wide range of topics in the DBA Corner column.

So be sure to check the DBTA.com web site every month for each new edition of DBA Corner!

 

And drop me a note here as a comment if you have any topics you would like to see me cover in future DBA Corner columns.

Posted in DBA | Leave a comment

Happy New Year 2020!

Hello everybody… and welcome to a new year. Just a quick post today to wish everybody out there a very Happy New Year!

 


Whether you rang in the year with drinks, dinner, a party, or watching the year roll in on television, I wish for you all the happiness and success that a new beginning (like the start of a new year) can bring.

And be sure to keep on coming back to this blog, Data and Technology Today, for news, advice, and my ruminations on all things data.

Posted in DBA, Happy New Year | Leave a comment

Happy New Year

Welcome to 2020 everybody! Just a quick post to wish you all a very happy and prosperous new year.

I hope that your holiday season was enjoyable and relaxing and that you are ready to get back to work soon so that we can all make 2020 a great year for database systems and data management!

Cheers!


Posted in DBA, Happy New Year | Leave a comment

Happy Holidays

Just a quick post to wish all of my readers and followers a very happy holiday season. To you and yours, wherever you are and however you celebrate, may you have the best of times enjoying the holidays!

And I’ll see you all again after the start of the New Year.


Cheers!

Posted in DBA | Leave a comment