Inside the Data Reading Room – Fall 2014 Edition

Welcome to yet another edition of Inside the Data Reading Room, a regular feature of my blog where I take a look at recent database- and data-related books. In today’s post we’ll examine a book on SQL queries and a book on temporal databases.

The first book is the third edition of SQL Queries for Mere Mortals by Michael J. Hernandez and John L. Viescas. If you have read either of the first two editions of this book, you know how useful it can be in helping new-to-intermediate SQL coders learn the ins and outs of the language.

The book is designed to be used by people who are not SQL wizards. That is to say, if you can already code circles around your peers and you rarely look things up in the manual, then you are not likely to get much from SQL Queries for Mere Mortals. But if you are one of those “mere mortals” referenced in the title, this book will be quite helpful.

Part 1 of the book consists of the first three chapters, and it sets the stage for the meat of the book by explaining relational theory, database fundamentals and design, and a brief history of SQL. This material can safely be skipped if you feel you have the necessary background.

The rest of the book digs into the task of teaching SQL: from the basics in Part 2, through joining, grouping and aggregating data, modifying data, and more. If you have a database environment you can work with, then working through the exercises provided in each chapter will help you to hone your SQL coding skills.

Now it is true that a lot of SQL books have been published… and it can be difficult to choose which ones you want to buy. I think the authors have done a nice job of serving a well-defined subset of the market – the non-expert SQL coder looking to learn more. If you think that describes you, then you would do well to add this book to your reading list.

The next book we’ll discuss here is Time and Relational Theory: Temporal Databases in the Relational Model and SQL by C.J. Date, Hugh Darwen and Nikos A. Lorentzos. For long-time database professionals, Date and Darwen need no introduction. If you are looking for an expert to define database theory to you, then Date and Darwen should be at the top of your list.

Temporal database technology is gaining acceptance and being adopted in commercial database management systems, such as IBM’s DB2. As such, learning the theory behind temporal data is a worthwhile endeavor for DBAs and developers alike.

Although listed as a second edition, Time and Relational Theory is based on the authors’ earlier book on the subject, Temporal Data & the Relational Model (Morgan Kaufmann, 2002), with additional new material and research. So it is more a new book than a second edition. The book covers the temporal extensions that have been added in SQL:2011, as well as offering insight into temporal functionality still missing from SQL.

So, if you are looking for a good introduction to temporal relational database functionality, or an adjunct to the explanation of temporal capabilities in your DBMS manuals, look no further than Time and Relational Theory from Date, Darwen and Lorentzos.

Posted in DBA

More DBA Rules of Thumb

A couple of years ago I posted a set of DBA Rules of Thumb (ROT) here on the blog. Over time, though, I have thought more about it and collected additional ROTs that I’d like to share with you today.

The first new ROT is to Diversify! A good DBA is a Jack-of-All-Trades. You can’t just master one thing and be successful in this day and age. The DBA maintains production, QA and test environments, monitors application development projects, attends strategy and design meetings, selects and evaluates new products, and connects legacy systems to the Web.

And if all of that is not enough, to add to the chaos, DBAs are expected to know everything about everything. From technical and business jargon to the latest management and technology fads and trends, the DBA is expected to be “in the know.” For example, the DBA must be up on trends like the Big Data and Analytics movement.

And do not expect any private time: A DBA must be prepared for interruptions at any time to answer any type of question… and not just about databases, either.

When application problems occur, the database environment is frequently the first thing blamed. The database is “guilty until proven innocent.” And the DBA is expected to be there to fix things. That means the DBA is often forced to prove that the database is not the source of the problem. The DBA must know enough about all aspects of IT to track down errors and exonerate the DBMS and database structures he has designed. So he must be an expert in database technology, but also have semi-expert knowledge of the IT components with which the DBMS interacts: application programming languages, operating systems, network protocols and products, transaction processors, every type of computer hardware imaginable, and more. The need to understand such diverse elements makes the DBA a very valuable resource. It also makes the job interesting and challenging.

To summarize, the DBA must be a Jack-of-all-Trades and a master of some!

The second new ROT is to Invest in Yourself.  Aren’t you worth investing in?

Most every IT professional continually looks to their company to invest money in ongoing education. Who among us does not want to learn something new – on company time – and with the company’s money? Unless, perhaps, you are self-employed!

Yes, your company should invest some funds to train you on new technology and new capabilities – especially if it is asking you to do new things. And since technology changes so fast, most everyone has to learn something new at some point every year. But the entire burden of learning should not be placed on your company!

Budget some of your own money to invest in your career. After all, you probably won’t be working for the same company your entire career. Why should your company be forced to bankroll your entire ongoing education? Now, I know, a lot depends on your particular circumstances. Sometimes we accept a lower salary than we think we are worth because of the “perks” that are offered. And one of those perks can be training.

But some folks simply abhor spending any of their hard-earned dollars to help advance their careers. Shelling out a couple of bucks to buy some new books, subscribe to a publication, or join a professional organization shouldn’t be out of the reach of most folks in IT, though.

A willingness to spend some money to stay abreast of technology is a trait that should apply to DBAs. Most DBAs I’ve known are insatiably curious and many are willing to invest some of their money to learn something new. Maybe they bought that book on XML before anyone at their company started using it. Perhaps it is just that enviable bookshelf full of useful database books in their cubicle. Or maybe they paid that nominal fee to subscribe to the members-only content of that SQL Server portal. They could even have forked over the $10 fee to attend the local user group.

Don’t get me wrong. I’m not saying that companies should not reimburse for such expenses. They should – it provides for better-rounded, more educated, and more useful employees. But if your employer won’t pay for something that you think will help your career, then why not just buy it yourself?

And be sure to keep a record of such purchases because unreimbursed business expenses can be tax-deductible.

What do you think? Can you come up with any more Rules of Thumb for those of us in the world of Database Administration?

Posted in DBA

Managing Application Performance… from a database perspective.

Applications that access databases are only as good as the performance they achieve. And every user wants their software to run as fast as possible. As such, performance tuning and management is one of the biggest demands on the DBA’s time. When asked what the single most important or stressful aspect of their job is, DBAs typically respond “assuring optimal performance.” Indeed, analyst surveys regularly indicate that “performance and troubleshooting” tops the list of most challenging DBA tasks.

But when you are dealing with data in a database management system there are multiple interacting components that must be managed and tuned to achieve optimal performance. That is, every database application, at its core, requires three components to operate:

  • the system (that is, the DBMS itself, the network, and the O/S),
  • the database (that is, the DDL and database schema), and
  • the application (that is, the SQL and program logic).

Each of these components requires care and attention, but today I want to focus on the high-level aspects of performance management from the perspective of the application.

So where do we begin? A logical starting point is with program parameters, such as the package bind parms. There are many parameters and values that must be chosen and specified when you bind an application program. The vast array of options at our disposal can render the whole process extremely confusing – especially if you don’t bind on a daily basis. And even if you do, some of the options still might be confusing if you rarely have to change them. I’m talking about parameters like ACQUIRE, RELEASE, VALIDATE, ISOLATION and DEGREE.

I will not delve into the myriad bind options and give you advice on which to use when. There are many articles and books, as well as DBMS manuals, that you can use to guide you along that path. Suffice it to say that there are some standard parameters and values that should be chosen “most of the time.” As such, a wise DBA group will set up canned routines for the programmers to use for compiling and binding their applications. Choices such as “transaction,” “batch,” or “analytical query” can be presented to the developer, and based on the type of program and environment selected, the canned script can choose the proper bind options. Doing so can greatly diminish the problems that can be encountered when the “wrong” parameters or values are chosen when preparing programs for production.
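To make the idea of a canned bind routine concrete, here is a minimal sketch in Python. The profile names and the specific option values are illustrative assumptions only – they are not tuning recommendations for any particular DBMS, and your own shop standards should drive the actual choices:

```python
# Sketch of a "canned bind" helper: the developer picks a program type
# and the routine supplies a standard set of bind parameters.
# NOTE: the option values below are illustrative placeholders, not
# recommendations for any specific DBMS or workload.
BIND_PROFILES = {
    "transaction": {"ACQUIRE": "USE", "RELEASE": "COMMIT",
                    "ISOLATION": "CS", "VALIDATE": "BIND", "DEGREE": "1"},
    "batch": {"ACQUIRE": "USE", "RELEASE": "DEALLOCATE",
              "ISOLATION": "CS", "VALIDATE": "BIND", "DEGREE": "1"},
    "analytical query": {"ACQUIRE": "USE", "RELEASE": "COMMIT",
                         "ISOLATION": "UR", "VALIDATE": "BIND", "DEGREE": "ANY"},
}

def build_bind_command(package: str, program_type: str) -> str:
    """Assemble a BIND command string for the chosen program type."""
    opts = BIND_PROFILES[program_type]
    parms = " ".join(f"{k}({v})" for k, v in opts.items())
    # EXPLAIN(YES) is forced on for every production bind, per shop standard
    return f"BIND PACKAGE({package}) {parms} EXPLAIN(YES)"

print(build_bind_command("PAYROLL", "transaction"))
```

The developer never touches the individual parms; the script guarantees a consistent, reviewed set of options for each type of program.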

Before concluding this short section on parameters, consider this one important piece of advice: In production, always Bind your programs specifying EXPLAIN YES. This option externalizes the access path choices made by the optimizer at bind time. Failing to do so means that access paths will be generated, but you will not know what they are. This is akin to blinding yourself to what is going on, and it makes application performance tuning much more difficult.
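The terminology here is DB2-flavored, but the underlying idea – always externalize the optimizer’s access path choices – is portable. As a quick illustration, here is SQLite’s equivalent (EXPLAIN QUERY PLAN) driven from Python; the table and index names are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER, name TEXT)")
cur.execute("CREATE INDEX emp_dept_ix ON emp (deptno)")

# Externalize the access path the optimizer chose for this query
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM emp WHERE deptno = ?", (10,)
).fetchall()
for row in plan:
    print(row)  # the detail column names the index used, if any
```

Without this step the engine still picks an access path – you just never get to see it, which is exactly the blindness warned against above.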

Access Path Management

There are several common approaches to access path management. The best approach is to Rebind your applications over time as the data changes. This approach involves some form of regular maintenance that keeps database statistics up to date and formulates new access paths as data volumes and patterns change.

Other approaches include Rebinding only when a new version of the DBMS is installed, or perhaps more ambitious, whenever new maintenance or fixpacks are applied to the DBMS. Yet another approach is to Rebind automatically after a regular period of time, whether it is days, weeks, months, or whatever period of time you deem significant. This approach can work if the period of time is wisely chosen based on the application data – but it still can pose significant administrative issues.

The final approach – the worst of the bunch – comes from the “if it ain’t broke don’t fix it” school of thought. Basically, it boils down to (almost) never Rebinding your programs. This approach penalizes every program for fear that a single program (or two) might experience a degraded access path. Oh, the possibility of degraded performance is real, and that is why this approach has been adopted by some. And it can be difficult to find which statements may have degraded after a Rebind. The ideal situation would allow us to review the access path changes beforehand to determine whether they are better or worse.

Anyway, let’s go back to the best approach again, and that is to Rebind regularly as your data changes. This involves what is known as the three Rs: REORG, RUNSTATS, and Rebind. Your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that the DBMS is accessing data based upon false assumptions.

By Rebinding you will generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. And as you apply changes to the DBMS, optimizer improvements and new access techniques can be used. If you never Rebind, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to the DBMS itself.

To adopt the three Rs you need to determine when to REORG. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps we need four Rs:

  • RTS (or RUNSTATS) to determine when reorganization is needed,
  • REORG to reorganize the data,
  • RUNSTATS to gather fresh statistics, and
  • Rebind to formulate new access paths.

But is this enough? Probably not, because we need to review the access paths after rebinding to make sure that there are no rogue access paths. So, let’s add another “r” – to Review the access paths generated by the REBIND. After all, the optimizer can make mistakes. And, of course, so can you. Users don’t call you when performance is better (or the same). But if performance gets worse, you can bet on getting a call from irate users.

So we need to put in place best practices whereby we test Rebind results to compare the before and after impact of the optimizer’s choices. Only then can we assure that we are achieving optimal application performance.
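To see the maintenance cycle in miniature, here is a rough SQLite analog in Python: VACUUM stands in for REORG, ANALYZE stands in for RUNSTATS, and because SQLite replans every statement when it is prepared, the Rebind step happens implicitly. This is a sketch of the concept, not a DB2 procedure:

```python
import sqlite3

# Autocommit mode so VACUUM can run outside a transaction
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, cust INTEGER, amt REAL)")
cur.execute("CREATE INDEX orders_cust_ix ON orders (cust)")
cur.executemany("INSERT INTO orders (cust, amt) VALUES (?, ?)",
                [(i % 50, i * 1.5) for i in range(1000)])

cur.execute("VACUUM")   # "REORG": rebuild the physical storage
cur.execute("ANALYZE")  # "RUNSTATS": gather statistics into sqlite_stat1
stats = cur.execute("SELECT * FROM sqlite_stat1").fetchall()

# "Rebind" + "Review": the next prepare uses the fresh statistics,
# and we inspect the resulting access path
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE cust = 7"
).fetchall()
print(stats)
print(plan)
```

The point of the demo is the ordering: reorganize, then gather statistics, then let the optimizer formulate (and you review) new access paths based on those statistics.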

Tuning the Code

Of course, everything we’ve discussed so far assumes that the code is written efficiently to begin with – and that is a big assumption. We also need to make sure that we are implementing efficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded.

SQL is simple to learn and easy to start using. But SQL tuning and optimization is an art that takes years to master. Some general rules of thumb for creating efficient SQL statements include:

  • Let SQL do the work instead of the application program. For example, code an SQL join instead of two cursors and a programmatic join.
  • Simpler is generally better, but complex SQL can be very efficient.
  • Retrieve only the columns required, never more.
  • Retrieve the absolute minimum number of rows by specifying every WHERE clause that is appropriate.
  • When joining tables, always provide join predicates. In other words, avoid Cartesian products.
  • Favor using Stage 1 and Indexable predicates.
  • Avoid sorting if possible by creating indexes for ORDER BY, GROUP BY, and DISTINCT operations.
  • Avoid black boxes – that is, avoid I/O routines that are called by programs instead of using embedded SQL.
  • Avoid deadlocks by updating tables in the same sequence in every program.
  • Issue data modification statements (INSERT, UPDATE, DELETE) as close as possible to the COMMIT statement.
  • Be sure to build a COMMIT strategy into every batch program that changes data. Failing to COMMIT can cause locking problems.
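The first rule of thumb above – let SQL do the work – is easy to demonstrate. The sketch below (Python with SQLite, purely for illustration) produces the same answer twice: once with the “two cursors and a programmatic join” anti-pattern, and once with a single SQL join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
cur.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, name TEXT, deptno INTEGER)")
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(10, "SALES"), (20, "IT")])
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "ANN", 10), (2, "BOB", 20), (3, "CAL", 10)])

# Anti-pattern: a programmatic join, one lookup per employee row
programmatic = []
for empno, name, deptno in cur.execute("SELECT * FROM emp ORDER BY empno").fetchall():
    dname = cur.execute("SELECT dname FROM dept WHERE deptno = ?",
                        (deptno,)).fetchone()[0]
    programmatic.append((name, dname))

# Better: let SQL do the work with a single join
joined = cur.execute(
    "SELECT e.name, d.dname FROM emp e JOIN dept d ON e.deptno = d.deptno "
    "ORDER BY e.empno"
).fetchall()
print(joined)  # [('ANN', 'SALES'), ('BOB', 'IT'), ('CAL', 'SALES')]
```

Both versions return the same rows, but the join hands the optimizer the whole problem at once instead of forcing row-at-a-time access from the program.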

To tune SQL the DBA must be able to interpret the output of the access paths produced by EXPLAIN. This information is encoded in the explain tables, with the primary one usually called PLAN_TABLE. There are vendor tools to simplify this process that render access paths visually and give recommendations on tuning approaches – and these tools are well worth considering.

Finally, some attention must be paid to the host language code. Host language code refers to the application programs written in C, COBOL, Java, or the programming language du jour. SQL statements are usually embedded into host language code and it is quite possible to have finely tuned SQL inside of inefficient host language code. And, of course, that would cause a performance problem.

The Bottom Line

Although DBAs must understand all three aspects of database performance management, concentrating on the application aspects of performance will most likely provide the most bang for the buck. Of course, we have only touched the tip of the application performance iceberg today. But even this high-level view into application performance can serve as a nice starting place for tuning your database applications.

Good luck and happy performance tuning!

Posted in performance, SQL

All or Nothing?

Why is it always “all or nothing” in the IT world?

The prevailing wisdom these days is that PCs are dead, and that everybody will move to smart phones, tablets and the cloud. This is hogwash. First of all, PCs are far from dead. If you need to perform calculations (spreadsheet), write (word processor), or deliver a presentation, it is still far easier to use a PC or laptop than it is to try to do any of those things on a smart phone or a tablet. Sure, a lot of folks (me included) use tablets and smart phones for quick Internet searches, browsing, and keeping up with social media. But when I really want to do some research on the web, I use my PC and the Chrome web browser.

I don’t know about you, but I want to be able to download and store information from the web. I want to be able to run applications like word processors and spreadsheets. I want to be able to do my taxes on my computer. And today the PC is still the best platform for doing all of this. There may come a day when these applications are accessible and usable over the web, but not yet… not for me.

If you moved everything to the cloud and relied on applications in the cloud, what would happen to productivity if Internet service gets disconnected? You’d be dead in the water! But with applications installed on my PC I can still get work done… a lot of work done… without the Internet. Heck, if I’m not being interrupted by Twitter and Facebook and e-mail, maybe I’d get more work done without the Internet…  no, that would be adopting an all-or-nothing attitude – and that is what I am complaining about here! I’m not a Luddite and I don’t want to ignore the cloud and personal smart devices, I just don’t want to see my tried-and-true PC platform shuffled off this mortal coil before its time (yes, I’m talking to you Microsoft and your ridiculous Windows 8).

And what about mainframe access? Terminal emulation is usually done via a Windows PC. And don’t let me hear you say that mainframes are dead, either! Many of the world’s largest companies rely on mainframes to keep their business running and gain competitive advantage over their rivals.

So let’s not say things like “everybody” or “always” or “never”… let’s look to be inclusive and to add worthy new technology to our arsenal without the need to necessarily jettison technology that still works and still provides value… OK?

Posted in cloud, mainframe

An Introduction to Database Design: From Logical to Physical

This is one of the more popular posts on this blog, so I am re-blogging it today… enjoy!!!

Originally posted on Data and Technology Today:

A proper database design cannot be thrown together quickly by novices. What is required is a practiced and formal approach to gathering data requirements and modeling data. This modeling effort requires a formal approach to the discovery and identification of entities and data elements. Data normalization is a big part of data modeling and database design. A normalized data model reduces data redundancy and inconsistencies by ensuring that the data elements are designed appropriately.

From Logical…

So, database design is the process of transforming a logical data model into an actual physical database. Technicians sometimes leap to the physical implementation before producing the model of that implementation. This is unwise. A logical data model is required before you can even begin to design a physical database. And the logical data model grows out of a conceptual data model. And any type of data model begins with the discipline of data…


Posted in DBA

On The Importance of Database Backup and Recovery

Things break. When a new database is created or a new application goes online, everything is fresh and new—and running as designed. But the environment changes over time. New hardware and software is added, along with more users, more data, more requirements—more, more, more. Systems wear down as they are taxed on a daily basis to deliver service.

The DBA must be prepared for situations where a failure impacts the availability, integrity, or usability of a database. Reacting to failures and service disruptions is a key component of the DBA’s job. The ability of the DBA to react accordingly depends directly on his having a well-planned approach to database backup and recovery.

I know that many DBAs believe that ensuring optimal database and application performance is the most important task in their list of job responsibilities, but it is not true. These DBAs are confusing frequency with importance.

Most DBAs probably are conducting performance-related tasks more often than building backup plans – and they better be managing performance more frequently than they are actually recovering their databases or their company has big problems! But recoverability should be at (or near) the very top of the DBA task list, definitely before performance. Why? Well, if you cannot recover your databases after a problem then it won’t matter how fast you can access them, will it? Anybody can deliver fast access to the wrong information. It is the job of the DBA to keep the information in our company’s databases accurate, secure, and accessible.

So what do DBAs need to do to assure the availability and accuracy of our database data? This post will outline the basics of database backup and recovery and provide guidance on building a robust backup and recovery plan for your databases.

Preparing for Problems

Numerous daily hazards can cause system failures. As you plan your database backup and recovery strategy, be sure to consider all of these various threats to database integrity and availability. Of course, it is wise to take precautionary measures to prevent failures. Techniques such as UPS systems, mirrored disks, and failover technology can minimize the need to recover, but no amount of planning and regulation can prevent unexpected failures.

Database failures that may require recovery can be divided into three categories:

  • Instance failures are the result of an internal exception within the DBMS, an operating system failure, or other software-related database failure. In some cases, an instance failure can result in corruption of data that requires a recovery, but usually such failures do not damage data, so the DBMS simply needs to be restarted to reestablish normal operations.
  • Application (or transaction) failures occur when programs or scripts are run at the wrong time, using the wrong input, or in the wrong order. An application failure usually results in corrupt data that requires a database restore or recovery. The sooner an application failure is identified and corrected, the smaller the amount of damage to the database will be.
  • Media failure is likely to damage data, too. Media failure includes damage to disk storage devices, file system failures, tape degradation or damage, and deleted data files. Although less common in practice, damaged memory chips also can cause data corruption. After a media failure, the database will likely be in a state where valid data is unreadable, invalid data is readable, or referential integrity is violated. Outages due to media failures can often be avoided by implementing modern disk technologies such as RAID.

Businesses today are demanding higher throughput and around-the-clock availability even as they increase the amount of stored and processed data. Not too long ago, we talked about data in terms of gigabytes. Now it is common for organizations to manage a terabyte or more of data on a single database server. Therefore, more data needs to be constantly available and has to be processed faster than ever before. Organizations rely on data to conduct business, so it is imperative that you are prepared with a plan to counteract failures. A sound backup and recovery plan can be thought of as an insurance policy for your data.

Image Copy Backups

A fundamental component of a database backup and recovery plan is creating backup copies of data. When an error occurs that damages the integrity of the database, a backup copy of the data can be used as the basis to recover or restore the database. However, the full story on backing up a database is not quite that simple.

Backing up databases involves making consistent copies of your data, usually in the form of image copies, which are the output of a COPY utility. The name of the copy utility will vary from DBMS to DBMS. Common names for the backup utility include BACKUP, COPY, DUMP, and EXPORT. Some DBMSs rely on the native operating system’s file system commands for backing up data. However, even if the DBMS supplies an internal backup option, the DBA may choose to use facilities that operate outside the realm of the DBMS.

Full vs. Incremental Backups

Two types of image copy backups can be taken: full and incremental. As a DBA, you will need to learn the difference between the two and implement the proper image copy backup strategy based on application needs and database activity.

A full image copy backup is a complete copy of all the data in the database object at the time the image copy was run. An incremental image copy backup, sometimes referred to as a differential backup, contains only the data that has changed since the last full or incremental image copy was made. The advantage of taking an incremental backup rather than a full backup is that it can sometimes be made more quickly, and it occupies less space on disk or tape. The disadvantage is that recovery based on incremental copies can take longer because, in some cases, the same row is updated several times before the last changes are restored.

If the DBMS supports incremental image copy backups, it may also support incremental copy merging. A merge utility, sometimes referred to as MERGE or MERGECOPY, can be used to combine multiple incremental image copy backups into a single incremental copy backup, or to combine a full image copy backup with one or more incremental image copy backups to create a new full backup.
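The merge operation is simple to picture with a toy model in which a backup is just a set of page images. This is an illustration of the concept only – real COPY and MERGECOPY utilities work at the physical page level with far more bookkeeping:

```python
def merge_copies(full: dict, incrementals: list) -> dict:
    """Combine a full image copy with later incrementals into a new full copy.
    Incrementals must be applied oldest-first so the latest change wins."""
    merged = dict(full)
    for inc in incrementals:
        merged.update(inc)
    return merged

full_copy = {"page1": "A", "page2": "B", "page3": "C"}
inc1 = {"page2": "B2"}                 # page2 changed after the full copy
inc2 = {"page1": "A2", "page4": "D"}   # later changes, including a new page

new_full = merge_copies(full_copy, [inc1, inc2])
print(new_full)  # {'page1': 'A2', 'page2': 'B2', 'page3': 'C', 'page4': 'D'}
```

Merging ahead of time means a later recovery starts from one consolidated copy instead of replaying each incremental separately.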

Some DBMSs support making backup copies of indexes. Indeed, some DBMSs require indexes to be backed up, whereas index backup is optional for others. Index backup can be optional because the DBMS can rebuild an index from the table data. Therefore, a typical recovery scenario would involve recovering tables or tablespaces and then using the table data to rebuild the indexes. If the DBMS supports index backups, you can choose to copy indexes and then recover using the image copy backups of the indexes.

As a DBA, though, you will need to examine the trade-offs of copying indexes if your DBMS supports index backup. The question DBAs must answer for each index is “Rebuild or recover?” The more data that must be indexed, the longer an index rebuild will require in a recovery situation. For larger tables, backing up the index can result in a much quicker recovery—although at the expense of the increased time required for backup. When multiple indexes exist on the large table, backing them up, again, leads to faster recovery. However, keep in mind that index backups will require additional time to execute during your regular backup process. As a DBA, you will need to weigh the cost of recovery versus the cost of backup in making your decision.

Backup Consistency

Be sure your backup plan creates a consistent recovery point for the database object. In order to ensure backup consistency, you need to be aware of all relationships between the database objects being backed up and other database objects. This includes application-enforced relationships, referential constraints, and triggers. If you use an image copy backup to recover a database object to a previous point in time, you will need to recover any related database objects to the same point in time. Failure to do so will most likely result in inconsistent data.

If your DBMS provides a QUIESCE utility, use it to establish a point of consistency for all related database objects prior to backing them up. The QUIESCE utility will halt modification requests to the database objects to ensure consistency and record the point of consistency on the database log. Use the QUIESCE utility even when some database objects do not need to be copied, so that you can provide a consistent point of recovery for all referentially tied tablespaces.

If the DBMS does not provide a QUIESCE option, you will need to take other steps to ensure a consistent point for recovery. For example, you can place the database objects into a read-only mode, take the database objects offline, or halt application processes—at least those application processes that update the related database objects.

Document Your Backup Strategy

Once your backup strategy has been established and implemented, the backup system can run for a long time without any DBA intervention required. Such automation is a mixed blessing, though. Over time, things can be forgotten and the DBA staff can change, both of which can cause confusion during a hectic database recovery. For this reason it is imperative that the backup and recovery strategy, implementation, and procedures be thoroughly tested and documented by the DBA.

The most important aspect of any backup plan is to match it against your recovery requirements and options. More on that in the next section.

Database Recovery

When problems impact the database, the DBA can use the image copy backups and the database log to recover the database. Whatever the cause of the problem, the DBA must be able to recover data quickly so that the business can continue to operate. When data is unavailable, your company may be losing thousands or even millions of dollars. Recognizing the need for a database recovery is quite different from actually performing a recovery in a speedy and proper fashion. Database recovery can be a very complex task that is prone to errors and difficult to manage.

Recovery involves much more than simply restoring an image of the data as it appeared at some earlier point in time. A database recovery will involve bringing the data back to its state at (or before) the time of the problem. Often a recovery involves restoring databases and then reapplying the correct changes that occurred to that database, in the correct sequence.

Simply stated, a successful recovery is one where you get the application data to the state you want it—whether that state is how it was last week, yesterday, or just a moment ago. If you planned your backup strategy appropriately, you should be able to recover from just about any type of failure you encounter.

Determining Recovery Options

When a failure occurs, the DBA will need to ascertain whether recovery is required. If recovery is required, you will need to determine what resources (backup copies) are available and how best to perform that recovery.

Of the different types of recovery performed, the first one that usually comes to mind is a recover to current, to handle some sort of disaster. This disaster could be anything from a media failure to a natural disaster destroying a data center. Applications are completely unavailable until the recovery is complete. To successfully recover to current, the recovery process must be able to reset the contents of the database to the way it looked just at (or right before) the point of failure. To recover to current, the recovery process must find a valid, full image copy backup and restore that image copy. Then the recovery will roll forward through the database log, applying all of the database changes.

If the last full image copy is lost or destroyed, it may still be possible to recover if a previous image copy exists. The recovery process could start with the older backup copy, apply any incremental copies, and then roll forward through the archived and active logs. Of course, more database logs will be required in such a case, so the recovery process will take longer.

If no image copy is available as a starting point, it may be possible to recover the database object using just the database log. If the data was loaded and the load process was logged, recovery may be able to proceed simply by applying log records.
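To make the roll-forward idea concrete, here is a minimal Python sketch. The dict-based "database," the image copy, and the log-record layout are all invented for illustration; a real DBMS works with binary page images and physical log records, but the control flow is the same: restore the copy, then apply later log records in sequence.

```python
def recover_to_current(image_copy, log, copy_lsn):
    """Restore the image copy, then apply every change logged
    after the copy was taken, in log sequence number (LSN) order."""
    db = dict(image_copy)                      # "restore" the backup
    for record in sorted(log, key=lambda r: r["lsn"]):
        if record["lsn"] <= copy_lsn:
            continue                           # already in the image copy
        if record["op"] == "delete":
            db.pop(record["key"], None)
        else:                                  # insert or update
            db[record["key"]] = record["value"]
    return db

# Full image copy taken at LSN 100, with log activity before and after it.
backup = {"A": 1, "B": 2}
log = [
    {"lsn": 90,  "op": "update", "key": "A", "value": 1},  # captured in copy
    {"lsn": 110, "op": "update", "key": "A", "value": 5},
    {"lsn": 120, "op": "insert", "key": "C", "value": 9},
    {"lsn": 130, "op": "delete", "key": "B", "value": None},
]
print(recover_to_current(backup, log, copy_lsn=100))  # {'A': 5, 'C': 9}
```

Note that the record at LSN 90 is skipped: it is already reflected in the image copy, which is exactly why the recovery needs to know where the copy was taken.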

Another traditional type of recovery is point-in-time (PIT) recovery, which is usually done to deal with an application-level problem. PIT recovery is sometimes referred to as partial recovery because only part of the existing data will remain after recovery. Recovery to a point in time removes the effects of all transactions that have occurred since that specified point in time.

To perform a PIT recovery, an image copy backup is restored and then changes are applied by rolling forward through the database log (or log backups). However, only the log records up to the specified time are processed. Sometimes the recovery point is specified as an actual date and time; sometimes it is specified using a relative byte address on the database log.

To successfully recover to a point in time, the recovery must be able to reset the contents of the database to the way it looked at a prior consistent point. The key is to retain all of the good changes to the data, while removing all of the “bad.” PIT recovery can be accomplished in one of two ways, depending on the features of the DBMS and the amount of data to be recovered. It could:

  • Restore the image copy, then roll forward through the logs, applying the database changes up to the recovery point; or
  • Forgo restoring the image copy and instead roll backward through the logs, removing the database changes that occurred after the recovery point.

If the DBMS supports both types of recovery, the DBA should choose to deploy the one that creates the least downtime. If a significant number of changes need to be removed, then restoring and rolling forward usually results in the least downtime. If the number of changes that must be removed is minimal, then rolling backward through the logs should result in less downtime. If your DBMS does not support backward log rolling, third-party products may be available that can make it a viable technique.
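The trade-off between the two strategies can be sketched in a few lines of Python. Everything here is hypothetical; the invented log format carries before-images ("old") and after-images ("new") so that changes can be both reapplied and undone, but it shows why both approaches land on the same recovery point:

```python
def pit_forward(backup, log, copy_lsn, recovery_lsn):
    """Restore the image copy and reapply logged changes
    up to (and including) the recovery point."""
    db = dict(backup)
    for r in sorted(log, key=lambda r: r["lsn"]):
        if copy_lsn < r["lsn"] <= recovery_lsn:
            if r["op"] == "delete":
                db.pop(r["key"], None)
            else:                               # insert or update
                db[r["key"]] = r["new"]
    return db

def pit_backward(current, log, recovery_lsn):
    """Start from the live data and remove changes made after
    the recovery point, newest change first."""
    db = dict(current)
    for r in sorted(log, key=lambda r: r["lsn"], reverse=True):
        if r["lsn"] <= recovery_lsn:
            break                               # everything older is good
        if r["op"] == "insert":
            db.pop(r["key"], None)              # undo insert -> delete
        else:                                   # undo update/delete -> old value
            db[r["key"]] = r["old"]
    return db

# Image copy at LSN 100; the change at LSN 130 is the "bad" one to remove.
backup, current = {"A": 1}, {"A": 7, "C": 9}
log = [
    {"lsn": 110, "op": "update", "key": "A", "old": 1, "new": 5},
    {"lsn": 120, "op": "insert", "key": "C", "old": None, "new": 9},
    {"lsn": 130, "op": "update", "key": "A", "old": 5, "new": 7},
]
# Both strategies arrive at the same state as of LSN 120.
print(pit_forward(backup, log, 100, 120))   # {'A': 5, 'C': 9}
print(pit_backward(current, log, 120))      # {'A': 5, 'C': 9}
```

In this toy example only one change (LSN 130) has to be removed, so rolling backward touches far less data than restoring and reapplying two changes, which is exactly the case where backward recovery wins in practice.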

Transaction recovery is a third type of recovery; it addresses the shortcomings of the traditional types of recovery: downtime and loss of good data. In essence, transaction recovery is an application-level recovery whereby the effects of specific transactions during a specified timeframe are removed from the database. Third-party software is required to perform a transaction recovery.

Traditional types of recovery, both recovery to current and PIT, recover at the database object level. In direct contrast to this level of granularity, transaction recovery allows a user to recover a specific portion of the database based on user-defined criteria. This can be at a transaction or application program level. In this context, a transaction is defined by the user’s view of the process. This might be the set of panels that comprise a new-hire operation, or the set of jobs that post to the general ledger. The important point is that there may or may not be a correlation between the transactions you are trying to fix and transactions (or units of recovery) in the DBMS.

Once you have identified the transaction to recover, you have three recovery options:

  • PIT recovery. You can try to identify all of the database objects impacted by the application and perform traditional point-in-time recovery to remove the effects of the transactions. You would then manually rerun or reenter work that was valid.
  • UNDO recovery. Remove only the effects of the bad transactions.
  • REDO recovery. Remove all the transactions after a given point in time, and then redo the good transactions only.

PIT recovery was discussed earlier, so we will not cover it again here. The other two forms of transaction recovery are SQL-based application recovery techniques. The basic idea is to read the log and generate the proper SQL statements that can be run to achieve the desired recovery results.

Let’s first examine an UNDO recovery. UNDO recovery is the simplest version of SQL-based transaction recovery because it involves only SQL. To accomplish an UNDO recovery, the database logs must be scanned for the identified transaction and anti-SQL is produced. Anti-SQL reverses the effect of SQL by:

  • Converting inserts into deletes
  • Converting deletes into inserts
  • Reversing the values of updates (e.g., UPDATE “A” to “X” becomes UPDATE “X” to “A”)

Once the anti-SQL is generated, it is run using an interactive SQL script to perform an UNDO recovery.
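As a rough illustration, anti-SQL generation might look something like the following Python sketch. The log-record layout, the emp table, and its columns are all hypothetical; real transaction-recovery tools read the DBMS’s actual log format and handle multi-column rows, data types, and quoting properly.

```python
def anti_sql(record):
    """Generate the SQL statement that reverses one logged change."""
    t, key = record["table"], record["key"]
    if record["op"] == "INSERT":        # insert becomes delete
        return f"DELETE FROM {t} WHERE id = {key};"
    if record["op"] == "DELETE":        # delete becomes insert
        return f"INSERT INTO {t} (id, val) VALUES ({key}, '{record['old']}');"
    # update becomes an update back to the before-image
    return f"UPDATE {t} SET val = '{record['old']}' WHERE id = {key};"

# Logged changes for the bad transaction, oldest first.
bad_txn = [
    {"op": "UPDATE", "table": "emp", "key": 7, "old": "A", "new": "X"},
    {"op": "INSERT", "table": "emp", "key": 8, "new": "B"},
]
# Undo statements must run in reverse log order, newest change first.
for rec in reversed(bad_txn):
    print(anti_sql(rec))
# DELETE FROM emp WHERE id = 8;
# UPDATE emp SET val = 'A' WHERE id = 7;
```

Notice that the statements are emitted newest-first: undoing changes in reverse order keeps dependent changes (say, an update to a row that was inserted earlier in the same transaction) from tripping over each other.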

A REDO recovery is a combination of PIT recovery and UNDO recovery, with a twist. Instead of generating SQL for the bad transaction that we want to eliminate, we generate the SQL for the transactions we want to save. Then we do a standard point-in-time recovery to eliminate all the transactions since the recovery point. Finally, we reapply the good transactions captured in the first step.

Unlike the UNDO process, which creates SQL statements designed to back out all of the problem transactions, the REDO process creates SQL statements that are designed to reapply only the valid transactions from a consistent point in time. Since the REDO process does not generate SQL for the problem transactions, performing a recovery and then executing the REDO SQL can restore the database object to a current state that does not include the problem transactions.
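Putting the three REDO steps together, here is a simplified Python sketch. Again the log format is invented, and for brevity it handles only inserts and updates (a real tool would also regenerate deletes):

```python
def redo_recovery(backup, log, copy_lsn, recovery_lsn, bad_txns):
    """PIT recovery plus reapplication of the good transactions."""
    ordered = sorted(log, key=lambda r: r["lsn"])
    # Step 1: capture the good work performed after the recovery point.
    good = [r for r in ordered
            if r["lsn"] > recovery_lsn and r["txn"] not in bad_txns]
    # Step 2: standard point-in-time recovery back to the recovery point.
    db = dict(backup)
    for r in ordered:
        if copy_lsn < r["lsn"] <= recovery_lsn:
            db[r["key"]] = r["new"]
    # Step 3: reapply only the saved good transactions on top.
    for r in good:
        db[r["key"]] = r["new"]
    return db

# T2 is the problem transaction; T1 and T3 should survive.
log = [
    {"lsn": 10, "txn": "T1", "key": "A", "new": 1},
    {"lsn": 20, "txn": "T2", "key": "A", "new": 99},   # bad
    {"lsn": 30, "txn": "T3", "key": "B", "new": 2},
]
print(redo_recovery({}, log, 0, 10, bad_txns={"T2"}))  # {'A': 1, 'B': 2}
```

The bad T2 change (A = 99) disappears with the point-in-time recovery, while T3’s work, which happened after the recovery point, is preserved by reapplying the captured SQL.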

So, what is the best recovery strategy? It depends… on things like the amount of data impacted, subsequent modifications, log availability, and so on.

Recovery Testing 

It is important that the DBA thoroughly test each different type of recovery in a test environment. Be sure that you can recover from a media failure, an instance failure, and several types of application failures. Document the type of backup taken for each database object, along with a schedule of when each is backed up. Be sure that all of your databases can be recovered and that all DBAs on-site have firsthand experience with database recovery. The DBA group should schedule periodic evaluations of the backup and recovery plans for every production database.

Every DBA should prepare a recovery plan for each database object and test it frequently. A recovery plan describes the procedures you will use to recover in the event of hardware failure or a local site disaster, such as fire.

To develop your recovery plan:

  • Write all aspects of the recovery plan out in detail, documenting each step.
  • Include all the scripts required to back up and recover each database object.
  • Review the plan with everyone who may be called on to implement it.
  • Include a contact list with names and phone numbers of everyone who may be involved in the recovery.
  • Keep the recovery plan up-to-date by modifying the plan to include every new database object that is created.

Testing your recovery procedures regularly means running recovery tests for individual database objects, databases, and the entire database system. By testing recovery procedures, the DBA ensures that the backup and recovery scripts work and that every database object is indeed recoverable. Furthermore, regular recovery testing is on-the-job training for the inevitable, stress-filled production database recovery. A DBA who tests the recovery plan will be more familiar with the tools and scripts needed to perform recovery.

That’s enough for one blog post. Happy backing up and recovery testing… but hopefully no actual recovery needed!

Posted in backup & recovery, DBA

Take a Big Data and Cloud Computing Quiz

Are you concerned about new technology and industry hype rendering your experience obsolete? Are memes like Big Data and Cloud Computing confusing to you… or are you comfortable that you know what you need to know?

Why not take a short Interactive Master Class quiz on cloud and big data integration?

I got 9 out of 10… but I won’t share with you which one I missed!

You might also want to take another, similar yet different, quiz on Big data analytics tools and best practices.

I got 8 out of 8 on this one…

How did you do?

Posted in Big Data, cloud, data

On Writing

A question that I get asked frequently is “How do you write so much?”… and “Do you have any advice for folks wanting to write their first article (or blog post, or book, or…)?”

Well, if you want to be a writer the first thing that you have to do is make the time to write every day. Even if it is only for a half hour or so, sit down at the computer or pull up a lawn chair outside with a tablet and pen and just write what you are thinking about. It could be personal or technical; funny or serious.  The bottom line is this: if you are serious about writing, you need to set aside time to do it.

Don’t worry about getting everything perfect as you write down your ideas. You can always edit it later. Capture your immediate thoughts as they come to you.  Then take that beginning essence of an idea and flesh it out over time.

As far as writing your first article goes, it is as simple as this: write what you know. Write about the last project you worked on, the current database you are designing, or that particularly thorny performance problem your team just helped you solve. If you write about your experiences, you have a better chance of succeeding than if you decide to write about a topic you must research because you know little or nothing about it.

Once it is written, don’t be shy about submitting it to your favorite publications. Most technical journals and web-based publications are interested in publishing first-hand experiences about solving problems with technology. And it is unlikely that an editor will beat down your door for material – especially if you have never written before. Choose the publication wisely – make sure your article is a good fit by reading the magazine before submitting your article. More articles are rejected because they are not good fits for the publication than for any other reason. For example, Computerworld is unlikely to publish your manifesto on migrating from one set of DB2 utilities to another… but a more technical publication might… and that is an excellent first blog post for that new blog you’ve been thinking about starting up!

Posted in DBA

On Becoming a DBA

One of the most common questions I am asked is: How can I become a DBA? The answer, of course, depends a lot on what you are currently doing. Programmers who have developed applications using a database system are usually best-suited to becoming a DBA. They already know some of the trials and tribulations that can occur when accessing a database.

If you are a programmer and you want to become a DBA, you should ask yourself some hard questions before you pursue that path. First of all, are you willing to work additional, sometimes crazy, hours? Yes, I know that many programmers work more than 40 hours already, but the requirements of the DBA job can push people to their limits. It is not uncommon for DBAs to work late into the evening and on weekends; and you better be ready to handle technical calls at 2:00 a.m. when database applications fail.

Additionally, you need to ask yourself if you are insatiably curious. A good DBA must become a jack-of-all-trades. DBAs are expected to know everything about everything — at least in terms of how it works with databases. From technical and business jargon to the latest management and technology fads, the DBA is expected to be “in the know.” And do not expect any private time: A DBA must be prepared for interruptions at any time to answer any type of question — and not just about databases, either.

And how are your people skills? The DBA, often respected as a database guru, is just as frequently criticized as a curmudgeon with vast technical knowledge but limited people skills. Just about every database programmer has his or her favorite DBA story. You know, those anecdotes that begin with “I had a problem…” and end with “and then he told me to stop bothering him and read the manual.” DBAs simply do not have a “warm and fuzzy” image. However, this perception probably has more to do with the nature and scope of the job than with anything else. The DBMS spans the enterprise, effectively placing the DBA on call for the applications of the entire organization. As such, you will interact with many different people and take on many different roles. To be successful, you will need an easy-going and somewhat amiable manner.

Finally, you should ask yourself how adaptable you are. A day in the life of a DBA is usually quite hectic. The DBA maintains production and test environments, monitors active application development projects, attends strategy and design meetings, selects and evaluates new products and connects legacy systems to the Web. And, of course: Joe in Accounting just resubmitted that query from hell that’s bringing the system to a halt. Can you do something about that? All of this can occur within a single workday. You must be able to embrace the chaos to succeed as a DBA.

Of course, you need to be organized and capable of succinct planning, too. Being able to plan for changes and implement new functionality is a key component of database administration. And although this may seem to clash with the need to be flexible and adaptable, it doesn’t really. Not once you get used to it.

So, if you want to become a DBA you should already have some experience with the DBMS, be willing to work long and crazy hours, have excellent communication and people skills, be adaptable and excel at organization. If that sounds like fun, you’ll probably make a good DBA.

Posted in DBA