A Cold Spell in Texas Makes Me Think About Contingency Planning and the DBMS

Whenever there is an event in the news like the winter storm that wreaked havoc on Texas last week, it makes me think about contingency planning and things like database disaster recovery. The paltry amount of snow we got in Texas probably makes folks “Up North” chuckle, but it was a real problem because Texas almost never gets snow, nor does it usually stay as cold for as many days in a row as it did last week. So my first thought is that what qualifies as a “disaster” will differ based on your location and circumstances.

Anyway, when a “disaster” like this hits, it is a good time to review your disaster contingency plans… well, before the disaster would have been better, but we are human, and awareness always rises in the wake of a disaster. So let’s discuss disaster recovery in terms of databases.

A disaster recovery plan is like insurance — you’re glad you have it, but you hope you don’t need it. With insurance, you pay a regular fee so that you are covered if you have an accident. A disaster recovery plan is similar because you pay to implement your disaster recovery plan by designating a disaster recovery site, shipping backup copies of the data off-site, preparing recovery jobs, and practicing the recovery procedures.

Database disaster recovery must be an integral component of your overall business recovery plan. A disaster recovery plan must be global in scope. It must handle business issues such as alternate locations for conducting business, communication methods to inform employees of new locations and procedures, and publicity measures to inform customers how to transact business with the company post-disaster. A component of that plan must be the overall plan for resuming data processing and IT operations. And finally, a component of that plan is the resumption of DBMS operations.

In order for your database disaster recovery plan to be effective, you will need to develop and adhere to a written plan. This plan must document all of the routine precautionary measures required to assure the recoverability of your critical data in the event a disaster occurs. Image copy backups or disk backups need to be made as directed and sent to the remote site as quickly as possible. Reports need to be printed and sent off-site. Missing any little detail can render a disaster recovery plan ineffective.

When practicing the disaster recovery plan, make sure that each team member follows the written instructions precisely. Of course, it is quite likely that things will come up during the practice sessions that were missed or undocumented in the plan. Be sure to capture all of these events and update the written plan after the disaster recovery test. Keep in mind that during an actual disaster you may need to rely on less experienced people, or perhaps consultants and others who are not regular employees. The more failproof the written plan is, the better the chance of a successful disaster recovery.

Your disaster recovery procedures will be determined in large part by the method you use to back up your data. If you rely on pack backups, then your recovery will be one disk volume at a time. If you create database image copies, you will probably use the DBMS’s recover utility or a third party recover tool. Of course, you might combine several different techniques for off-site backups depending on the sensitivity and criticality of the data.

The following tips can be helpful as you develop or review your database contingency plans:

Order of Recovery

Make sure the operating system and DBMS are installed at the correct version and maintenance level before proceeding with any database object recovery at the disaster site. Be sure to follow the recovery steps rigorously as documented in the written plan.
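The check described above can be automated as a gate in front of the recovery jobs. Here is a minimal sketch in Python, assuming the written plan records a required version for each component and that you have some way of collecting the installed versions at the disaster site. The names `Component`, `parse_version`, and `check_prerequisites` are hypothetical, and the version strings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One prerequisite from the written plan (hypothetical structure)."""
    name: str
    version: str  # dotted numeric version, e.g. "12.1.5"


def parse_version(text):
    """Turn "12.1.5" into (12, 1, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_prerequisites(required, installed):
    """Compare the plan's required versions with what is installed.

    `required` is a list of Component; `installed` maps component
    name to the version string found at the disaster site.
    Returns a list of problems; an empty list means the check passed.
    """
    problems = []
    for comp in required:
        found = installed.get(comp.name)
        if found is None:
            problems.append(f"{comp.name}: not installed")
        elif parse_version(found) != parse_version(comp.version):
            problems.append(
                f"{comp.name}: found {found}, plan requires {comp.version}"
            )
    return problems


# Illustrative values only; real ones come from your written plan.
plan = [Component("os", "2.4"), Component("dbms", "12.1.5")]
site = {"os": "2.4", "dbms": "12.1.4"}

for problem in check_prerequisites(plan, site):
    print("STOP:", problem)
```

The sketch insists on an exact match rather than "at least this version" because the plan calls for the correct maintenance level, and a newer level at the remote site has not necessarily been tested against your backups.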

Data Latency

How old is the data? If you take nightly backup tapes to another location, your data could be up to 24 hours old. Sometimes having data that old is unacceptable, but sending backup media to off-site storage more than once a day may be too expensive. One solution is to get the data to another location digitally, via log shipping or replication, for example. Keep in mind that the database logs at the time of the disaster may not be available to apply at the off-site recovery location, so some data may not be fully recoverable, and there is really no way around this. The quicker backup copies of database objects and database logs are sent off-site, the better the disaster recovery will be in terms of data currency.
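To put a number on data latency, here is a rough sketch of the arithmetic in Python. It assumes copies are taken at a fixed interval and each one takes a fixed time to reach the remote site, so the worst case is the interval plus the transfer delay. The function name `worst_case_data_age` and the log shipping figures are hypothetical, chosen only for illustration.

```python
from datetime import timedelta


def worst_case_data_age(ship_interval, transfer_delay=timedelta(0)):
    """Oldest the off-site data can be at the moment disaster strikes.

    Just before a new copy arrives, the newest copy at the remote site
    was taken one full interval plus one transfer delay ago.
    """
    if ship_interval <= timedelta(0):
        raise ValueError("ship_interval must be positive")
    if transfer_delay < timedelta(0):
        raise ValueError("transfer_delay cannot be negative")
    return ship_interval + transfer_delay


# Nightly tapes, ignoring transport time: up to 24 hours old.
nightly = worst_case_data_age(timedelta(hours=24))

# Hypothetical log shipping: every 15 minutes, 5 minutes to arrive.
log_shipping = worst_case_data_age(
    timedelta(minutes=15), timedelta(minutes=5)
)

print("nightly tapes:", nightly)
print("log shipping: ", log_shipping)
```

Real environments are messier than this (a courier runs late, a network link goes down), so treat the result as a floor on your exposure rather than a guarantee.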

Remember Other Vital Data

Creating offsite backups for database objects may not be sufficient to ensure a complete disaster recovery plan for each application. Be sure to back up related data and send it offsite as well. Additional data and files to consider backing up for the remote site include DDL libraries for database objects, recovery and test scripts, application program source and executable files, stored procedure program source and executable files, user-defined function source and executable files, libraries and passwords for critical third party DBA tools, and other related data files used by the application.
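A list like this is easy to turn into an automated completeness check before each off-site shipment. The following Python sketch assumes you keep a simple manifest of what was actually sent. The category labels come from the list above, while `REQUIRED_ITEMS` and `missing_items` are hypothetical names.

```python
# Categories taken from the list above; adjust for your own site.
REQUIRED_ITEMS = (
    "DDL libraries for database objects",
    "recovery and test scripts",
    "application program source and executables",
    "stored procedure source and executables",
    "user-defined function source and executables",
    "libraries and passwords for third-party DBA tools",
    "other application data files",
)


def missing_items(shipped):
    """Return the required categories absent from a shipment manifest.

    `shipped` is any iterable of category labels. The result keeps
    the order of REQUIRED_ITEMS so reports are stable run to run.
    """
    shipped_set = set(shipped)
    return [item for item in REQUIRED_ITEMS if item not in shipped_set]


# Illustrative manifest that forgot everything except the DDL.
manifest = ["DDL libraries for database objects"]

for item in missing_items(manifest):
    print("Not sent off-site:", item)
```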

Beware of Compression

If your site uses tape-compression software, be sure that the remote recovery site uses the same tape-compression software. If it does not, the image copy backups will not be readable at the remote site. Turn off compression at the primary site for the disaster recovery image copy backups if the remote site cannot read compressed tape files.

Post-Recovery Image Copies

Part of the disaster recovery process should be to create an image copy backup for each database object after it has been recovered at the remote site. Doing so enables easier recoverability of the data should an error occur after processing begins at the remote site. Without the new image copy backups, the disaster recovery procedure would have to be performed again if an error occurs after remote site processing begins.

Disaster Prevention

DBAs and IT professionals in general create procedures and enforce policies. Many of these procedures and policies, such as a disaster recovery plan, are geared toward dealing with errors once they occur. Having such procedures and policies is wise. But it is just as wise to establish procedures and policies to prevent problems in the first place. Although you cannot implement procedures to stop an earthquake or flood, you can implement policies to help avoid man-made disasters. For example, enforce frequent password changes to mitigate data loss due to malicious hackers.

Another good idea is to document and disseminate procedures to end users teaching them how to deal with error messages. After all, you cannot expect every user to understand the impact of each possible response to every error message. Guidelines can help avoid errors, and with them, man-made disasters.

Summary

Only with comprehensive up-front planning, regular testing, and diligent maintenance will a disaster recovery plan be useful. Be sure you have one for your site… or your disaster recovery plan might become a two-step process in the event of a disaster:

1) Update resume

2) Go job-hunting!

About craig@craigsmullins.com

I'm a data management strategist, researcher, and consultant with over three decades of experience in all facets of database systems development and implementation.
This entry was posted in backup & recovery, contingency planning.