When long-term data retention is imposed on your data — anything more than a couple of years — then archiving becomes the only valid solution to being in compliance with such requirements. This is so because without a database archiving solution, the volume of data can impose hardships on operational performance. With enough data, it is possible that you may exceed the ability of existing DBMS technology to store such large volumes of data… and it is difficult to maintain authenticity over changing database schemas.
OK, so if we are archiving our database data into a separate archive data store, we will need to integrate our archive policies with our backup schedule. This is mandatory to ensure that database recoveries, which are unscheduled by definition, can be performed properly and effectively.
Remember that Database Archiving is the process of removing selected data records from operational databases that are not expected to be referenced again and storing them in an archive data store where they can be retrieved if needed. It is very important to keep in mind that when data is archived from the operational database, that data is no longer in the operational database.
So far so good. But now let’s think about some potential recovery scenarios. During a weekly batch cycle jobs are run that erroneously modify data in the operational database. But also, that same day, a database archive policy was run that caused data to be removed from the operational database. If we are not careful, our recovery could reverse not only the changes that were made in error, but also the data that was deleted and moved to the archive data store. In such a case, the data would then exist in both the operational database and the archive data store. The next time we execute the archive policy for that database we would archive the data again, which would result in duplicate data in the archive.
To avoid this type of situation DBAs must have knowledge of when database archive policies are run. The DBA would need to either ensure that the recovery does not apply the data deleted from the archive, or alternately, discard the data from the archive data store immediately after the recovery that places it back in the operational database. The former solution is probably better than the latter.
Of course, much depends on the complexity of the database recovery. Consider the following situation:
- Process1 run and makes erroneous changes,
- the ArchivePolicy runs, and
- then Process2 run also making erroneous changes.
If you recover the database back to the point-in-time before Process1 you will include the archived data in your recovery. If you have a log-based recovery scenario that can apply application recovery by reversing the impact of a process, then you could run it for both processes and avoid re-applying the archived data.
There are innumerable recovery scenarios a DBA could be faced with, so it is impractical (if not impossible) to come up with a step-by-step guide to database backup and recovery when archiving is involved.
Suffice it to say, you will need to integrate knowledge of, and policies for, recovering your databases in conjunction with your database archives.