Disaster Recovery: Practice Makes Perfect
There’s no doubt that having a disaster recovery (DR) plan in place is essential for businesses of all shapes and sizes. However, having a DR plan and being able to use it effectively are very different things. Just like application code, DR plans and procedures need to be tested to ensure that they work as intended. The worst time to find out that your DR plan doesn’t work is when you really need it.
Testing your DR plans isn’t something that you can do once and then forget about. Today’s business environments are continually changing, and their DR requirements change right along with them. To have real confidence that your DR plans will work when you need them, you need to continually revise and test your DR procedures. How often you practice your recovery processes has a direct bearing on how well they will perform in an actual emergency.
Unfortunately, testing DR plans can be quite difficult. For anything but the smallest organization, DR plans are complex, with many interrelated components, which makes regular testing hard. Yet that same complexity makes testing even more important, because it means there’s more that can go wrong. The more often an organization tests its DR plans, the better the odds that those plans will work when they are needed.
How proficient an organization is at using its DR plans varies widely. Some companies I know of are extremely good at this. Some organizations make a regular practice of periodically switching all of their production processes over to their DR infrastructure. Then, at some predefined interval, they switch all of those processes back to their original sites and systems. This level of DR testing ensures that their DR processes are up-to-date and functional. If you truly require the highest levels of availability, this type of practice can ensure that your IT infrastructure delivers the availability you need and can respond to real disasters.
However, while this level of practice is great, it is also out of reach for the majority of businesses. Most businesses simply do not have the time or resources to pull off this level of DR testing. Even if you can’t make scheduled switches to your DR infrastructure, you should still regularly test your DR procedures. Many businesses perform compartmentalized, partial tests of their DR plans. At a minimum, it’s vital that you regularly test the backups and VM replicas that are part of your DR plans.
Some backup and replication products can make this task much easier by automatically testing backups and VM replicas and then reporting the success or failure of the operations. Periodically testing the different technical components of your DR plan is certainly better than no testing at all, but that’s not where your DR practice should stop. You should also test the various communication channels that are needed and the documentation that outlines the recovery steps. The documentation should reside offsite in the cloud or be printed in a binder in case some type of network or file share outage occurs.
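As a rough illustration of the kind of automated backup check described above, here is a minimal Python sketch. It assumes you keep a manifest of SHA-256 checksums recorded when each backup was taken; the function names, file names, and manifest format are hypothetical and do not come from any particular backup product. Note that a checksum match only shows that a backup file is present and unchanged. It is not a substitute for a real restore test.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backups(manifest: dict[str, str], backup_dir: Path) -> dict[str, str]:
    """Return a status per backup file: 'ok', 'missing', or 'corrupt'.

    `manifest` maps a file name to the checksum recorded at backup time
    (an assumed format for this sketch).
    """
    report = {}
    for name, expected in manifest.items():
        path = backup_dir / name
        if not path.is_file():
            report[name] = "missing"
        elif sha256_of(path) != expected:
            report[name] = "corrupt"
        else:
            report[name] = "ok"
    return report
```

A scheduled job could call verify_backups with the current manifest and alert someone whenever any entry in the report is not 'ok', which gives you the success or failure reporting without waiting for a real outage to reveal a bad backup.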
As the saying goes, practice makes perfect, and regular practice of your DR plans will certainly help ensure that they work when you need them. Some DR products can make your DR practice easier by automating the testing of your backups and replicas. It’s important to remember that DR practice also includes maintaining and testing DR communications as well as documentation and runbooks. You don’t want to be in a situation where your systems are down and you need to figure things out on the spot. If there is an outage and every second counts, documented procedures and known communication channels will help to minimize downtime. For effective execution of your DR plans, practice makes perfect.