3 Types of Failover in Hyper-V Replica

What Are the Types of Failover Operations in Hyper-V Replica?

Hyper-V Replica is designed to allow businesses to continue operations after disasters occur, and more impressively, move services to the disaster recovery (DR) site before a forecasted event happens. No business continuity plan (BCP) is considered reliable unless it is tested. Hyper-V Replica provides the ability to test the IT side of the BCP by performing a test failover to the secondary (or DR) site without affecting services that are provided in the primary (or production) site. Today I will explain the three types of failover in Hyper-V Replica and when to use them.

1. Planned Failover

Certain disasters are predicted. Sometimes the weather forecast gives us a few hours' notice, and sometimes we even get a week to prepare. Hurricane Sandy formed in the western Caribbean Sea on October 22, 2012, and didn’t reach the southeastern United States until October 25. The governors of North Carolina and Virginia declared a state of emergency on October 26. Other states in the northeast quickly followed as they realized this storm was going to be bad.
This is exactly the sort of situation in which you implement a Hyper-V Replica planned failover: you know a disaster is coming and you want to avoid extended disruption to services, or worse, you want to avoid your business being terminated. Once a decision to evacuate is made, you initiate the process below, and within minutes you can get out of harm's way.
What happens during a planned failover:

  1. You shut down the required virtual machines and then start the process in the production site.
  2. The Hyper-V Replica logs are flushed to the secondary site hosts. This is why the virtual machines are shut down.
  3. The Replica virtual machines are updated with the final changes from the Hyper-V Replica logs. This means you have achieved a recovery point objective (RPO) of zero seconds, and there is no data loss.
  4. The virtual machines are powered up in the secondary site. This takes a few minutes, meaning that the recovery time objective (RTO) is short.
  5. Hyper-V Replica automatically reverses replication from the secondary site to the primary site. This requires the host/cluster and the firewalls in the primary site to accept inbound replication from the host/cluster in the secondary site. The benefit of this is that you can easily restore services to the primary site by performing another planned failover from the secondary site.
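
To make the ordering of those five steps concrete, here is a minimal Python sketch. It is only a toy model: the Site class, the planned_failover function, and the list-of-writes "disk" are hypothetical names I made up for illustration, and none of it calls a real Hyper-V API. It simply shows why shutting down first and flushing the log second leaves nothing behind.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One side of a replication pair (hypothetical model, not a Hyper-V API)."""
    name: str
    disk: list = field(default_factory=list)  # writes already applied to the VM's disk
    running: bool = False


def planned_failover(primary: Site, replica: Site, pending_log: list) -> dict:
    """Walk through the five planned-failover steps in order."""
    # Step 1: shut down the VM so that no new writes can arrive.
    primary.running = False

    # Step 2: flush the outstanding replication log to the secondary site.
    in_flight = list(pending_log)
    pending_log.clear()

    # Step 3: apply the final changes to the replica VM.
    replica.disk.extend(in_flight)
    lost_writes = len(primary.disk) - len(replica.disk)

    # Step 4: power up the VM in the secondary site.
    replica.running = True

    # Step 5: reverse replication so the old primary becomes the target.
    return {"source": replica.name, "target": primary.name, "lost_writes": lost_writes}


primary = Site("production", disk=["w1", "w2", "w3"], running=True)
replica = Site("dr", disk=["w1"])
result = planned_failover(primary, replica, pending_log=["w2", "w3"])
print(result)  # {'source': 'dr', 'target': 'production', 'lost_writes': 0}
```

Because the VM is stopped before the log is flushed, the replica ends up with every write the primary had, which is the zero-second RPO described in step 3.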

Planned Failover in Hyper-V Replica.

I deliberately chose Hurricane Sandy as an example. There is a story of how two early adopters of Windows Server 2012 Hyper-V survived the disaster because of Hyper-V Replica.
An alternative use of a planned failover is what I like to think of as a “stretch Quick Migration.” In this situation, there is no disaster; you just want to move VMs to another location with minimal downtime. This could allow you to phase out a branch office computer room or even migrate VMs to a service provider’s public cloud or hosted private cloud.

2. Unplanned Failover

Every IT person dreads a phone call at 3:00 AM because it’s never good news, unless you have an amazing overtime plan. Overtime! There’s no overtime in IT! Usually it’s because a server is down or an executive on the other side of the planet cannot log in. But it could be because the data center or a branch office has burned down or been flattened by an earthquake. The business needs to recover services immediately.
The correct Hyper-V Replica response here is to perform an unplanned failover. The clue is in the name: the disaster was unexpected. The one requirement is that the primary site VMs are no longer running, which avoids potential corruption. You perform the unplanned failover using the replica VMs in the secondary site. The VMs boot from the most recent asynchronous replication, so the maximum data loss (RPO) is five minutes.
Replication is not automatically reversed. It’s hard to reverse replication to an office that is now a pile of rubble or smoldering ashes. However, once a new production site is built, you can configure replication to the new host/cluster and perform a planned failover when you are ready.
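
The data loss in an unplanned failover is simply the gap between the last successful replication and the moment of failure. Here is a small Python sketch of that arithmetic. The function name and the timestamps are invented for illustration, and it assumes replication was healthy and keeping to the five-minute cycle mentioned above.

```python
from datetime import datetime, timedelta

# Five-minute replication cycle, taken from the text above (assumed to be on schedule).
REPLICATION_INTERVAL = timedelta(minutes=5)


def data_loss_window(last_replication: datetime, failure: datetime) -> timedelta:
    """Return the span of changes that never reached the replica."""
    if failure < last_replication:
        raise ValueError("failure cannot precede the last replication")
    return failure - last_replication


# Hypothetical timestamps: last replication at 2:56 AM, site lost at 3:00 AM.
last_sync = datetime(2012, 10, 29, 2, 56)
outage = datetime(2012, 10, 29, 3, 0)

window = data_loss_window(last_sync, outage)
print(window)                          # 0:04:00
print(window <= REPLICATION_INTERVAL)  # True
```

In this example the replica VM boots with data that is four minutes old, which is inside the five-minute RPO.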

Unplanned Failover in Hyper-V Replica.

3. Test Failover

The people, processes, and technology of a BCP must be tested regularly, but you cannot bring down production systems once a quarter, half year, or even annually to do this. Testing a BCP that might never be implemented could end up costing more than the disaster! Hyper-V Replica allows you to test the technology element of a BCP by performing a test failover on the replica VMs in the secondary site:

  1. A test virtual switch is configured for each replica VM.
  2. A diskless clone of the replica virtual machine is created.
  3. A differential disk is created in the test VM for every disk that is in the replica VM. The differential disks use the replica VM's disks as parents.
  4. The test VM powers up on the isolated virtual switch, thus avoiding confusion on the production network.
  5. Tests are done and the test VM is destroyed.

This process is done only in the secondary site. Because it uses cloned VMs with differential disks linked to the replica virtual machine’s disks, the test VM is created nearly instantly, so there is no waiting to perform the test, and the space consumed by the test is minimized. You can perform the test without impacting either:

  • Production systems: The test VM is on an isolated virtual switch.
  • Replication: The test is done with a linked clone, rather than with the replica VM. Any changes made to the test VM do not impact the replica VM or continuing replication.
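
If the linked-clone idea is new to you, this toy Python model may help. It sketches the copy-on-write behavior of a differential disk: reads fall through to the parent, and writes stay in the child. The DifferencingDisk class and the dictionary used as a "disk" are hypothetical and have nothing to do with the real virtual hard disk format.

```python
class DifferencingDisk:
    """Copy-on-write overlay on a parent disk (toy model, not the VHD format)."""

    def __init__(self, parent: dict):
        self._parent = parent  # the replica VM's disk; this class never writes to it
        self._changes = {}     # blocks written by the test VM

    def read(self, block: int) -> str:
        # Prefer the test VM's own write; otherwise fall back to the parent.
        if block in self._changes:
            return self._changes[block]
        return self._parent[block]

    def write(self, block: int, data: str) -> None:
        # Writes land only in the overlay, never in the parent.
        self._changes[block] = data

    def size(self) -> int:
        # Space consumed is just the number of changed blocks.
        return len(self._changes)


replica_disk = {0: "boot", 1: "app-v1", 2: "data"}
test_disk = DifferencingDisk(replica_disk)

test_disk.write(1, "app-v2")  # e.g. the test VM upgrades an application
print(test_disk.read(1))      # app-v2
print(replica_disk[1])        # app-v1
print(test_disk.size())       # 1
```

The overlay starts out empty, which is why the test VM appears almost instantly, and the parent never changes, which is why replication carries on undisturbed.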

Test Failover in Hyper-V Replica.

Another use of the test failover is to create an isolated clone of a production VM for alternative testing. Maybe an administrator wants to test the upgrade of an operating system or application as well as the rollback for a change control process? Or maybe a tester or developer wants to troubleshoot an issue in a production system with production data without impacting services? All this can be safely done in the sandbox that is created by a Hyper-V Replica test failover.