In my previous article, “Understanding Disaster Recovery,” I talked about the concepts of disaster recovery (DR) and business continuity planning (BCP). In this article I will move from concepts to actual software features by explaining Hyper-V Replica (HVR), Microsoft’s DR solution for its Hyper-V virtualization platform. It’s probably fair to say that Microsoft underestimated just how popular this free DR solution would be. They knew small-to-medium enterprises (SMEs) would be interested, but the attention that HVR received from larger enterprises was a pleasant surprise. I will give an overview of Hyper-V Replica and how it works, and I’ll also post some how-to KB articles at the end.
HVR is a feature that is built into Hyper-V and does not require any additional licensing. The feature was first introduced in Windows Server 2012 (WS2012) and was enhanced in Windows Server 2012 R2 (WS2012 R2). HVR takes advantage of the fact that the easiest way to replicate complex applications to a DR site is to abstract them as virtual machines, and to replicate the virtual machines, which are just a few files. And this follows Microsoft’s drive to get people to virtualize more of their systems (see the enhanced scalability of Hyper-V in WS2012 and WS2012 R2).
Microsoft’s initial ambition with HVR was to create a DR solution for SMEs. They decided to use asynchronous replication with HVR because those SMEs usually face challenges with WAN and Internet connectivity. Synchronous replication requires high-quality and low-latency connections, which are usually out of the realms of possibility for the SME, either due to cost or local availability. HVR was also designed to deal with the connectivity outages that are all too common for the SME that is using commercial broadband. The design decisions taken by Microsoft on behalf of the SME also attracted the attention of the large enterprise. A large data center might use SAN replication to replicate to another identically configured data center, but large enterprises often have regional/branch offices and/or retail outlets that they want to provide DR solutions for. Putting in low-latency connections is out of the question, so a free replication solution that uses asynchronous replication, such as HVR, sounds like a good option.
HVR operates at the software layer, not caring what type of server or storage hardware you have on any site. This is another one of the things that larger enterprises like: they sometimes find themselves trying to provide a central DR site for regional offices where those offices have local purchasing autonomy. Trying to do hardware-based replication from one branch office with EMC storage, another with Dell storage, and a third with NetApp storage… well, that complicates things for the central IT staff, who will have to buy matching storage and acquire additional skills. A software-based solution such as HVR rises above hardware fragmentation because it just does not care. You can replicate from a site with HP servers and an iSCSI SAN to another site with Dell servers and Storage Spaces on DataOn JBODs. It doesn’t get more complicated than that example.
Hyper-V Replica is hardware agnostic replication.
The configuration is slightly different for clusters so that administration can be simplified, but HVR does not care if you use a standalone host or a Hyper-V cluster. You can replicate:
Note that you cannot replicate from one host to another if both hosts are in the same Hyper-V cluster.
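As a minimal sketch of that one rule, here is how a pre-flight check might look. The `Host` type and `can_replicate` function are hypothetical names made up for this illustration; they are not part of Hyper-V or its management tools, and the sketch models only the same-cluster restriction and nothing else.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    """A Hyper-V host; `cluster` is None for a standalone host (hypothetical type)."""
    name: str
    cluster: Optional[str] = None


def can_replicate(source: Host, target: Host) -> bool:
    """Return False only when both hosts are members of the same cluster."""
    both_clustered = source.cluster is not None and target.cluster is not None
    return not (both_clustered and source.cluster == target.cluster)


# Two nodes of the same cluster: not a valid replication pair.
print(can_replicate(Host("hv1", "cluster-a"), Host("hv2", "cluster-a")))
# A standalone host replicating to a clustered host: fine.
print(can_replicate(Host("hv9"), Host("hv1", "cluster-a")))
```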
The Hyper-V Replica settings of standalone hosts are managed in the Host Settings in Hyper-V Manager. Hyper-V clusters use a Hyper-V Replica Broker, which gives all of the member nodes a central point of administration and a single identity for HVR.
Microsoft disabled HVR and inbound replication by default. A Hyper-V host or cluster will not accept replication traffic unless it is configured to. There are two basic policies:
There are two ways that hosts can identify themselves and authorize using Hyper-V Replica:
There is a business opportunity for service providers. Hosts can authenticate and replicate using SSL, and a single host/cluster can accept inbound replication from many hosts/clusters. This service could be useful for smaller businesses that cannot afford or manage a DR site infrastructure.
Small businesses with limited bandwidth and larger businesses with many terabytes of VMs will want to know what options are available to perform the initial copy of virtual machines to the DR site. There are three ways to do this.
HVR uses asynchronous replication to send data to the secondary site. A replication policy is created for each required virtual machine (including selecting virtual hard disks) in the primary site. Once completed, HVR will start to monitor the changes of each selected virtual hard disk. This is done by mirroring the changes to a Hyper-V Replica log (HRL) file, stored with the virtual hard disk.
In WS2012 Hyper-V, the asynchronous replication interval is fixed at five minutes and cannot be changed. In WS2012 R2, you can choose a replication interval of 30 seconds, 5 minutes, or 15 minutes for each virtual machine.
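To make those choices concrete, the allowed intervals can be written down as a small lookup. This is only an illustrative sketch: the dictionary and the `validate_interval` helper are hypothetical names of my own, not a Hyper-V API, and the values are simply the ones listed above converted to seconds.

```python
# Allowed replication intervals in seconds, per the versions described above.
ALLOWED_INTERVALS_SECONDS = {
    "WS2012": (300,),              # fixed at 5 minutes
    "WS2012 R2": (30, 300, 900),   # 30 seconds, 5 minutes, or 15 minutes
}


def validate_interval(os_version: str, seconds: int) -> int:
    """Return `seconds` if it is a valid interval for `os_version`, else raise."""
    allowed = ALLOWED_INTERVALS_SECONDS[os_version]
    if seconds not in allowed:
        raise ValueError(
            f"{os_version} supports intervals of {allowed} seconds, not {seconds}"
        )
    return seconds


print(validate_interval("WS2012 R2", 30))
```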
The HRL file (or files) of a virtual machine is swapped out and replaced with a new one for the next interval. The replaced HRL is sent to the DR site and applied to the replica virtual machine, updating it with the latest changes.
Hyper-V Replica Logs used for replication.
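The log-swap cycle described above can be sketched in a few lines. This is a toy model under stated assumptions, not how HVR is implemented: a disk is a dictionary of block numbers, the log is a plain in-memory list, and `TrackedDisk` and `apply_log` are hypothetical names invented for the example.

```python
from typing import Dict, List, Tuple

LogEntry = Tuple[int, bytes]  # (block number, data written)


class TrackedDisk:
    """Toy primary-site disk that mirrors every write into the current log."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.current_log: List[LogEntry] = []

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data
        self.current_log.append((block, data))  # record the change for replication

    def swap_log(self) -> List[LogEntry]:
        """Swap in a fresh, empty log and return the completed one for shipping."""
        completed, self.current_log = self.current_log, []
        return completed


def apply_log(replica: Dict[int, bytes], log: List[LogEntry]) -> None:
    """Replay a shipped log, in order, against the replica disk."""
    for block, data in log:
        replica[block] = data


primary = TrackedDisk()
replica: Dict[int, bytes] = {}

primary.write(0, b"boot")
primary.write(7, b"v1")
primary.write(7, b"v2")                  # later write to the same block wins on replay
apply_log(replica, primary.swap_log())   # end of the interval: ship and apply
print(replica == primary.blocks)
```

Writes that arrive after the swap land in the new log, so the replica only falls behind by whatever has changed since the last interval.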
You can choose to maintain restore points in the secondary site for a replicated virtual machine. You can keep up to 15 hourly restore points in WS2012 and up to 24 hourly restore points in WS2012 R2. This allows you to fail over the virtual machine as it was maybe 1 hour ago, 15 hours ago, or even 24 hours ago (WS2012 R2).
HVR accomplishes this by creating checkpoints (snapshots in WS2012) of the cold replica virtual machine in the DR site. Each checkpoint is presented in a drop-down list box; you choose which restore point to use. Note that you can also tell Hyper-V to use Volume Shadow Copy Service (VSS) to create these checkpoints to guarantee application consistency.
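One way to picture the restore points is as a rolling window of hourly checkpoints. The sketch below assumes that the oldest point is discarded once the configured number is reached; the `RestorePoints` class and its method names are hypothetical and exist only for this illustration.

```python
from collections import deque
from typing import Deque, List

# Maximum number of hourly restore points, per the versions described above.
MAX_RESTORE_POINTS = {"WS2012": 15, "WS2012 R2": 24}


class RestorePoints:
    """Rolling window of hourly restore points (assumes oldest is dropped first)."""

    def __init__(self, os_version: str, keep: int) -> None:
        limit = MAX_RESTORE_POINTS[os_version]
        if not 1 <= keep <= limit:
            raise ValueError(f"{os_version} allows 1 to {limit} restore points")
        self._points: Deque[str] = deque(maxlen=keep)

    def add_hourly_point(self, label: str) -> None:
        self._points.append(label)  # deque(maxlen=...) evicts the oldest entry

    def available(self) -> List[str]:
        """Restore points, newest first, as they might appear in a drop-down list."""
        return list(reversed(self._points))


points = RestorePoints("WS2012 R2", keep=3)
for hour in range(1, 6):
    points.add_hourly_point(f"hour-{hour}")
print(points.available())
```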
In the secondary site, HVR maintains regularly updated (more on this later) offline, or cold, replica virtual machines. These are identical copies of the production virtual machines that are locked into a replicating, powered-down state. There are two types of disaster recovery that you can perform.
When we discussed the concepts of Disaster Recovery, we talked about the need for testing the Business Continuity Plan. Hyper-V Replica offers the option to perform a test failover, which does not impact the replica virtual machines. This is important because a disaster might happen during a test window:
The method used to enable a test failover without impacting replication is quite elegant. A clone of the virtual machine configuration is created. This virtual machine is given a differencing virtual hard disk that points to the replica’s virtual hard disk (or recovery point snapshot) as its parent disk. This gives you an instantly created and slimmed-down clone of the replica virtual machine. Not only do you have zero impact on replication, but the storage space consumed is minimized.
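The parent/child disk relationship is easy to model. In this minimal sketch a disk is again just a dictionary of blocks, and the hypothetical `DifferencingDisk` class stores only the blocks that the test virtual machine changes; it illustrates the idea and makes no claim about the real VHD/VHDX format.

```python
from typing import Dict


class DifferencingDisk:
    """Toy child disk: reads fall through to the parent, writes stay in the child."""

    def __init__(self, parent: Dict[int, bytes]) -> None:
        self._parent = parent            # the replica disk; never written to here
        self._delta: Dict[int, bytes] = {}

    def read(self, block: int) -> bytes:
        if block in self._delta:
            return self._delta[block]    # block was changed by the test VM
        return self._parent.get(block, b"")

    def write(self, block: int, data: bytes) -> None:
        self._delta[block] = data        # the parent is left untouched

    def delta_blocks(self) -> int:
        """Number of blocks the child stores itself."""
        return len(self._delta)


replica_disk = {0: b"boot", 1: b"data"}
test_disk = DifferencingDisk(replica_disk)

test_disk.write(1, b"test-change")
print(test_disk.read(0), test_disk.read(1), replica_disk[1], test_disk.delta_blocks())
```

The test virtual machine sees its own change, the replica disk still holds the original data, and the only extra storage used is the one changed block.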
Getting your VMs online in the DR site is pointless if they cannot be contacted. There are several ways to deal with this, including: