A major benefit of deploying VMware’s vSphere 4 is the additional options it offers you for business continuity and disaster recovery, such as virtual machine level backups and high availability features. vSphere 4 Essentials Plus, Advanced, and Enterprise editions all include VMware Data Recovery, and a number of third-party applications (e.g. Veeam’s Backup & Replication v5, “Best of Show” at VMworld 2010) provide even more powerful features.
A more detailed explanation of the potential of vSphere based DR solutions is beyond the scope of this article, but all the products utilise VM snapshots to enable backing up of live VMs without affecting availability:
VMware Data Recovery Snapshot Based Backup
Another major feature in vSphere 4 Advanced and higher editions is Fault Tolerance, which is intended to eliminate VM downtime in the event of a host server failure by creating a live shadow instance of the VM on another host and keeping the two in “lockstep” synchronisation. The effect is similar to clustering, but since it operates at the hypervisor level it does not require any special features at the VM software level. Again, it is beyond the scope of this article to discuss the advantages and disadvantages of this technology; the important thing to note here is that FT-enabled VMs cannot be snapshotted.
As we have already seen, all the vSphere based VM backup solutions rely on snapshot technology to image live VMs, but Fault Tolerant VMs cannot be snapshotted, which therefore precludes backing them up. Unfortunately, it seems that the first time many users discover this is when trying to run their first backup of an FT VM. Depending on their backup application, it will either not allow them to create the backup job in the first place, or the job will fail shortly after starting.
It seems that this situation was exacerbated by conflicting information from VMware when vSphere 4 was originally released. At one point they said that Fault Tolerant VMs would be allowed a single snapshot for backup purposes, but this feature was not included in the released version of vSphere 4.0. Subsequently they implied that it would be enabled in a later update, but although vSphere 4.1 included some major improvements to FT, it appeared that they had given up on the single snapshot feature, at least until the next major version release.
To be completely accurate, it is still not possible to back up FT-enabled VMs. What is now the commonly accepted method is in fact a workaround, as it involves turning off FT in order to allow a snapshot to be created, and then turning it back on once the backup has completed. This means that for the duration of the backup, the Virtual Machine will no longer have the benefit of FT protection, which may be a problem for some readers. Unfortunately, if that is the case, then you will have to look at alternative backup methods, i.e. either running inside the VM or at the SAN level.
It is quite simple to test the process manually in order to establish that your backup application will be able to image an FT VM; you just need to turn off Fault Tolerance for that VM. Note that you have to “turn off” rather than “disable”; otherwise it still won’t allow a snapshot to be created.
Turning Off Fault Tolerance
In your vSphere Client, right-click on the FT VM and select “Fault Tolerance”, then “Turn Off Fault Tolerance” from the sub-menu. The “Disable Fault Tolerance” option will only stop the lockstep synchronisation, but leaves the secondary VM in place. “Turn Off” will actually disable lockstep sync and then remove the secondary VM completely, leaving you with a normal VM that can be snapshotted as usual.
Now start your backup and watch the “Tasks” section at the bottom of the vSphere Client; you should see the snapshot being created by the backup application. Once the backup has completed, the snapshot should be removed. After that you can right-click the VM again and select “Turn On Fault Tolerance”. This might take a few minutes as it has to create a secondary VM and bring it into lockstep sync with the primary, exactly the same process as when you originally enabled FT.
This process obviously isn’t a practical solution for regular scheduled backups. However, it does demonstrate the steps required to allow backups of any Fault Tolerant Virtual Machine, and it will give you an idea of the potential hazards involved. The main problem has already been mentioned; the VM will not be FT protected during the backup process so a host hardware problem would cause downtime, and an error whilst turning FT on or off could leave the VM unprotected, requiring manual intervention. On a more positive note though, in my experience such problems are very rare and are unlikely to cause downtime by themselves, and whilst the VM lacks FT protection, it will still have vSphere High Availability. Therefore, should the host fail, the VM will be started on another host. It will have undergone a “dirty shutdown” so there may be some data loss or even corruption and a short period of downtime, all of which illustrates quite neatly why Fault Tolerance was an attractive option in the first place!
Fortunately vSphere has comprehensive scripting support, allowing for the automation of any process achievable via the vSphere Client GUI, so we can use this to turn Fault Tolerance off and on when required. The script used here is written in Perl, but don’t worry if you have no experience of using that: William Lam of www.virtuallyghetto.com has already done the hard work for us and created a suitable script. The instructions below will show you how to implement it. However, if you are using a third-party application such as Veeam and have an active support agreement with the vendor, then you should contact them first to see if they have their own solution.
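The sequence that any such automation has to follow can be sketched in a few lines. The Python below is only an illustration under stated assumptions, not William Lam's script: the three callables (`turn_off_ft`, `run_backup`, `turn_on_ft`) are hypothetical placeholders for whatever actually performs each step in your environment. The point it tries to make is that Fault Tolerance should be turned back on even when the backup itself fails.

```python
from typing import Callable


def backup_ft_vm(
    turn_off_ft: Callable[[], None],
    run_backup: Callable[[], None],
    turn_on_ft: Callable[[], None],
) -> bool:
    """Run one backup cycle for an FT VM; return True if the backup succeeded.

    All three callables are hypothetical placeholders supplied by the caller.
    """
    # If turning FT off raises, the exception propagates and no backup
    # is attempted; the VM is presumed to still be FT protected.
    turn_off_ft()
    try:
        run_backup()
        return True
    except Exception as exc:
        # A failed backup is reported, not re-raised, so the caller
        # can decide what to do with the False result.
        print(f"backup failed: {exc}")
        return False
    finally:
        # Runs whether the backup succeeded or failed, so the VM is not
        # left without FT protection because of a backup error.
        turn_on_ft()
```

If `turn_on_ft` itself fails, the exception is deliberately allowed to propagate, since that is the case that needs manual intervention.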
The first step for every scheduled FT VM backup needs to be turning off Fault Tolerance, so you can use the Windows Task Scheduler to create a scheduled task to run your batch file at the appropriate time. The Task Scheduler in Windows Server 2008 is quite different to use from the old Windows Server 2003 version, but the end result is the same; this website has a good guide to the 2003 version, and this site explains the new 2008 version.
Note that when you create your task, you can specify what Windows user account it should run under. If you are using the passthrough authentication option, it is essential that you specify an account that has sufficient rights on your vCenter Server to change the Fault Tolerance settings for the VM. Configure the task to run at a suitable time and frequency for your backup schedule.
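If you would rather create the task from the command line than through the Task Scheduler GUI, Windows also ships the `schtasks` utility. The sketch below only builds the command rather than running it, and everything passed in is a hypothetical example: the task name, the batch file path `C:\Scripts\ft-off.bat`, the start time and the `DOMAIN\svc-backup` account are assumptions to be replaced with your own values. No password is put on the command line here.

```python
from typing import List


def build_schtasks_command(
    task_name: str, command: str, start_time: str, run_as: str
) -> List[str]:
    """Build (but do not run) a schtasks command line for a daily task."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,    # name shown in Task Scheduler
        "/TR", command,      # program or batch file to run
        "/SC", "DAILY",      # run once a day
        "/ST", start_time,   # start time in 24-hour HH:MM form
        "/RU", run_as,       # account the task runs under
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    cmd = build_schtasks_command(
        task_name="FT-Off-Before-Backup",
        command=r"C:\Scripts\ft-off.bat",
        start_time="22:45",
        run_as=r"DOMAIN\svc-backup",
    )
    print(" ".join(cmd))
```

The account given to `/RU` is the one that needs the vCenter rights described above when passthrough authentication is used.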
Next you need to create a backup job for your FT VM, just like any other VM backup, but you should schedule it to run a suitable period after the “Turn Off FT” task so that the task has time to complete before the backup starts; otherwise the backup will fail. Usually I find that a delay of 15 minutes is ample, but you should be able to confirm what is best for your system with some testing. Setting the time for reactivating Fault Tolerance is harder because the duration of the backup job may vary considerably from day to day: set it too early and the task will fail, whilst making it too late will leave your VM unprotected for longer than necessary. The best option, which most backup applications support, is to use the setting in the backup job properties that runs a command after the job has completed, and have it turn Fault Tolerance back on.
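Because a failure at this point leaves the VM unprotected, it may be worth having the post-job command retry a few times before giving up. The following is a minimal sketch of such a wrapper, assuming that the command which turns FT back on returns exit code 0 on success (an assumption you should verify for your own script). The function name, the retry count and the delay are all illustrative choices, not part of any backup product.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    command: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 60.0,
    runner: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `command` until it exits with 0 or `attempts` is exhausted.

    `runner` and `sleep` are injectable so the logic can be tested
    without launching a real process or actually waiting.
    """
    for attempt in range(1, attempts + 1):
        if runner(command) == 0:
            return True
        print(f"attempt {attempt} of {attempts} failed")
        # Wait before the next try, but not after the final one.
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

A `False` result is the signal that manual intervention is needed, so it is worth wiring it to whatever alerting you already use.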
Although it sounds like a complex and significant set of tasks to be running on a perhaps nightly basis, it does in fact usually turn out to be a reliable procedure once it has been set up and the schedules established. The main disadvantage has already been highlighted: the Fault Tolerance protection has to be turned off in order to back up the VM, which increases the risk of downtime during the backup window. Depending on the role of the VM in question this may or may not be an issue, but with some planning it should be possible to minimise the risk, for example by combining a weekly VM level backup with a daily OS level backup. The benefits of VM image based backups, particularly the speed and ease of recovery, should make it a worthwhile effort.