Last Update: Sep 04, 2024 | Published: Oct 30, 2017
Microsoft is adding a new feature that lets you control the timing of the forced outages that virtual machines incur when patches are delivered to Azure’s compute hosts.
Ideally, when you deploy a service using Azure virtual machines, those virtual machines should be a part of a valid availability set. Here’s a quick reminder on availability sets:
There is no live migration in Azure. Imagine tens of thousands of machines live migrating on a single cluster and what that would do to the infrastructure and the applications in the virtual machines! Instead, when a host reboots, the virtual machines on it have downtime.
An availability set tags virtual machines so that Azure knows to put them into different update domains. When Microsoft deploys updates to Azure, they do so in an ordered fashion, one update domain at a time. This means that only a small number of hosts are ever offline because of patching and rebooting. If you have configured anti-affinity by using availability sets, then only one (or a few) virtual machines will ever be down at one time.
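The ordering described above can be sketched in a few lines of Python. This is an illustration only, not how Azure actually places machines: the round-robin placement, the default of five update domains, and the names `assign_update_domains` and `patch_in_order` are all assumptions made for the example.

```python
from collections import defaultdict


def assign_update_domains(vm_names, update_domain_count=5):
    """Spread VMs across update domains round-robin.

    Round-robin placement and the default count of 5 are assumptions
    for illustration, not a description of Azure's real placement logic.
    """
    domains = defaultdict(list)
    for index, name in enumerate(vm_names):
        domains[index % update_domain_count].append(name)
    return dict(domains)


def patch_in_order(domains):
    """Yield (update_domain, vms_offline), one update domain at a time."""
    for update_domain in sorted(domains):
        yield update_domain, list(domains[update_domain])


# Hypothetical two-node web tier in one availability set.
web_tier = ["web-0", "web-1"]
placement = assign_update_domains(web_tier)

for update_domain, offline in patch_in_order(placement):
    online = [vm for vm in web_tier if vm not in offline]
    print(f"UD {update_domain}: offline={offline} online={online}")
```

With two machines in separate update domains, each patching step takes only one of them offline, so the other keeps serving traffic.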
The key part is: this must be a valid availability set to achieve the 99.95 percent SLA for the service on those machines. Putting one domain controller and one file server into an availability set achieves nothing for uptime and the SLA won’t apply. But putting two load-balanced web servers into an availability set qualifies the web service for the SLA and minimizes downtime.
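To put the 99.95 percent figure in perspective, here is a quick back-of-the-envelope calculation of the downtime that level of availability leaves room for. The 30-day month is an assumption for the arithmetic, and the helper name `downtime_budget_minutes` is made up for this sketch; check the SLA itself for how Microsoft actually measures it.

```python
def downtime_budget_minutes(sla_percent, days_in_month=30):
    """Minutes of downtime per month that still meet the given SLA.

    Assumes a fixed 30-day month by default (an assumption for
    illustration; the SLA's own measurement rules may differ).
    """
    minutes_in_month = days_in_month * 24 * 60
    return round((100 - sla_percent) / 100 * minutes_in_month, 2)


print(downtime_budget_minutes(99.95))
```

Under that assumption, 0.05 percent of 43,200 minutes is roughly 21.6 minutes per month.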
What about those services that don’t make up a part of a valid availability set? How much downtime do they incur? Once upon a time in Azure, they were down for as long as it took a physical server to reboot. Server admins know how long that can take. I’ve seen faster snail races. However, two years ago Microsoft introduced something called In-Place Migration, which was briefly known as Warm Reboot in Technical Preview 1 of Windows Server 2016. Almost all of the time that Microsoft patches a host, they will:
The entire process takes between 15 and 30 seconds. Most of us never notice that brief amount of downtime, however:
To be honest, there’s not much to this feature from our point of view, but it will be very valuable to customers that can only tolerate downtime during maintenance windows of their own choosing. The new Planned Maintenance feature for Azure virtual machines allows that to happen:
The redeploy action is one of the maintenance tasks in Azure. It allows you to reboot a virtual machine on another host. Planned maintenance will ensure that the destination host doesn’t have any planned outages in the near future.
To make Planned Maintenance work, Microsoft had to make some other improvements:
This is a simple feature, but it should save a few scalps in the operations departments of many Azure customers.