Last Update: Sep 04, 2024 | Published: Aug 01, 2016
This post will explain the use of availability sets and how to deploy them with Azure Resource Manager (ARM) or Cloud Solution Provider (CSP) virtual machines.
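Like every form of computing (physical, virtual, or cloud), Azure has outages. Some of these outages are planned, such as host patching, and some are unplanned, such as power failures. Microsoft has designed Azure to deal with this so that you can maximize the uptime of services that are running in virtual machines. This involves two concepts: fault domains, which group hardware that shares a single point of failure such as a power source or network switch, and update domains, which group hosts that are patched and rebooted together.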
But what if Azure places 3 of those web servers in the same update domain? When Microsoft deploys updates to Azure, the underlying hosts in that update domain will reboot and 3 of your web servers will go offline at once! What if all 5 of your virtual machines are placed into a single fault domain and a fuse blows in the common network switch? Now your entire web service is knocked offline and you are out of business. This is why, on premises, we deploy anti-affinity in Hyper-V or vSphere, and why we use availability sets in Azure.
A couple of notes:
An availability set is a way of tagging a group of virtual machines that perform the same task, such as a pair of domain controllers, the nodes of a SQL cluster, or a set of load-balanced virtual machines. The availability set tells Azure to spread those virtual machines across different fault domains and update domains.
Classic or ASM virtual machines are pretty flexible about when you can assign an availability set. ARM virtual machines are not: an ARM virtual machine can only be assigned to an availability set when the virtual machine is created, and you cannot add it to one afterwards.
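To make the creation-time rule concrete, here is a minimal sketch that builds the two relevant fragments of an ARM template as Python dictionaries and prints them as JSON. The resource names, the API version, and the domain counts are assumptions chosen for illustration, and the virtual machine fragment is deliberately incomplete (it leaves out the hardware, OS, storage, and network profiles that a real deployment needs). The point is only that the availability set reference is part of the virtual machine definition itself.

```python
import json

# Hypothetical names, used only for illustration.
AVAILABILITY_SET_NAME = "web-avset"
VM_NAME = "web-vm-01"

# Assumption: an API version that supports these properties; check the
# version that is current for your subscription before using it.
API_VERSION = "2015-06-15"

# The availability set resource, with its fault and update domain counts.
availability_set = {
    "type": "Microsoft.Compute/availabilitySets",
    "name": AVAILABILITY_SET_NAME,
    "apiVersion": API_VERSION,
    "location": "[resourceGroup().location]",
    "properties": {
        "platformFaultDomainCount": 3,
        "platformUpdateDomainCount": 20,
    },
}

# ARM template expression that resolves to the availability set's resource ID.
avset_id = (
    "[resourceId('Microsoft.Compute/availabilitySets', "
    f"'{AVAILABILITY_SET_NAME}')]"
)

# Partial virtual machine resource: the availability set is referenced in the
# machine's own properties, so it has to be known when the machine is created.
virtual_machine = {
    "type": "Microsoft.Compute/virtualMachines",
    "name": VM_NAME,
    "apiVersion": API_VERSION,
    "location": "[resourceGroup().location]",
    "dependsOn": [avset_id],
    "properties": {
        "availabilitySet": {"id": avset_id},
        # hardwareProfile, osProfile, storageProfile and networkProfile
        # are omitted from this sketch.
    },
}

print(json.dumps([availability_set, virtual_machine], indent=2))
```

The dependsOn entry makes the template create the availability set before the virtual machine that refers to it.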
However, because there would be more virtual machines than update domains, you would have one update domain that contains 6 virtual machines.
Imagine I deployed an availability set with 3 fault domains and 20 update domains, and I created 60 web server virtual machines in this set. The 60 virtual machines would be spread evenly over the 20 update domains, with 3 machines in each. During planned maintenance, I should expect 3 virtual machines to be affected at a time. The 20 update domains cannot be divided evenly across 3 fault domains (20 / 3), so I should expect two of the fault domains to hold 7 update domains each and the third to hold 6. That means a localized outage could impact up to 21 virtual machines (7 update domains x 3 machines) at a time.
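The arithmetic in that example can be checked with a short Python sketch. It follows the simplified model used in this post, where virtual machines are spread as evenly as possible over update domains and update domains are in turn spread over fault domains; the function name spread_evenly is made up for illustration, and the actual placement that Azure performs may differ.

```python
def spread_evenly(items, buckets):
    """Split `items` across `buckets` as evenly as possible.

    Returns the bucket sizes, largest first.
    """
    base, extra = divmod(items, buckets)
    return [base + 1] * extra + [base] * (buckets - extra)


virtual_machines = 60
update_domains = 20
fault_domains = 3

# 60 machines over 20 update domains: 3 machines in each.
vms_per_update_domain = spread_evenly(virtual_machines, update_domains)

# 20 update domains over 3 fault domains: 7, 7 and 6.
update_domains_per_fault_domain = spread_evenly(update_domains, fault_domains)

# Worst case for planned maintenance: one whole update domain reboots.
planned_impact = max(vms_per_update_domain)

# Worst case for a localized failure: the fullest fault domain goes down.
unplanned_impact = max(update_domains_per_fault_domain) * max(vms_per_update_domain)

print("Machines per update domain:", set(vms_per_update_domain))
print("Update domains per fault domain:", update_domains_per_fault_domain)
print("Worst-case planned impact:", planned_impact)
print("Worst-case unplanned impact:", unplanned_impact)
```

Changing the three input numbers is a quick way to see how many machines a single update or a single hardware failure could take offline under this model before you commit to a design.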
Availability sets are extremely rigid in an ARM/CSP deployment, so make sure you plan your availability sets before you start deploying your virtual machines or resource groups.