In “Windows Server Failover Clustering: Why Cluster Quorum Matters,” I explained how a Windows Server Failover Cluster uses quorum to decide if or how the cluster should remain operational if it is partitioned by a network problem. The mechanism that was most commonly used to provide quorum was rather rigid and required a lot of care in a rapidly growing environment. In this article, I’ll describe what Windows Server 2012 R2 dynamic quorum is and give you my thoughts on the best way to approach it.
Microsoft has placed a lot of focus on large, rapidly growing public and private clouds in Windows Server 2012 R2 and System Center, so it should be no surprise that one of the new features in Windows Server 2012 R2 (WS2012 R2) Failover Clustering is designed to simplify quorum management.
A cluster is made up of several different nodes. In the case of Hyper-V, each node is a Hyper-V host, each using common storage to share virtual machine files, and each connected by networks. A NetFT driver uses these networks to enable a heartbeat signal to be transmitted between the nodes. If the cluster becomes partitioned — in other words, if there is a network failure that divides the cluster into isolated non-communicating islands — then the cluster needs a way to decide which partition should remain active and host services, and which partitions should temporarily drop out of cluster services until the problem is resolved.
If the cluster has an uneven number of nodes, say three, then the split is likely to be 2:1. The partition with two nodes has more than half of the cluster’s nodes and will remain operational. The other partition does not have more than half of the nodes, so it drops out and its HA roles fail over to the nodes in the winning partition.
Uneven number of nodes achieving quorum. (Image: Aidan Finn)
If there is an even number of nodes, then there is a chance that the cluster will partition evenly, for example two nodes in one partition and two nodes in the other. If that happens, then neither side has more than half of the nodes, so neither can achieve quorum. There is a tie, and both partitions drop out of operation.
Even number of nodes achieving quorum. (Image: Aidan Finn)
Obviously that is bad. The traditional solution was to add a witness disk if there was an even number of nodes. This quorum disk provides an extra vote that breaks the tie. However, if there was an uneven number of nodes, then we were advised to remove the quorum disk. That meant that some poor sucker was frequently adding and removing disks in rapidly growing environments, such as a public cloud or huge private cloud.
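The majority rule described above is simple arithmetic, and a short Python sketch can make it concrete. This is only an illustration of the counting logic, not how Failover Clustering is implemented. The function names `has_quorum` and `surviving_partitions` are made up for this example, and the witness is modelled as one extra vote held by whichever partition can reach it.

```python
def has_quorum(partition_votes, total_votes):
    """A partition stays up only if it holds more than half of all votes."""
    return 2 * partition_votes > total_votes


def surviving_partitions(partition_votes):
    """Return the vote counts of the partitions that keep quorum."""
    total = sum(partition_votes)
    return [votes for votes in partition_votes if has_quorum(votes, total)]


# Three nodes split 2:1: the two-node partition wins.
print(surviving_partitions([2, 1]))  # [2]

# Four nodes split 2:2: a tie, so no partition keeps quorum.
print(surviving_partitions([2, 2]))  # []

# Four nodes plus a witness (assumed reachable by the first partition):
# 3 of 5 votes on one side breaks the tie.
print(surviving_partitions([2 + 1, 2]))  # [3]
```

The strict "more than half" comparison is what makes an even split fatal: two votes out of four is exactly half, which is not a majority.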
Dynamic Quorum was introduced in Windows Server 2012 Failover Clustering, and Windows Server 2012 R2 extends it with Dynamic Witness. It deals with situations where the quorum requirements of a cluster change frequently. By default, every node in a cluster gets a vote for deciding quorum. Failover Clustering automatically uses Dynamic Quorum to rig the vote whenever there is a chance of a tie: a vote is stripped from one of the nodes.
You can see in the following example that Demo-Host1 and Demo-Host2 each have an assigned vote. However, there is no file share witness or witness disk, which would lead to a tied vote if this cluster were partitioned. Dynamic Quorum fixes the vote by removing the current vote from Demo-Host2. Therefore, if this cluster becomes partitioned, Demo-Host2 will temporarily cease cluster operations until it can communicate with Demo-Host1 once again.
Windows Server 2012 R2 Dynamic Quorum. (Image: Aidan Finn)
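The vote adjustment in this two-node example can be sketched in a few lines of Python. This is a simplified model, not the cluster's actual algorithm: the helper names `assign_current_votes` and `keeps_quorum` are hypothetical, and the rule that the last node in the list loses its vote is an arbitrary assumption chosen so that the result matches the example above.

```python
def assign_current_votes(nodes):
    """Give every node one vote, then drop one vote if the total is even.

    Assumption for illustration only: the vote is taken from the last node
    in the list. The real cluster decides for itself which node loses it.
    """
    current = {node: 1 for node in nodes}
    if nodes and len(nodes) % 2 == 0:
        current[nodes[-1]] = 0
    return current


def keeps_quorum(partition, current_votes):
    """True if the nodes in this partition hold a majority of current votes."""
    total = sum(current_votes.values())
    held = sum(current_votes[node] for node in partition)
    return 2 * held > total


votes = assign_current_votes(["Demo-Host1", "Demo-Host2"])
print(votes)  # {'Demo-Host1': 1, 'Demo-Host2': 0}

# If the two hosts are cut off from each other:
print(keeps_quorum(["Demo-Host1"], votes))  # True
print(keeps_quorum(["Demo-Host2"], votes))  # False
```

With only one current vote left in play, the host that holds it always has a majority, and the host without it never does.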
And there’s the catch! What if I perform scheduled maintenance that brings Demo-Host1 offline? I can tell you from experience that the virtual machines on Demo-Host1 will not fail over to Demo-Host2. Demo-Host2 cannot vote, and therefore it cannot achieve quorum to continue cluster operations.
This is why Microsoft advises always using a disk or file share witness when deploying Windows Server 2012 R2 clusters. Dynamic Quorum will adjust the witness vote if there is an uneven number of nodes, and reset the vote to its default if there is an even number of nodes.
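To see why an always-present witness removes the need to add and remove disks by hand, here is a minimal Python sketch of the rule as I understand it: the witness votes when the node count is even and does not vote when it is uneven. The function names are hypothetical, and the model ignores everything except the vote count.

```python
def witness_vote(node_count):
    """Witness counts as 1 vote for an even node count, 0 for an uneven one."""
    return 1 if node_count % 2 == 0 else 0


def total_votes(node_count):
    """Node votes plus the (possibly zero) witness vote."""
    return node_count + witness_vote(node_count)


for nodes in range(2, 6):
    print(nodes, witness_vote(nodes), total_votes(nodes))
# 2 1 3
# 3 0 3
# 4 1 5
# 5 0 5
```

In this simplified model the total number of votes is always uneven, so a clean tie cannot happen however many nodes are added. In the two-node case there are three votes, so one host plus the witness holds two of them and keeps a majority.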