Windows Server Failover Cluster Validation
In this article I will discuss what makes a supported cluster, and how you can validate a new or modified Windows Server Failover Cluster configuration.
The Importance of Cluster Validations
The purpose of a failover cluster is to provide high availability (HA). A mission critical service running on a cluster assumes that the cluster is stable. A Hyper-V cluster must be rock solid because there is more than just one service running on that cluster – there are lots of services running across the collective of HA virtual machines.
Those who remember the days of Windows Server clustering prior to Windows Server 2008 (W2008) will know that acquiring a cluster was an expensive ordeal. You could not just go out and purchase any old server and storage. Microsoft only supported failover clustering on validated and certified bundles. On the plus side, this meant that every aspect of the hardware was tested by the manufacturer and Microsoft. On the downside, these kits were more expensive than the sum of their parts.
This all changed with the release of Windows Server 2008. Now it is up to the architect and implementer of the failover cluster to ensure that the configuration can be supported by Microsoft.
Windows Server Cluster Support Requirements
Now when you deploy a Windows Server failover cluster there are two requirements to receive support from Microsoft. The first is that all of the hardware must have passed certification tests from Microsoft and be listed for your version of Windows Server on the Windows Server Catalog. This includes any additional cards that you might add to servers, and so on.
The second requirement is that your finished cluster must pass a series of tests that are performed by the cluster validation wizard that is a part of the Failover Cluster Manager (FCM). This is an easy-to-use tool that automatically performs a variety of tests on your hardware and configuration and then produces a report with summary and detailed results. This tool should be run when you first create a cluster and when you modify the cluster configuration.
Running Validation Tests
You can execute the validation wizard in FCM by selecting the cluster and clicking the Validate Cluster action. The Validate a Configuration Wizard will open and walk you through the cluster testing process.
The first decision you will have to make is whether you want to perform all possible or selected tests. Note that some tests, such as those for storage, will disrupt service availability. If you are building a new cluster then you should run all tests. If you have modified your cluster, then you should select only those tests that are required. Microsoft has published an expansive table to help you choose what tests to select in the latter scenario.
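The same two choices (all tests or selected tests) can also be made from PowerShell with the Test-Cluster cmdlet. The following is a minimal sketch, not a definitive procedure: HV01, HV02, and DemoCluster are placeholder names, and the categories passed to -Include are only an example of a subset.

```powershell
# Load the failover clustering cmdlets (part of the Failover Clustering management tools).
Import-Module FailoverClusters

# New cluster: run every validation test against the servers that will become nodes.
# HV01 and HV02 are placeholder node names.
Test-Cluster -Node "HV01", "HV02"

# Modified cluster: run only the categories that are relevant to the change.
# DemoCluster is a placeholder cluster name; the categories shown are an example.
Test-Cluster -Cluster "DemoCluster" -Include "Network", "System Configuration"
```

Running the subset from a script makes it easier to repeat exactly the same validation after each change.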
There are several categories of test that are performed during a cluster validation:
- Cluster Configuration: Lists and validates cluster resources, such as virtual machines.
- Hyper-V Configuration: Only performed on a Hyper-V cluster and validates the Hyper-V configuration.
- Inventory: Devices such as HBAs, processors, and so on are tested
- Network: Tests the cluster networking. The health and validity of the network setup is critical to the stability and performance of the cluster.
- Storage: Tests the shared cluster storage, where all cluster data, such as virtual machine files, is stored.
- System Configuration: This tests the node/host operating systems, Windows updates, and services on each cluster member.
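To see which individual tests sit behind each of these categories on your own system, Test-Cluster can list them without running anything. This is a small sketch; the Category and DisplayName property names are what I expect on the returned objects, so confirm them with Get-Member on your version of Windows Server.

```powershell
Import-Module FailoverClusters

# -List enumerates the available validation tests instead of running them.
# Property names (Category, DisplayName) are assumed; verify with: Test-Cluster -List | Get-Member
Test-Cluster -List |
    Sort-Object Category, DisplayName |
    Format-Table Category, DisplayName -AutoSize
```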
An existing cluster will be retested from time to time during its operational life. Testing the storage of the cluster can disrupt services. For example, testing the disks of a Hyper-V cluster will cause virtual machine outages. Therefore, you might choose to exclude the storage tests, as shown below.
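In PowerShell, the equivalent of clearing the storage tests in the wizard is the -Ignore parameter. A minimal sketch, assuming a placeholder cluster called DemoCluster:

```powershell
Import-Module FailoverClusters

# Validate a running cluster but skip the whole Storage category,
# so that disks hosting live virtual machines are not disturbed.
# DemoCluster is a placeholder cluster name.
Test-Cluster -Cluster "DemoCluster" -Ignore "Storage"
```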
You might need to test a new disk that is provisioned on the cluster. You can use the Test-Cluster PowerShell cmdlet to perform tests with just selected disks or (Storage Spaces) pools by using the -Disk or -Pool parameters.
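A sketch of what that could look like is below. The cluster name, disk resource name, and pool name are all hypothetical, and I am assuming that these parameters accept the resource or pool name as a string and can be combined with -Include; check Get-Help Test-Cluster -Full on your version before relying on it.

```powershell
Import-Module FailoverClusters

# Run the storage tests against one newly provisioned disk only.
# "Cluster Disk 3" is a hypothetical disk resource name.
Test-Cluster -Cluster "DemoCluster" -Include "Storage" -Disk "Cluster Disk 3"

# Run the storage tests against one Storage Spaces pool only.
# "Pool1" is a hypothetical pool name.
Test-Cluster -Cluster "DemoCluster" -Include "Storage" -Pool "Pool1"
```

Limiting the scope this way keeps the disruptive storage tests away from the disks that are already carrying production workloads.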
After some time the tests will complete and the results will be displayed, as depicted below.
Cluster Validation Results
A summary result is displayed when the wizard is complete. You can click View Report to open the complete results in a web browser. These reports are automatically stored for you in C:\Windows\Cluster\Reports as MHTML files. You can close the browser and reopen the results at a later time.
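If you want to find earlier reports without browsing the folder by hand, a short PowerShell sketch such as the following lists the most recent files in the reports folder named above. It uses only standard file cmdlets and makes no assumption about the report file names or extensions.

```powershell
# List the five most recently written files in the cluster reports folder on this node.
Get-ChildItem -Path "C:\Windows\Cluster\Reports" |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 5 Name, LastWriteTime
```

When validation is started with Test-Cluster rather than the wizard, the cmdlet returns a file object for the report it generated, so its FullName property shows where that particular report was written.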
Note that you will need to supply a cluster validation report when opening a support case for a cluster with Microsoft support.
The report has a summary result that can be:
- Pass: The cluster passed all of the tests and probably will be healthy. Note that this is not a guarantee!
- Warning: The validation tests noticed some problems. The cluster will still be supported by Microsoft but you should try to resolve any remaining issues.
- Fail: There are significant issues with the cluster. Microsoft will not support this cluster.
You can drill down in the report to get more detail by clicking the hyperlink. Even though the cluster might be supported with a warning, in my experience, there will probably be problems. Do everything you can to resolve the issue(s) to get a more stable cluster for your HA services or virtual machines.