Leveraging the Cluster Validation Wizard for Troubleshooting Storage Problems

Whether you are configuring a brand-new Windows failover cluster or maintaining an existing one, the Cluster Validation Wizard is a handy tool for verifying your storage configuration.  The Cluster Validation Wizard, also known as Validate, runs a variety of tests to confirm that cluster components are correctly configured and supported in a clustered environment.

Validate includes tests that list the system configuration as well as tests that exercise the network and the storage.  These tests can be run on a new, proposed member of a cluster, or they can be run to establish a baseline for an existing cluster.  Validate can also be used to troubleshoot a broken cluster by isolating the system, network or storage component that is failing a particular test.

This article describes how to use the Cluster Validation Wizard to troubleshoot storage-related problems.  It explores the storage tests that can be performed and how to troubleshoot any failures.  Finally, a Validation Report is examined to illustrate how the tool can be used to isolate storage-related problems.

Using Validate for Storage Troubleshooting

The Cluster Validation Wizard is part of the Failover Cluster Management MMC snap-in.  The tool is installed when the Failover Clustering feature is installed via Server Manager.  To invoke the Wizard, open the snap-in and select “Validate a Configuration…” in the center pane under Management.  The Wizard prompts for the names of the servers and for the tests to perform.  Below is an example of the test selection for storage-related tests:

Cluster Validation Wizard: storage test selection

As you can see, there are numerous storage-related tests that can be run to determine whether the storage is working as expected.  For example, one test determines whether the disk latency is acceptable for read and write access (no timeouts).  Another validates that multiple nodes can successfully arbitrate for a shared disk.  Yet another ensures that SCSI-3 persistent reservations are supported by the storage controllers.
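The same storage tests can also be started from a script rather than from the Wizard.  On Windows Server 2008 R2 and later, the FailoverClusters PowerShell module provides a Test-Cluster cmdlet with -Node and -Include parameters.  The following is a minimal Python sketch that only builds the command line for such a run; the function name build_test_cluster_command and the server names NODE1 and NODE2 are placeholders invented for illustration, and the "Storage" category name should be checked against the test list shown on your own system.

```python
def build_test_cluster_command(nodes, include=("Storage",)):
    """Return an argument list that runs Test-Cluster through powershell.exe.

    nodes   -- server names to validate (placeholders in this sketch)
    include -- test or category names to run; "Storage" is assumed here
    """
    if not nodes:
        raise ValueError("at least one node name is required")

    def quote(value):
        # PowerShell single-quoted strings escape a quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    # PowerShell accepts comma-separated values for array parameters.
    node_list = ",".join(quote(n) for n in nodes)
    include_list = ",".join(quote(c) for c in include)
    script = (
        "Import-Module FailoverClusters; "
        "Test-Cluster -Node {} -Include {}".format(node_list, include_list)
    )
    return ["powershell.exe", "-NoProfile", "-Command", script]


if __name__ == "__main__":
    # NODE1 and NODE2 are hypothetical server names.
    print(build_test_cluster_command(["NODE1", "NODE2"]))
```

The returned list can be handed to subprocess.run on a server that has the Failover Clustering feature installed.  Building the command separately from running it makes it easy to review exactly what will be executed before touching a production cluster.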

After you select the desired tests, the Wizard verifies that the disks are offline so that the storage tests can run.  The amount of time the storage tests take depends on how many disks are tested and how many servers are in the cluster.  The progress of each test is listed along with its result (Failed, Warning or Success), as seen below.

Cluster Validation Wizard: validation tests
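When many disks and servers are involved, it helps to triage the results so that failures are looked at first.  The following Python sketch assumes you have already copied the test names and their results into a list of pairs; the sample data is invented for illustration and does not come from a real validation run.

```python
from collections import Counter

# Sort order for triage: most urgent result first.
SEVERITY = {"Failed": 0, "Warning": 1, "Success": 2}


def summarize(results):
    """Tally results and list the tests that need attention.

    results -- list of (test_name, status) pairs, where status is
               "Failed", "Warning" or "Success"
    """
    counts = Counter(status for _, status in results)
    needs_attention = sorted(
        (r for r in results if r[1] != "Success"),
        key=lambda r: (SEVERITY[r[1]], r[0]),
    )
    return counts, needs_attention


if __name__ == "__main__":
    # Hypothetical results, not taken from an actual report.
    sample = [
        ("Validate Disk Access Latency", "Success"),
        ("Validate Disk Arbitration", "Failed"),
        ("Validate SCSI-3 Persistent Reservation", "Warning"),
    ]
    counts, needs_attention = summarize(sample)
    print(dict(counts))
    for name, status in needs_attention:
        print(status, name)
```

Failed tests are listed ahead of warnings, and tests with the same result are ordered by name.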

Once the Wizard completes, a Validation Report is generated in HTML format that can be viewed in any web browser.  The report opens with a summary of the tests that were run and their results.  Each test in the summary is a hyperlink; clicking it shows the details for the individual component that was tested and its results.  Here you will find the troubleshooting information you need to determine which disks and servers failed any tests.  The following example illustrates a Validation Report and the various storage tests that were performed.

Cluster Validation Wizard: validation report
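Because the report is HTML and its summary links to the detail sections, the list of tests can also be pulled out with a small script.  The sketch below uses Python's standard html.parser module to collect every hyperlink and its text.  The exact markup of a real Validation Report is not described in this article, so the SAMPLE fragment and its anchor names are invented purely to show the technique.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Invented fragment for illustration; a real report will look different.
SAMPLE = (
    '<ul>'
    '<li><a href="#latency">Validate Disk Access Latency</a>: Success</li>'
    '<li><a href="#arbitration">Validate Disk Arbitration</a>: Failed</li>'
    '</ul>'
)

if __name__ == "__main__":
    for href, text in collect_links(SAMPLE):
        print(href, text)
```

To try it on an actual report, read the saved file into a string and pass it to collect_links, then adjust the filtering to whatever structure your report turns out to have.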

If you find any failed tests, you should be able to determine the problem by examining the detailed information associated with the test.  A particular test may not pass until all of its related tests pass.  Finally, it may be necessary to contact your storage vendor to determine why a particular test failed; the fix may involve updating device drivers or firmware.

Summary

Troubleshooting storage problems in a Windows failover cluster can be a daunting task.  The sheer number of disks and servers in a cluster can multiply the complexity of the problem.  Fortunately, the Cluster Validation Wizard can be used to systematically test the storage subsystem and isolate any failing components.  The Validation Report it generates documents the tests and their results, with hyperlinks to detailed troubleshooting information such as the names of failing disks and servers.

See the related storage article on Windows Failover Cluster and iSCSI Technology.  That article discusses how iSCSI storage can be configured to provide a low-cost, shared storage solution for your Windows clusters.