This morning a client had a DAG failure that caused mailflow to stop.
The DAG network on the primary server was showing as unavailable/offline, yet there was network communication with both nodes in the cluster and with the FSW (file share witness).
The fix was to open Failover Cluster Manager, select the Unavailable cluster network, and untick ‘Allow Clients to Connect Through This Network’. Although the tick box re-enabled itself immediately, the DAG came online at once and mailflow returned.
I am very concerned that this is a single point of failure in a setup like the one below:
Site A – 1 DAG Member, 1 FSW
Site B – 1 DAG Member
How can I ensure that a failure of the cluster network on the server in Site A does not bring the whole DAG down? What steps should be taken to increase resiliency?
Had I introduced a second DAG member in Site A, could this outage, caused by the cluster network failing on the active server, have been avoided?
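
For reference, the vote counting behind this question can be sketched roughly as follows. This is a minimal Python sketch, assuming classic static majority quorum (more than half of the configured voters must be reachable) and assuming the FSW only votes when the DAG has an even number of members. The function names are hypothetical, and the sketch models vote counting only, not the cluster network behaviour seen in this outage.

```python
def votes_needed(total_voters: int) -> int:
    """Majority quorum: more than half of all configured voters."""
    return total_voters // 2 + 1


def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """True if the reachable voters form a majority."""
    return reachable_voters >= votes_needed(total_voters)


# Current layout: 2 DAG members + 1 FSW = 3 voters (assumption: FSW votes).
current_total = 2 + 1

# Hypothetical layout: 3 DAG members = 3 voters
# (assumption: with an odd member count the FSW does not vote).
proposed_total = 3

for name, total in (("current", current_total), ("proposed", proposed_total)):
    # Lose any single voter and check whether the remainder keeps quorum.
    survives_one_loss = has_quorum(total - 1, total)
    print(name, "needs", votes_needed(total), "votes;",
          "survives one lost voter:", survives_one_loss)
```

Under these assumptions both layouts keep quorum after losing any single voter, so the arithmetic alone does not distinguish them.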