Automated Failover of Internet Web Services to Azure

In this post, I will show you how to fail over an Internet-facing web service from a “local” data centre to Azure, a technique that can optionally be included in a design for Azure Site Recovery (ASR).

The Problem

Let’s assume that I am running a web application that is important to my business. I have hosted this application either in an on-premises DMZ or in a Co-Lo (co-location) hosting facility where I am renting rack space, bandwidth and power. The business has decided that it needs this important application to survive a facility failure. Deploying a traditional disaster recovery (DR) solution will be too expensive and distracting, so it has been decided that I must use Azure as a DR site.

The Solution

The first component of the solution is basic Azure Site Recovery (ASR), Azure’s DR-as-a-Service offering. ASR replicates the machines (physical, vSphere, or Hyper-V) from the production environment to a recovery services vault in Azure. The virtual disks sit there until a failover is invoked. At that point, a recovery plan starts and orchestrates the creation of virtual machines (attaching them to the replicated disks), booting them in the desired order on the desired Azure virtual networks (VNets).
The tricky bit is how to redirect end users from the now offline production site to a service running in Azure.
The second component of the solution is to solve that redirection problem. An Azure Traffic Manager profile is deployed in priority (failover) mode. Two endpoints are added to the profile:

  1. High priority: The public DNS name of the web service running in the production site.
  2. Low priority: The public (Microsoft-managed) DNS name of the Azure load balancer’s public IP address that is used for the failed over web service.

The web service is advertised to clients using a CNAME record, which points to the Microsoft-managed DNS name of the Traffic Manager profile.
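If the on-prem web service is responding to Traffic Manager’s health probes, then clients are sent to the IP address of that installation. If the probes fail, then Traffic Manager answers DNS queries with the Azure endpoint instead, and clients are sent to the Azure service’s public IP address (the load balancer).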
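The priority (failover) behaviour described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not Traffic Manager’s actual implementation: the Endpoint class, the pick_endpoint function, and the example DNS names are all hypothetical, and returning None when nothing is healthy is a simplification made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    name: str
    target: str     # DNS name handed back to clients (hypothetical values below)
    priority: int   # lower number = higher priority
    healthy: bool   # result of the most recent health probe


def pick_endpoint(endpoints: list[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, if any."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority) if healthy else None


# Example: the production site is down, so the Azure endpoint is chosen.
endpoints = [
    Endpoint("production", "www-origin.example.com", priority=1, healthy=False),
    Endpoint("azure-failover", "example-lb.northeurope.cloudapp.azure.com", priority=2, healthy=True),
]
chosen = pick_endpoint(endpoints)
print(chosen.name if chosen else "no healthy endpoint")  # prints "azure-failover"
```

When the production endpoint is healthy again, the same function returns it, which is the failback half of the design.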

Failing over web services to Azure and redirecting Internet clients [Image Credit: Aidan Finn]

The Azure Load Balancer

The first step is to configure the availability set for the Azure web farm; the virtual machines don’t exist yet (and will only exist during failovers, including planned tests), so you just create the availability set.
The solution does have a glitch. You will need to add one virtual machine, acting as a web server, to this availability set. Without this, you cannot add the availability set as a backend pool in the load balancer. The positive side of this solution is that this web server can host a “This website is offline” page. Clients will see this page if they are accidentally directed here (the production site is offline but a failover hasn’t started/completed). In the event of a real failover, your recovery plan can run an Azure Automation runbook to power down this virtual machine, leaving only the failed-over web servers, which are hosting the real web application.
Next you will create and configure a load balancer:

  1. Deploy an external load balancer.
  2. Add the availability set as a backend pool.
  3. Create a load balancing rule for the web application.

The second glitch is that you can only create a load balancing rule for machines that are running in Azure. That’s not a huge issue: create a rule for the dummy machine that is running in the availability set. Your Azure Automation runbook that shuts down this machine can also replace the load balancing rule to include only the failed-over virtual machines.
To wrap up the load balancer, we need to make it accessible from the Internet. Create a public IP address, define a DNS name (managed by Microsoft) for it, and associate that IP address with the load balancer.

Associate the DNS-configured public IP address with the load balancer [Image Credit: Aidan Finn]

Traffic Manager Profile

Perform the following steps to configure automated failover from the on-prem web server IP address to the Azure load balancer IP address. Start by creating a Traffic Manager profile with the Priority routing method.
Priority is how we define failover; a lower number is a higher priority.

Create a Traffic Manager profile with Priority routing method [Image Credit: Aidan Finn]
Add an Azure endpoint for the Azure load balancer’s public IP address. This endpoint should have a low priority (that is, a higher priority number than the production endpoint) because this is the failover site.
Add the failover endpoint for the Azure load balancer [Image Credit: Aidan Finn]
You will then add a second, external endpoint. This will point at the public DNS name of your production website and should have the highest priority (the lowest number).
Add the production endpoint for the on-premises service’s public domain name [Image Credit: Aidan Finn]

CNAME Record

Obtain the DNS name of the Traffic Manager profile, petriweb.trafficmanager.net in this example, and create a CNAME record in your public DNS that points to this name. Distribute the CNAME record’s name to your clients because this is the name that your users/customers will use to browse the website.
DNS lookups for that name will be sent to Traffic Manager, and the Azure service will then use priority in conjunction with availability to determine where to direct clients:

  • The production site if it is available
  • The failover site if the production site is not responding to Traffic Manager probes
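To make the name resolution concrete, here is a minimal Python sketch of a client following the CNAME chain. It is a toy model, not a real DNS resolver: the resolve function and the names www.example.com, www-origin.example.com, and example-lb.northeurope.cloudapp.azure.com are hypothetical, and only petriweb.trafficmanager.net comes from the example above. Real resolution would end in an address record lookup; the sketch stops at the last name in the chain.

```python
def resolve(name: str, records: dict[str, str], max_lookups: int = 8) -> str:
    """Follow alias records from name until a name with no further alias is reached."""
    for _ in range(max_lookups):
        if name not in records:
            return name
        name = records[name]
    raise RuntimeError("CNAME chain too long or looping")


# Normal operation: Traffic Manager answers with the production endpoint.
records = {
    "www.example.com": "petriweb.trafficmanager.net",
    "petriweb.trafficmanager.net": "www-origin.example.com",
}
print(resolve("www.example.com", records))  # prints "www-origin.example.com"

# Production stops answering probes: Traffic Manager now answers with the Azure endpoint.
records["petriweb.trafficmanager.net"] = "example-lb.northeurope.cloudapp.azure.com"
print(resolve("www.example.com", records))  # prints "example-lb.northeurope.cloudapp.azure.com"
```

The point of the sketch is that the name clients use never changes; only Traffic Manager’s answer does.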

Azure Automation

Finish up the solution by creating a runbook in Azure Automation. This runbook will:

  1. Start the dummy Azure virtual machine web server if it is not running (failing back to production)
  2. Stop the dummy Azure virtual machine web server if it is running (failing over to Azure)
  3. Replace the load balancer’s load balancing rule to direct traffic to the dummy Azure virtual machine web server (failing back to production)
  4. Replace the load balancer’s load balancing rule to direct traffic to the failed-over Azure virtual machine web servers (failing over to Azure)
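The decision logic of such a runbook can be sketched in Python as follows. This models state only and makes no Azure calls; a real runbook would invoke the relevant Azure cmdlets or SDK operations at each step. The WebFarm class, the Direction enum, the run function, and the machine names are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Direction(Enum):
    FAILOVER = "failover"   # production -> Azure
    FAILBACK = "failback"   # Azure -> production


@dataclass
class WebFarm:
    dummy_vm_running: bool
    rule_targets: list[str] = field(default_factory=list)  # machines behind the load balancing rule


def run(farm: WebFarm, direction: Direction, dummy_vm: str, recovered_vms: list[str]) -> WebFarm:
    """Apply the runbook steps; each branch is safe to repeat (idempotent)."""
    if direction is Direction.FAILOVER:
        farm.dummy_vm_running = False            # stop the dummy web server if it is running
        farm.rule_targets = list(recovered_vms)  # rule now targets the failed-over web servers
    else:
        farm.dummy_vm_running = True             # start the dummy web server if it is not running
        farm.rule_targets = [dummy_vm]           # rule now targets the dummy web server again
    return farm


farm = WebFarm(dummy_vm_running=True, rule_targets=["dummy-web"])
run(farm, Direction.FAILOVER, "dummy-web", ["web01", "web02"])
print(farm.rule_targets)  # prints ['web01', 'web02']
```

Passing the direction in as a parameter lets one runbook serve both the failover and the failback recovery plans.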

Now you have all the components for failing over a web service to Azure.