That’s the question I asked myself earlier today when I was down to my very last option for saving my own web server. In this article, I’ll tell you all about what went wrong with my Azure virtual machine and how Azure Backup for virtual machines saved my bacon.
I’ve been running a blog since 2006. At first it was something that I used to generate attention for my new career as an IT contractor. Over time, the site evolved, and I’ve been stunned by how many people come to it every year to search for things on Windows, Hyper-V, and other subjects that I’ve littered the Internet with. My site has become an important notebook for me, but it’s also a source of income with several advertisers gracing me with their business. That means that the website is mission critical, and I cannot afford to lose it.
There’s a saying in IT: if you don’t have three copies of your backup, then you don’t have a backup. I’ve never trusted backup tools, perhaps because I’ve been forced to work with too many big-name, bad products over the years and have become quite wary. So when I migrated my WordPress website from a local hosting company to Microsoft Azure, I implemented the following:
I had three kinds of backup in operation, and two of those were keeping three copies of the data, with one of those even being in a different storage account. I thought I had covered all of the bases… or had I?
All software has security vulnerabilities. This is why we encourage people to apply updates to their machines on a frequent basis. This is easy to configure in Windows, so my web server was configured to apply Windows updates. However, I wasn’t doing this very regularly with WordPress, and I had never updated MySQL (5.5.x). And let’s not forget about the third-party plug-ins in the WordPress site. And did I mention that I had been using the same, never-updated theme on my WordPress site since… oh, I forget, but it wasn’t in this decade! I was asking for trouble, right?
But at least I had installed the Microsoft Anti-Malware extension and Windows Update was deploying security fixes.
Yesterday was a bit of a crazy day, so I wasn’t paying any attention to email or social media. I came to work today, fired up TweetDeck as I normally do to start a catchup, and saw several mentions with people warning me about my site being down. That’s normally not a huge deal because I can usually catch it early, run IISRESET, and things are back to normal for a few more months. It was different this time.
I logged into the guest OS of the virtual machine and ran IISRESET. The IIS service and the application pool were started anew, and this normally resolves any issues. This time, however, the problem persisted. I rebooted the virtual machine, and the problem changed. Instead of an HTTP 404, I was getting a “Database connection timed out” error. This indicated to me that WordPress was running, but MySQL was not responding. Ouch! This is my worst fear with WordPress. Although I can muddle my way around SQL Server and have rescued databases like many of my fellow accidental DBAs, MySQL is a nasty piece of open source <insert adjective of choice here>.
I went to Services, and I found MySQL was stuck in a starting state. It was completely hung, and there was nothing I could do with it. I searched online for solutions, but found nothing relevant. That led me to a new idea: upgrading MySQL to repair the service. I tried that, but the installer got only so far before it wanted to stop MySQL, which I could not do because the service was hung. I set the MySQL service to manual, rebooted, and retried the installer, but it required MySQL to be running before it could upgrade it.
I gave up on this install of MySQL and accepted that I would have to resort to one of my backups. I uninstalled MySQL 5.5.x and installed the latest version. I tried to import my “all databases” export, and that worked… but there was no sign of my WordPress database. I guess “all databases” means something different in penguin-speak.
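In hindsight, it would have been worth checking what the export actually contained before relying on it. Here is a minimal sketch of that check in Python. It assumes the export was produced by `mysqldump` with `--all-databases` (I have not said above how mine was made), in which case each database is preceded by a `-- Current Database:` comment line. The function names and the file path in the usage note are hypothetical.

```python
import re

# Assumption: the export is a mysqldump file created with --all-databases
# (or --databases), which writes a "-- Current Database: `name`" comment
# before the statements for each database.
_DB_MARKER = re.compile(r"^-- Current Database: `(?P<name>[^`]+)`")


def databases_in_dump(lines):
    """Return the database names found in a dump, in order of appearance."""
    names = []
    for line in lines:
        match = _DB_MARKER.match(line)
        if match and match.group("name") not in names:
            names.append(match.group("name"))
    return names


def dump_contains(path, database):
    """Return True if the dump file at `path` includes `database`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return database in databases_in_dump(handle)
```

Calling something like `dump_contains("all-databases.sql", "wordpress")` after every export would flag a missing database long before the day you need to restore it.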
I mucked around with that for a while. I even restored a backup of the export file from Azure Backup and imported that without success. I had reached the point where there was only one remaining option to save my nine years of blogging content. I would have to restore my Azure virtual machine from the Azure Backup preview service for virtual machines.
I take a backup of the virtual machine every day and retain four weeks of recovery points. This gives me some scope to go quite a bit back in time if the need arises. I decided that using yesterday’s backup was useless because the failure happened sometime before the backup. I went, instead, with a recovery point from Friday.
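The reasoning there is simple enough to write down: take the newest recovery point that predates the moment you think things went wrong. The following is only a sketch of that logic in Python, not anything the Azure Backup service provides; the function name and the dates in the usage note are made up for illustration.

```python
from datetime import datetime


def pick_recovery_point(recovery_points, suspected_failure):
    """Return the newest recovery point taken before the suspected failure.

    Returns None when every recovery point is at or after the failure,
    which means the retention window no longer reaches back far enough.
    """
    candidates = [point for point in recovery_points if point < suspected_failure]
    return max(candidates, default=None)
```

For example, with daily recovery points from `datetime(2015, 6, 5)` through `datetime(2015, 6, 8)` and a suspected failure on the morning of `datetime(2015, 6, 6, 9, 0)`, the function returns the recovery point from the 5th. If you are unsure when the failure happened, err on the early side.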
The restore process for virtual machines is still a bit ropey. You:
I also created a new virtual network. I started the restore and waited … and waited … and … yes, I was stressed. Anyone who has ever restored a production machine knows exactly how I felt. This is when you find out if:
The VM was eventually restored, and I panicked a bit when the Virtual Machines view was slow to refresh. The machine booted, and then I tested the website using the cloud service domain name… and I got an HTTP 404. Panic was rising before I checked the endpoints of the new virtual machine. HTTP was there with an internal port of 80 (good), but with a random external port of 50000-something (not good). I changed the external port to 80, retested, and I was greeted by “Database connection timed out.” I was just about to cry into my keyboard when it dawned on me: MySQL was replaying the transaction logs. I waited and tested, feeling my mouth get ever drier and the deafening thump in my chest. Then the MySQL service refreshed from “Starting” to “Running.” I refreshed the browser … it was so slow … and the familiar very old WordPress theme appeared on my screen.
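Rather than refreshing the Services console with a dry mouth, you could let a small script do the waiting. This Python sketch polls a TCP port until something accepts a connection. It rests on two assumptions that I did not verify on the day: that MySQL is on its default port of 3306, and that it only starts accepting connections once it has finished replaying its logs. The function name and default timings are my own invention.

```python
import socket
import time


def wait_for_port(host, port, timeout=600.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True as soon as a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # `interval` doubles as the per-attempt connect timeout.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Run inside the guest OS, `wait_for_port("127.0.0.1", 3306)` would return once MySQL is listening. A successful TCP connection only shows that the service is up, so it is still worth loading the site in a browser afterwards.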
I use a CNAME for my website that points at the Microsoft-managed domain name for the cloud service. This CNAME also has a TTL of 300 (5 minutes). Once I was sure that I was back online, I modified the CNAME record with my registrar and waited. 5 minutes later, www.aidanfinn.com was back online.
I noticed the Anti-Malware extension was eating up a bit of CPU, so I opened the admin console and checked out what was going on. A full scan was starting. I checked the history, and I was shocked. Lots of .PHP malware had been found and quarantined over the previous couple of months. I was happy that the malware was caught before it could do anything, but I was upset that it was there in the first place. My machine has a minimal TCP footprint on the Internet, but I knew roughly where the culprit was:
I updated the definition file in Microsoft Anti-Malware and ran a full scan. I then:
As I write this, the full scan is still running. When that’s complete I plan to upgrade MySQL and run an online scan against the guest OS of the virtual machine just to be sure.
This was too close a call, so I’ve added another backup option. A WordPress plugin has been installed to perform a daily backup of the site and the database, just in case things go wrong again.
A few lessons were re-learned today:
And finally, Azure Backup of virtual machines really does work, even if the preview release is a little ropey and slow to back up virtual machines.