Last Update: Sep 04, 2024 | Published: Aug 10, 2016
This post will show you how to use the built-in tools for troubleshooting faulty virtual machines in Azure.
No matter what platform you use – physical servers, Hyper-V, vSphere, Azure, or AWS – sometimes a machine starts to act up a little. Veteran admins know the pain of working with a faraway server when remote desktop fails to connect – you hope that either you can still reboot the machine remotely or someone with administrator rights is on site.
But what if an Azure virtual machine starts to misbehave? What do you do then? As a tenant in Azure, you have no access to the fabric. You cannot see if a localized host issue is breaking your virtual machine. Your only ways to connect to the machine are via PowerShell and remote desktop – you don’t have the Hyper-V Connect utility for console access. Luckily, Microsoft gave us a set of tools for troubleshooting virtual machines in Azure.
Microsoft is not putting pressure on customers to migrate from Azure Service Management (ASM, or classic) deployments, but it is clear from recent developments that Microsoft is focusing feature improvements on Azure Resource Manager (ARM), and is bringing either lesser functionality or none at all to ASM workloads. This is the case with troubleshooting virtual machines.
If you browse the settings of a classic (ASM) virtual machine then you will see the following troubleshooting options:
But if you browse the settings of a Resource Manager (ARM) virtual machine then you will notice that there is (currently) one extra option (Redeploy):
The first option helps you to identify an issue with a virtual machine. If you click Diagnose and solve problems, a blade opens up (below). This blade will tell you if there are any known issues that might affect the virtual machine. A set of common problems and their solutions is listed.
For example, I can expand the solution for My VM is slow. The guidance from Microsoft will instruct me on how to diagnose the issue and resolve it. It includes instructions such as:
If you reach the end of the trail, you are advised to consider contacting technical support. More on this later in the post.
Another scenario that veteran admins have seen is the “it’s been fine for months but it just broke” problem. Very often this is because “somebody” (it’s always “somebody”) changed something but we don’t know who changed it or what they changed.
Azure logs all changes, and you can view these via the Activity Logs blade. You can filter the changes to a particular time period to see what changed, and then diagnose the issue. For example, if performance has dropped, and someone reduced the spec of the virtual machine around that time, then you know what the cause and solution are.
If you have been clever and avoided the use of a single generic user account, and forced every admin to sign in using their own account, then you will also see who caused the problem … and take the appropriate HR-approved actions.
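The same "what changed, and who changed it" question can be asked of an exported copy of the activity log. Here is a minimal sketch in Python of that time-window filter. The records are invented for illustration, and the field names (`eventTimestamp`, `caller`, `operationName`) are my assumption about how an export might be shaped, so adjust them to match what you actually have.

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse an ISO 8601 timestamp such as '2016-08-10T09:15:00Z' as UTC."""
    # Older Python versions do not accept a trailing 'Z', so swap it for an offset.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def changes_in_window(events, start, end):
    """Return (time, caller, operation) for events inside the window, oldest first."""
    hits = []
    for event in events:
        when = parse_utc(event["eventTimestamp"])
        if start <= when <= end:
            hits.append((when, event["caller"], event["operationName"]))
    return sorted(hits)


# Hypothetical records; a real export will carry many more fields.
sample_events = [
    {"eventTimestamp": "2016-08-09T22:40:00Z", "caller": "alice@example.com",
     "operationName": "Restart Virtual Machine"},
    {"eventTimestamp": "2016-08-10T09:15:00Z", "caller": "bob@example.com",
     "operationName": "Create or Update Virtual Machine"},
    {"eventTimestamp": "2016-08-11T07:05:00Z", "caller": "alice@example.com",
     "operationName": "Start Virtual Machine"},
]

# Performance dropped on Aug 10, so look only at that day.
window_start = datetime(2016, 8, 10, tzinfo=timezone.utc)
window_end = datetime(2016, 8, 10, 23, 59, 59, tzinfo=timezone.utc)

suspects = changes_in_window(sample_events, window_start, window_end)
for when, caller, operation in suspects:
    print(when.isoformat(), caller, operation)
```

Because every admin signs in with their own account, the `caller` column is what turns "somebody" into a name.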
As tenants of a cloud, we have no visibility behind the curtain; we cannot see if Azure is healthy. Microsoft does share some health information from the 10,000-foot level, but that doesn’t tell me if a single host or storage node might have an issue that affects my virtual machine. However, the Resource Health blade does offer information that can help at that level, offering the results of an Azure service that monitors the resources that my machine is depending on.
If you have enabled Boot Diagnostics in the Diagnostics settings of the virtual machine, then you can get a view of the machine as it boots up and is running – this is like watching the console of the machine without any interaction.
With Windows virtual machines, we can see if the machine has booted up normally or if the console is displaying an error. Linux displays a lot more information on the console during a boot up; you can see a trace of this information in the Boot Diagnostics troubleshooting blade.
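A Linux boot trace can run to hundreds of lines, so it can help to save the text and scan it for the lines that look like trouble. Here is a rough sketch in Python. The list of trouble markers and the sample log are my own assumptions rather than anything that Azure defines, so tune the pattern for your distribution.

```python
import re

# Assumed markers of trouble; this is a starting point, not a complete list.
TROUBLE = re.compile(r"kernel panic|failed|error|emergency mode", re.IGNORECASE)


def suspicious_lines(boot_log):
    """Return (line_number, text) for each log line that matches a trouble marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(boot_log.splitlines(), start=1)
        if TROUBLE.search(line)
    ]


# Hypothetical console output, invented for this example.
sample_log = """\
[    0.000000] Linux version 4.4.0 (hypothetical build)
[    1.234567] EXT4-fs (sda1): mounted filesystem with ordered data mode
[    5.678901] systemd[1]: Failed to mount /data.
[    5.700000] systemd[1]: You are in emergency mode.
"""

findings = suspicious_lines(sample_log)
for number, text in findings:
    print(number, text)
```

A keyword scan like this will throw up false positives, so treat the matches as places to start reading, not as a diagnosis.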
I love this feature; we have the ability, via the VM Agent extension, to reset the local administrator username and password of a virtual machine. There are two options in this dialog, if using ARM virtual machines:
Sometimes you will find that the virtual machine just needs to be moved to a different host. Maybe the host is faulty. Maybe moving the machine will reset it. The redeploy action, which is not available for ASM virtual machines, will move a machine to a new host.
The machine will be restarted during the process. The temporary drive will be lost and recreated (but you knew not to keep valuable data there, right?), and the machine will start up with your OS and data disks on another host. This redeploy process will hopefully resolve any localized issues that might have been affecting the machine. Note that this process will take a few minutes to complete.
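If you would rather script a redeploy than click through the portal, the Azure CLI has an `az vm redeploy` command for ARM virtual machines. Here is a minimal sketch that wraps it in Python. The resource group and machine names are placeholders of mine, and actually running the command assumes that the Azure CLI is installed and that you have already signed in with `az login`. The function defaults to a dry run so that nothing is restarted by accident.

```python
import subprocess


def redeploy_command(resource_group, vm_name):
    """Build the Azure CLI arguments that redeploy one virtual machine."""
    return [
        "az", "vm", "redeploy",
        "--resource-group", resource_group,
        "--name", vm_name,
    ]


def redeploy(resource_group, vm_name, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    command = redeploy_command(resource_group, vm_name)
    if not dry_run:
        # This restarts the machine and discards the temporary drive.
        subprocess.run(command, check=True)
    return command


# "demo-rg" and "demo-vm" are placeholder names, not real resources.
planned = redeploy("demo-rg", "demo-vm")
print(" ".join(planned))
```

Call `redeploy("demo-rg", "demo-vm", dry_run=False)` only when you are ready for the restart.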
There are two types of support request in the world of Azure:
If you have acquired Azure via the Cloud Solution Provider (CSP) channel, then your first point of contact for support should be the reseller. They have a means to escalate technical support requests to Microsoft, and because billing is done via the reseller’s private pricing file, all billing requests need to go through the reseller too.
If you acquired Azure via any other means then your route to support depends on a few variables. All channels of Azure (except CSP) include free billing support from Microsoft; that means if you have a query about consumption, billing reports, or your subscription/account, you get built-in support from Microsoft as a part of your agreement.
No server product from Microsoft comes with built-in technical support. If you wish to create a technical support ticket (non-CSP subscriptions) then you must purchase a support contract from Microsoft.
If you are not using a CSP subscription, then you can use the New support request blade to open a support ticket.