Monitoring Azure Virtual Machine Performance
This post explains how to use metrics and alerts in Microsoft Azure to monitor the performance of Azure virtual machines, and how to create alerts or trigger actions when performance breaches predefined thresholds.
There are two types of basic monitoring that we should do from the virtual machine’s perspective:
- Are the virtual machine and its services running?
- How well are the virtual machine and its services running?
The second of those questions is our focus here. We can view and gather metrics in Azure to see how well a virtual machine is performing. That means we can answer questions such as “why is my Azure virtual machine slow?” without making wild guesses or sweeping generalizations such as “Azure is slow.” Instead, we want to use empirical data to diagnose the problem.
Anyone who has worked with Hyper-V will know that there are two ways to monitor the performance of a Hyper-V virtual machine, and therefore of an Azure virtual machine: (Hyper-V) host metrics and guest OS metrics. In Azure, we get a set of host metrics by default, because Azure hosts can report this information without any interaction with the virtual machine.
The host metrics are useful, but an important one is missing: available memory. Even so, if I want a general view of how a virtual machine is really performing, these metrics give it to me. Hyper-V admins consider them the true “this is how the machine is performing” metrics.
Other metrics, such as available memory, are guest OS metrics. To gather this data, you must enable guest-level monitoring, which uses the Virtual Machine Agent. If guest-level monitoring is not already enabled, enable it as follows:
- Make sure the virtual machine is running.
- Browse to Diagnostics Settings in the virtual machine.
- Click Enable Guest-Level Monitoring.
The required virtual machine agent is then installed in the guest OS of the virtual machine, which takes a few minutes.
Unfortunately, the process does not ask which storage account you want to use for recording diagnostics data; it creates one by itself. You can override this by creating your own storage account (hot blob or general-purpose), browsing to Diagnostics Settings > Agent, and selecting that storage account there.
A default set of guest OS metrics is selected for you. You can fully customize the gathered metrics by going to Diagnostics Settings > Performance Counters and selecting Custom. You need to know the counter’s object (its family) and name, such as \Processor Information(_Total)\% Processor Time, and enter that path to add it to the gathered set.
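To make the path format concrete: a Windows performance counter path takes the form \Object(Instance)\Counter, where the instance in parentheses is optional. The Python sketch below splits such a path into its parts and assembles one again. The names CounterPath, parse_counter_path, and build_counter_path are hypothetical and not part of any Azure tool; this is plain string handling for illustration, and it assumes instance names contain no closing parenthesis.

```python
import re
from typing import NamedTuple, Optional


class CounterPath(NamedTuple):
    """Parts of a Windows performance counter path (hypothetical helper type)."""
    obj: str                 # counter object ("family"), e.g. "Processor Information"
    instance: Optional[str]  # instance in parentheses, e.g. "_Total"; None if absent
    counter: str             # counter name, e.g. "% Processor Time"


# Matches \Object(Instance)\Counter or \Object\Counter.
# Assumption: the instance name contains no ")" character.
_PATH_RE = re.compile(r"^\\(?P<obj>[^\\(]+)(?:\((?P<instance>[^)]*)\))?\\(?P<counter>[^\\]+)$")


def parse_counter_path(path: str) -> CounterPath:
    """Split a counter path into object, optional instance, and counter name."""
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a counter path: {path!r}")
    return CounterPath(match.group("obj"), match.group("instance"), match.group("counter"))


def build_counter_path(obj: str, counter: str, instance: Optional[str] = None) -> str:
    """Assemble a counter path, adding the (instance) part only when one is given."""
    instance_part = f"({instance})" if instance is not None else ""
    return f"\\{obj}{instance_part}\\{counter}"
```

For example, parse_counter_path(r"\Memory\Available Bytes") yields the Memory object with no instance and the Available Bytes counter, a guest OS memory counter of the kind the host-level set lacks.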
There are two ways that you can view the metrics (via the same interface) of a virtual machine:
- Virtual Machine: Open the settings of the virtual machine and browse to Metrics.
- Azure Monitor: Open Azure Monitor in the Azure Portal, click Metrics, choose the subscription and resource group of the virtual machine, and then select the virtual machine.
The list of available host-level metrics and, if enabled, guest OS metrics is presented on the left. You can select metrics to view them on the chart.
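Whichever route you take, the chart is scoped to a single Azure Resource Manager resource, and choosing a subscription, resource group, and virtual machine amounts to choosing that resource’s ID. Scripts address the same virtual machine by this ID; for example, the Azure CLI command az monitor metrics list takes it through its --resource parameter. The Python sketch below builds and splits such an ID. The function names and the placeholder subscription, resource group, and VM names are hypothetical, and the sketch only handles virtual machine IDs.

```python
from typing import Dict

VM_PROVIDER = "Microsoft.Compute/virtualMachines"


def vm_resource_id(subscription_id: str, resource_group: str, vm_name: str) -> str:
    """Build the Azure Resource Manager ID that identifies a virtual machine."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/{VM_PROVIDER}/{vm_name}"
    )


def split_vm_resource_id(resource_id: str) -> Dict[str, str]:
    """Recover subscription, resource group, and VM name from a VM resource ID."""
    parts = resource_id.strip("/").split("/")
    # Expected layout (segment names compared case-insensitively):
    # subscriptions/<id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<name>
    if (
        len(parts) != 8
        or parts[0].lower() != "subscriptions"
        or parts[2].lower() != "resourcegroups"
        or parts[4].lower() != "providers"
        or f"{parts[5]}/{parts[6]}".lower() != VM_PROVIDER.lower()
    ):
        raise ValueError(f"not a virtual machine resource ID: {resource_id!r}")
    return {"subscription_id": parts[1], "resource_group": parts[3], "vm_name": parts[7]}
```

Keeping the ID in one place means a script and the portal chart are guaranteed to look at the same virtual machine.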
There are a few ways that you can use this chart:
- The chart can be switched from a line chart to a bar chart.
- You can change the time range shown.
- You can create performance alerts that can trigger webhooks, automation runbooks, or send emails.
- You can rename the chart and then pin it to your current dashboard in the Azure Portal, making it easy to track the performance of a line-of-business (LOB) app.
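To illustrate what a performance alert does, here is a minimal Python sketch of a threshold rule: it averages a metric’s samples over a time window and, if the average crosses the threshold, calls each configured action. ThresholdRule, window_average, evaluate, and the plain callables standing in for webhooks, runbooks, and emails are all hypothetical. This is not how Azure Monitor implements alerts internally; Azure’s own alert rules let you choose the aggregation type and evaluation frequency, whereas this sketch hard-codes an average.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[datetime, float]  # (timestamp, metric value)
Action = Callable[[str], None]   # stand-in for a webhook, runbook, or email sender


@dataclass
class ThresholdRule:
    """Hypothetical rule: fire when the windowed average crosses a threshold."""
    metric: str
    threshold: float
    window: timedelta
    fire_above: bool = True  # True: breach when avg > threshold; False: when avg < threshold


def window_average(samples: Sequence[Sample], now: datetime, window: timedelta) -> Optional[float]:
    """Average the samples whose timestamps fall in (now - window, now]."""
    values = [value for ts, value in samples if now - window < ts <= now]
    return mean(values) if values else None


def evaluate(rule: ThresholdRule, samples: Sequence[Sample], now: datetime,
             actions: List[Action]) -> bool:
    """Evaluate the rule once; on a breach, call every action and return True."""
    avg = window_average(samples, now, rule.window)
    if avg is None:
        return False  # no data in the window, so nothing to compare
    breached = avg > rule.threshold if rule.fire_above else avg < rule.threshold
    if breached:
        message = f"{rule.metric} averaged {avg:.1f} over {rule.window} (threshold {rule.threshold})"
        for action in actions:
            action(message)
    return breached
```

For instance, with one-minute Percentage CPU samples of 70, 85, 90, 95, and 88 in the last five minutes, a rule with an 80 percent threshold sees an average of 85.6 and fires. Averaging over a window rather than reacting to a single sample means a brief CPU spike does not trigger the actions, while sustained load does.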
The last of these is particularly underused. You can have several dashboards, which can be shared, and switch between customized dashboards when doing different jobs. Each dashboard gives you a set of shortcuts or views relevant to the service that you are working on.