Monitoring & Troubleshooting Azure JSON Deployments
In this article, I will show you how to monitor the deployment of an Azure JSON template using the Azure Portal, and how to debug a failed deployment using the error messages in the Azure Portal and the Azure Resource Explorer (still in preview at the time of writing).
Monitoring a Deployment
Azure Resource Manager (ARM) uses the concept of deployments to identify each execution of an Azure JSON template. It is possible to deploy multiple templates within a resource group, so it’s useful to be able to know which is which.
After you deploy a template, using whatever method, you can monitor the deployment from the new or existing resource group. Browse to the resource group in the Azure Portal and select Deployments. Each deployment (past and current) is listed there; select the deployment that you want to monitor to track its progress. ARM shares quite a bit of information via this interface.
Details of the deployment, such as the subscription and resource group, are shared at the top of the blade. The entered parameter values are also shown – it’s a good idea to double-check these values to ensure that you haven’t “fat fingered” something.
Each resource appears in Operation Details as it is being created. The creation process is based on the dependencies that are defined within your JSON template. For example, you will require a storage account, virtual network, and NICs before creating virtual machines. You can click each resource, as it appears, to learn more about the process.
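The dependency-driven ordering described above can be sketched as a topological sort. This is only an illustration, not how ARM is implemented: the resource names and the depends_on mapping below are hypothetical, loosely modelled on the dependsOn entries that a template declares.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resources, each mapped to the resources it depends on.
depends_on = {
    "storageAccount": [],
    "virtualNetwork": [],
    "nic": ["virtualNetwork"],
    "virtualMachine": ["storageAccount", "nic"],
}

def creation_order(graph):
    """Return one valid creation order: dependencies before dependents."""
    return list(TopologicalSorter(graph).static_order())

order = creation_order(depends_on)
print(order)
```

Any order that the sort returns places the virtual network before the NIC, and the NIC and storage account before the virtual machine, which is the pattern you should see as resources appear in Operation Details.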
When I deploy a medium-to-large template, I open this blade, and then click the link to the resource group in the upper part of the blade. This opens another resource group blade where I can see the new resources being created, while I track the progress in the Deployments blade.
I have found that the first indication of an error in your deployment is that some step takes too long. For example, a storage account is normally brought online very quickly (a matter of seconds), so if you find that its creation is dragging on, then there is probably an issue with your template.
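The "taking too long" check can be sketched as a comparison of each running step against an expected duration. Everything here is an assumption for illustration: the operation records are made-up sample data in a simplified shape, and the time limits are arbitrary values that you would tune from your own experience.

```python
from datetime import datetime, timedelta

# Assumed limits per resource type; arbitrary values, not Azure guidance.
EXPECTED = {"Microsoft.Storage/storageAccounts": timedelta(minutes=2)}
DEFAULT_LIMIT = timedelta(minutes=15)

def slow_operations(operations, now):
    """Return (name, elapsed) for running operations that exceed their limit."""
    slow = []
    for op in operations:
        if op["state"] != "Running":
            continue  # finished steps cannot be stuck
        limit = EXPECTED.get(op["type"], DEFAULT_LIMIT)
        elapsed = now - op["started"]
        if elapsed > limit:
            slow.append((op["name"], elapsed))
    return slow

# Made-up sample data in a simplified, hypothetical shape.
sample_ops = [
    {"name": "storage1", "type": "Microsoft.Storage/storageAccounts",
     "state": "Running", "started": datetime(2024, 1, 1, 12, 0)},
    {"name": "vm1", "type": "Microsoft.Compute/virtualMachines",
     "state": "Running", "started": datetime(2024, 1, 1, 12, 5)},
    {"name": "vnet1", "type": "Microsoft.Network/virtualNetworks",
     "state": "Succeeded", "started": datetime(2024, 1, 1, 12, 0)},
]

slow = slow_operations(sample_ops, now=datetime(2024, 1, 1, 12, 10))
for name, elapsed in slow:
    print(f"{name} has been running for {elapsed}")
```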
The second indication of a fault is that the deployment will fail and a status message will appear:
- In the Azure Portal if you deployed the template using the portal
- In PowerShell if you used a cmdlet or script to start the deployment
You can start to dive into the fault by opening Deployments in the resource group and opening the affected deployment. Any failed resources will have a red status message. You can open the failed resource to learn more. The error can be quite … wordy … but a clue to the origins of the fault will usually be there.
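If you export the operations of a deployment as JSON, the same hunt for failed resources can be sketched in a few lines. The field names below (provisioningState, targetResource, statusMessage) are my assumption of how the exported operations are shaped, and the sample records, including the error text, are made up for illustration.

```python
def failed_operations(operations):
    """Collect the resource, error code, and message of each failed operation."""
    failures = []
    for op in operations:
        props = op.get("properties", {})
        if props.get("provisioningState") != "Failed":
            continue
        target = props.get("targetResource") or {}
        status = props.get("statusMessage")
        # The status message is assumed to be a dict with an "error" entry;
        # anything else is kept as plain text.
        if isinstance(status, dict):
            error = status.get("error", {})
            code = error.get("code", "<no code>")
            message = error.get("message", "<no message>")
        else:
            code, message = "<no code>", str(status)
        failures.append({
            "resource": target.get("resourceName", "<unknown>"),
            "type": target.get("resourceType", "<unknown>"),
            "code": code,
            "message": message,
        })
    return failures

# Made-up sample data; not output captured from a real deployment.
sample_operations = [
    {"properties": {
        "provisioningState": "Succeeded",
        "targetResource": {"resourceName": "vnet1",
                           "resourceType": "Microsoft.Network/virtualNetworks"}}},
    {"properties": {
        "provisioningState": "Failed",
        "targetResource": {"resourceName": "storage1",
                           "resourceType": "Microsoft.Storage/storageAccounts"},
        "statusMessage": {"error": {
            "code": "StorageAccountAlreadyTaken",
            "message": "The storage account name is already taken."}}}},
]

failures = failed_operations(sample_operations)
for failure in failures:
    print(f"{failure['resource']} ({failure['type']}): "
          f"{failure['code']}: {failure['message']}")
```

Pulling out just the code and message is a quick way to cut through a wordy error and find the clue.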
There are some kinds of faults that are more difficult to diagnose. For example, you might be using script extensions to perform actions within a virtual machine. The deployment will sit and wait for the script to end, but if the script doesn’t terminate, there will be no success or failure update. The status messages in the deployment will be useless for diagnosing these faults. Sometimes we need to dive a little deeper.
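One way to avoid waiting forever on a script that never ends is to put your own time limit around the wait. The following is a minimal sketch of that idea, not an Azure API: get_state is a hypothetical callable that you would supply to return the current state of the deployment, and the list of terminal states is an assumption.

```python
import time

TERMINAL_STATES = ("Succeeded", "Failed", "Canceled")  # assumed terminal states

def wait_for_terminal_state(get_state, timeout_s, poll_s=1.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it is terminal, or give up after timeout_s."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return "TimedOut"  # our own marker, not a state reported by Azure
        sleep(poll_s)

# Simulate a script that never terminates: the state is always "Running".
result = wait_for_terminal_state(lambda: "Running", timeout_s=0.05, poll_s=0.01)
print(result)
```

A "TimedOut" result tells you nothing about the cause, but it is the cue to stop watching the status messages and start digging deeper.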
The Azure Resource Explorer (Preview)
You can use the Azure Resource Explorer to get deep into the weeds of any resource group, browsing Azure’s ARM view of your resource group and any status messages.
Open the Azure Resource Explorer and sign in using your Azure credentials. The navigation bar on the left allows you to expand the subscription and then the resource group that experienced the failure. Navigate to the affected resource and look for any information that can help, such as provisioningState.
Note that the Documentation tab offers a great resource for explaining every value that you see in the resource.
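If you copy the JSON of a resource out of the Resource Explorer, the search for provisioningState values can be sketched as a recursive walk. The document below is made-up sample data in a simplified shape (a virtual machine with one nested extension); real resources contain many more properties.

```python
def find_states(node, path=""):
    """Return (path, value) for every provisioningState found in nested JSON."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "provisioningState":
                found.append((child, value))
            else:
                found.extend(find_states(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_states(item, f"{path}[{index}]"))
    return found

# Made-up sample: the VM is fine but its script extension is still creating.
sample_resource = {
    "name": "vm1",
    "properties": {"provisioningState": "Succeeded"},
    "resources": [
        {"name": "customScript",
         "properties": {"provisioningState": "Creating"}},
    ],
}

states = find_states(sample_resource)
for location, state in states:
    print(f"{location}: {state}")
```

Listing every state with its path makes it easier to spot a nested item, such as an extension, that is stuck while its parent resource looks healthy.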
I find that between the syntax checking in VS Code and the two levels of validation offered by the Azure Portal, I don’t hit too many errors with my deployments. But when deployments fail, it’s usually because of something tricky. Using the above methods and some practice, you’ll be able to diagnose the failures and resolve the issues. Remember that once you have created a corrected version of your template, you can re-target your existing resource group – you don’t need to delete resources because they will be re-used or tweaked if necessary.