In this post, I will demonstrate how you can use Azure’s Network Watcher to check if one Azure virtual machine can talk to another.
“Broad network access” is one of NIST’s five essential characteristics of cloud computing for good reason. Without network connectivity, resources in the cloud are useless. What use is a virtual machine if you cannot access the services it hosts or integrate it with other systems?
This is why Azure’s Network Watcher is a critical troubleshooting tool. Network Watcher includes a collection of tools that cover a range of diagnostic scenarios. In this post, I will show you how to find the root cause of a communications failure between virtual machines.
I have created a small demo lab in a resource group called rg-petri. There are two virtual machines in a simple flat network:
Administrators have just reported that they can no longer sign into the application server (vm-petri-02). They need that access because users are reporting that the services this machine hosts are broken.
This tool, in Preview at the time of writing, performs an end-to-end test. Using a Network Watcher extension installed in the source virtual machine, 100 probes (packet tests) are sent to the destination machine from a defined source port to a defined destination port. This is better than any ping because ping only answers “is it responding in a reasonable time?” using ICMP.
You enter the following information about the source machine when using this tool:
Enter the following information about the destination machine:
This test can take a while because it really will send 100 packet tests from the source virtual machine to the destination virtual machine. I ran a test where vm-petri-01 tries to send RDP traffic (destination TCP port 3389) to vm-petri-02. As you can see below, all 100 probes failed. This indicates that there is a configuration issue of some kind, probably at the destination.
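If you prefer the command line, a similar end-to-end probe can be run with the Azure CLI via `az network watcher test-connectivity`. This is a sketch using the resource names from this demo lab, and it assumes you are signed in with `az login` and that Network Watcher is enabled for the region:

```shell
# Run the connectivity check from vm-petri-01 to vm-petri-02 on TCP 3389.
# Resource names are from the demo lab; adjust them for your environment.
az network watcher test-connectivity \
  --resource-group rg-petri \
  --source-resource vm-petri-01 \
  --dest-resource vm-petri-02 \
  --protocol Tcp \
  --dest-port 3389
```

The JSON output includes the number of probes sent and failed, plus a hop-by-hop view of the path, which is where you would see the 100 failed probes from the scenario above.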
I like this tool because it allows me to run detailed network tests, just as I would from inside a virtual machine, but without needing to sign in. I might not have the necessary credentials to sign into the virtual machine anyway.
This is a tool that allows you to test the routing of a virtual machine. For example, what is the next hop if I send a packet to X? This is useful for troubleshooting scenarios such as, but not limited to:
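The Next Hop query can also be run from the Azure CLI. A sketch, where the source and destination IP addresses are hypothetical placeholders for this lab:

```shell
# Ask Network Watcher what the next hop is for traffic leaving vm-petri-01.
# The IP addresses are placeholders; use your VM's private IP and the target.
az network watcher show-next-hop \
  --resource-group rg-petri \
  --vm vm-petri-01 \
  --source-ip 10.0.0.4 \
  --dest-ip 10.0.0.5
```

The result reports a next-hop type (such as VirtualNetwork, Internet, or None) and, where applicable, the route table that produced the route, which quickly shows whether a user-defined route is diverting or black-holing traffic.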
This tool is a simple test to validate that a virtual machine’s NIC is able to do one of the following:
IP Flow Verify does not check whether packets will actually make it; it is not an end-to-end test, and no packets are actually sent. However, it does check how things like network security groups affect a single virtual machine. I suspect that I have a firewall issue because all of my packets are being blocked.
When you run IP Flow Verify, the following fields must be completed:
I first run a test from vm-petri-01. I want to see if it is allowed to put TCP packets onto the subnet destined for TCP 3389 on vm-petri-02. The results come in and the test passes, so I suspect that all is well with vm-petri-01.
I then run a test to see if TCP 3389 traffic from vm-petri-01 can get into vm-petri-02. That test fails and indicates that there is an issue with a rule in a network security group. Somewhere, traffic into vm-petri-02 is being blocked.
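The same two tests can be run with `az network watcher test-ip-flow`. In this sketch, the private IP addresses and the ephemeral source port are placeholders for the demo lab; `--local` is always the IP:port on the VM being tested:

```shell
# Outbound check: is vm-petri-01 allowed to send TCP 3389 to vm-petri-02?
az network watcher test-ip-flow \
  --resource-group rg-petri \
  --vm vm-petri-01 \
  --direction Outbound \
  --protocol TCP \
  --local 10.0.0.4:60000 \
  --remote 10.0.0.5:3389

# Inbound check on the destination: can that same traffic get into vm-petri-02?
az network watcher test-ip-flow \
  --resource-group rg-petri \
  --vm vm-petri-02 \
  --direction Inbound \
  --protocol TCP \
  --local 10.0.0.5:3389 \
  --remote 10.0.0.4:60000
```

Each call returns Allow or Deny along with the name of the NSG rule that made the decision, which is exactly the clue needed in this scenario.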
There are two ways that a network security group can be deployed or associated:
If we look at the topology that I generated earlier (at the top of the post) you will see that there are two network security groups (NSGs). A rule in one of these NSGs is the culprit:
Network Watcher has a tool called Security Group View. Instead of bouncing around the Azure Portal, you can see all NSG rules in effect on a virtual machine, broken down into inbound and outbound rules.
You need to enter the following to configure the tool:
The results can be complex but you can download them for easier analysis. In the results, you can see the overall effective rules, the rules associated with the subnet, and the rules associated with the NIC.
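The same merged view is available from the Azure CLI as the NIC’s effective security rules. The NIC name below is a hypothetical placeholder for this lab:

```shell
# List the effective security rules on the VM's NIC: the combined view of the
# subnet NSG and the NIC NSG. The NIC name is a placeholder; find the real one
# with: az vm nic list --resource-group rg-petri --vm-name vm-petri-02
az network nic list-effective-nsg \
  --resource-group rg-petri \
  --name nic-petri-02
```

Piping this output to a file is a handy way to get the downloadable results described above without leaving the terminal.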
Unfortunately, the effective rules view is a little confusing, but you can make sense of it. Read the inbound rules from bottom to top; that is the path an inbound packet will pass through. First, it hits the subnet’s NSG. You can tell where the rules switch from subnet to NIC because each NSG always ends with the default rules: when you hit a set of default rules in the middle of the list, as in this case, you have reached the end of one NSG and are about to move into the next.
Let’s pretend to be an RDP packet trying to reach vm-petri-02. You come in and hit the subnet NSG (marked in blue). The user-defined rule, allow-rdp, allows TCP 3389 traffic, which is good. We have hit a green light in this NSG, which lets us attempt to get into the NIC of vm-petri-02.
Now we hit the NSG associated with the NIC, marked in green. The first rule we hit, marked in red, is a user-defined rule called securityexpert-blockingall. That rule denies all traffic to all ports. This is where our beleaguered RDP packet dies, and it might explain why users cannot access the services on this machine anymore.
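The packet’s journey above can be sketched in code. This is a deliberately simplified model (it matches on destination port only, ignoring address prefixes and protocol): rules in each NSG are evaluated in priority order, lowest number first, the first match wins, and the packet must be allowed by the subnet NSG and then by the NIC NSG. The rule names mirror the demo lab; the priorities are assumptions.

```python
def evaluate_nsg(rules, port):
    """Return the first rule matching the destination port, lowest priority number first."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["ports"] == "*" or port in rule["ports"]:
            return rule
    return None  # real NSGs always end with a default DenyAll, so this shouldn't happen

def packet_allowed(port, *nsgs):
    """A packet must be allowed by every NSG on its path (subnet NSG, then NIC NSG)."""
    for rules in nsgs:
        rule = evaluate_nsg(rules, port)
        if rule is None or rule["action"] == "Deny":
            return False, rule["name"] if rule else "no-match"
    return True, "allowed"

# Subnet NSG: a user-defined allow for RDP, followed by the default deny.
subnet_nsg = [
    {"name": "allow-rdp", "priority": 100, "ports": {3389}, "action": "Allow"},
    {"name": "DenyAllInBound", "priority": 65500, "ports": "*", "action": "Deny"},
]
# NIC NSG: the culprit rule from the lab blocks everything before RDP is reached.
nic_nsg = [
    {"name": "securityexpert-blockingall", "priority": 100, "ports": "*", "action": "Deny"},
    {"name": "DenyAllInBound", "priority": 65500, "ports": "*", "action": "Deny"},
]

ok, reason = packet_allowed(3389, subnet_nsg, nic_nsg)
print(ok, reason)  # False securityexpert-blockingall
```

The RDP packet clears the subnet NSG via allow-rdp but is killed at the NIC by securityexpert-blockingall, exactly as the effective rules view showed.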
If you cannot find anything wrong with the NSGs, then it is time to start looking inside the guest OS. Some things to consider might be:
It is a good idea to get to know Azure Network Watcher while things are good because, when things go wrong, its set of tools might help you get to the root cause very quickly. This is especially true if you already know how to use them.