In this post, I will explain the role of a new preview feature in Azure called Network Watcher. Network Watcher allows you to monitor your network deployments in Azure from both a single resource and a solution perspective.
The Problem
Everything about cloud computing screams connectivity. Once you start to get beyond the most basic of deployments, you will build quite a bit of complexity into your solutions. I have covered most of these solutions on Petri.com over the past few years. A few of the topics I have covered include:
Hybrid networking using gateways for VPN and/or ExpressRoute
VNet peering
Layer 4 load balancing using the Azure load balancer
Layer 7 load balancing using the Application Gateway
Layer 4 security at the subnet or NIC using network security groups
Routing using user defined routing
With complexity come problems. We need more than just ping and traceroute to figure out where those problems reside.
Network Watcher
Microsoft recently launched a public preview of Network Watcher. It is available in West Central US, North Central US, and West US, and it will be rolled out to more regions over time. Network Watcher provides us with network monitoring at two levels:
Scenario: This level is end-to-end monitoring of a solution. An example of this is being able to figure out why a packet cannot reach a destination from a given source.
Resource: At this level, we can retrieve diagnostic logs and metrics. We can also troubleshoot a given resource and view its resource health.
Scenario Monitoring
If you have a problem that is hard to track down, then you will probably start your investigation using scenario monitoring. There are nine functions available to us:
Topology: You can view how network resources are connected together. At this time, the graphical view is limited to resources in the same resource group. The PowerShell method returns a JSON of all the objects and the resources that are contained within or connected to them.
Variable packet capture: This powerful feature allows you to capture packets as they enter or leave a virtual machine. This is useful for analysis, intrusion detection, performance monitoring, and more. You can selectively capture packets that meet certain criteria for a limited amount of time. You can also have a capture triggered automatically by an alert.
IP flow verify: Security can be complex. Using IP flow verify, you can test the flow of traffic to a specific machine. You can also identify which network security group rule has blocked the traffic.
Next hop: User defined routing can be tricky. Next hop allows you to figure out why traffic is being routed to a black hole. The next hop from a machine to an address, if present, can be identified. You can also identify the route that is responsible.
Security group view: I have seen how people can get lost in network security groups. This feature shows you applied and effective rules.
NSG flow logging: You can view a log of ingress and egress traffic passing through a network security group on a per-rule basis.
Virtual network gateway and connection troubleshooting: We have not had a way to troubleshoot ARM-only deployments of the gateway, such as in CSP subscriptions, until now. We can retrieve logs to get the health of a VPN gateway and its connections.
Network subscription limits: While most of us will never care about this, larger customers will care. They will want an easy way to view how many of the available, per-subscription, network resources they can still deploy. This tool gives you a summary of the quantities of each resource type that is deployed versus the subscription limits.
Configuring diagnostics logs: You get a single interface that allows you to manage and view diagnostics logging for every load balancer, network security group, route, and application gateway. These logs can also be used in Log Analytics, Power BI, a range of third-party tools, or open source tools.
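To make IP flow verify and next hop a little more concrete, here is a minimal Python sketch of the two lookups as I understand them: network security group rules are checked in priority order and the first match wins, and routing picks the matching route with the longest prefix. This is not how Network Watcher is implemented and it does not call any Azure API. The class names, function names, rule names, and addresses are all hypothetical, and the rule model is heavily simplified (one remote prefix and one port per rule, no protocol or direction).

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class NsgRule:
    """Hypothetical, simplified NSG rule: one remote prefix and one port."""
    name: str
    priority: int            # lower number is evaluated first
    remote_prefix: str       # CIDR; "0.0.0.0/0" matches any IPv4 address
    port: Optional[int]      # None means any port
    access: str              # "Allow" or "Deny"


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified route entry."""
    name: str
    prefix: str              # CIDR
    next_hop_type: str       # e.g. "VirtualAppliance", "Internet", "None"


def verify_ip_flow(rules, remote_ip, port):
    """Return (access, rule_name) for the first matching rule by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        in_prefix = ip_address(remote_ip) in ip_network(rule.remote_prefix)
        port_ok = rule.port is None or rule.port == port
        if in_prefix and port_ok:
            return rule.access, rule.name
    # Assumption for this sketch: if nothing matches, treat the flow as denied.
    return "Deny", None


def next_hop(routes, destination_ip):
    """Return the matching route with the longest prefix, or None."""
    dest = ip_address(destination_ip)
    matches = [r for r in routes if dest in ip_network(r.prefix)]
    if not matches:
        return None
    return max(matches, key=lambda r: ip_network(r.prefix).prefixlen)


# Made-up sample data for illustration only.
rules = [
    NsgRule("AllowWebFromCorp", 100, "10.0.0.0/8", 443, "Allow"),
    NsgRule("DenyAllInbound", 4096, "0.0.0.0/0", None, "Deny"),
]
routes = [
    Route("DefaultToInternet", "0.0.0.0/0", "Internet"),
    Route("ViaFirewall", "10.0.0.0/8", "VirtualAppliance"),
    Route("BlackHole", "10.1.0.0/16", "None"),
]

print(verify_ip_flow(rules, "10.2.3.4", 443))     # ('Allow', 'AllowWebFromCorp')
print(verify_ip_flow(rules, "203.0.113.9", 443))  # ('Deny', 'DenyAllInbound')
print(next_hop(routes, "10.1.5.6").name)          # BlackHole
```

The last line shows the kind of answer that makes next hop useful: the more specific 10.1.0.0/16 route wins over the broader 10.0.0.0/8 route, so traffic to 10.1.5.6 is dropped instead of going through the firewall.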
Resource Monitoring
Once you have identified a resource that is causing concern, you will want to monitor it closely. This will help to solve the issues and to identify those that continue to occur. We can do the following:
Audit log: All changes are logged. You can track what changes were made, by whom, and when.
Metrics: At the moment, only the Application Gateway is capable of metrics monitoring. You can trigger alerts based on a pre-set threshold.
Diagnostic logs: As mentioned in scenario monitoring, logs can be gathered and viewed in solutions such as Power BI and Log Analytics.
Troubleshooting: A troubleshooting blade allows you to find solutions for common problems with ExpressRoute, VPN Gateway, Application Gateway, Network Security Logs, Routes, DNS, Load Balancer, and Traffic Manager.
Resource health: The health of resources is updated in the portal on a periodic basis. At times, the cause of an issue is not necessarily a configuration fault on your part. There could be a failure in Azure.
Summary
Server administrators are going to be responsible for much of the networking that was not formerly in their scope. I have found myself doing all kinds of networking in Azure that I have never done with a physical switch, router or firewall before. We need tools to help us monitor and troubleshoot these architectures. They can quickly become quite hairy. In my opinion, this preview release of Network Watcher is a great start. This should be a service that you get to know.