Why Does Hyper-V Have Network Issues with 1 GbE NICs?
Microsoft has urged OEMs not to enable VMQ on the standard 1 GbE NICs that are commonly found in Hyper-V hosts. But despite this request, and the fact that VMQ adds nothing at that speed, it is left enabled and causes performance and uptime issues. In this article, you’ll learn why you should disable VMQ as a standard part of your deployment and configuration management.
A Common Cause of Networking Issues
On social media, in meetings, at community events, and even after speaking at Microsoft Ignite, I get asked a question that starts something like, “My Hyper-V hosts have a problem when <insert something to do with networking> …,” and I interrupt them.
I ask if they are using Emulex 10 Gbps converged NICs, which are the sort you find in IBM, HP, and Hitachi blade servers, or 1 Gbps Ethernet NICs in their hosts. Emulex appears to have finally sorted out the awful handling of VMQ in their firmware and drivers, and OEMs eventually dribbled out the fixes.
But most of the time, the answer is that they have 1 GbE networking from Broadcom or Intel. I usually know straight away what the fix is. I ask them if they have disabled VMQ on the physical NICs that are used for the virtual switch. “VM-what?” is sometimes the response, and other times the response is “I don’t know.”
If you’re using 1 GbE NICs, then you probably shouldn’t know or care about VMQ, because this hardware offload offers nothing to you. You’re not pushing enough traffic into your hosts to take advantage of VMQ. Typically this offload makes it possible to take full advantage of 10 Gbps or faster networking.
Microsoft, aware that VMQ offers nothing, has asked OEMs not to enable the feature by default on 1 GbE NICs. However, the OEMs have not only ignored Microsoft, but they ignore VMQ-aware administrators, too. I’ve been told many times that administrators who are aware of the VMQ guidance for 1 GbE NICs, and who disable VMQ accordingly, often find it re-enabled after upgrading the driver from their server manufacturer.
What makes it worse is that the drivers and firmware for these NICs usually handle VMQ very poorly. This leads to problems such as:
- Performance issues: Here’s an example reported by SQL Server MVP and Ranger, Bob Duffy. He found that he had slow SSAS connections with SQL Server installed in Hyper-V virtual machines.
- Network outages: I have heard many times how 1 GbE NICs are causing outages whenever a load is placed on them.
The cause? VMQ was enabled on these 1 GbE NICs.
As a short-term solution, only enable VMQ on NICs where it is really required. Of course, the OEMs turn it on by default, so you’ll need to disable it in all of your deployments if you are not making use of it. VMQ offers nothing for 1 GbE connections other than bugs, so turn it off every time on every physical 1 GbE NIC on your Hyper-V hosts.
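The rule above can be sketched in a few lines of Python. This is only an illustration, assuming you have already exported an adapter inventory (for example from `Get-NetAdapter` and `Get-NetAdapterVmq`) into simple records. The `Nic` record, its field names, and the 1 Gbps threshold are hypothetical choices made for this sketch. It does not touch the host; it only builds the `Disable-NetAdapterVmq` commands you would then run yourself.

```python
from dataclasses import dataclass

# Assumed threshold: links at or below 1 Gbps gain nothing from VMQ.
ONE_GBPS = 1_000_000_000


@dataclass(frozen=True)
class Nic:
    """Hypothetical inventory record for one physical adapter."""
    name: str
    link_speed_bps: int
    vmq_enabled: bool


def nics_needing_vmq_disabled(nics):
    """Return adapters at 1 Gbps or slower that still have VMQ enabled."""
    return [n for n in nics if n.link_speed_bps <= ONE_GBPS and n.vmq_enabled]


def disable_commands(nics):
    """Build one Disable-NetAdapterVmq command line per offending adapter."""
    return [
        f'Disable-NetAdapterVmq -Name "{n.name}"'
        for n in nics_needing_vmq_disabled(nics)
    ]


if __name__ == "__main__":
    # Example inventory; the adapter names are made up for illustration.
    inventory = [
        Nic("Onboard 1GbE #1", 1_000_000_000, True),
        Nic("Onboard 1GbE #2", 1_000_000_000, False),
        Nic("Converged 10GbE", 10_000_000_000, True),
    ]
    for command in disable_commands(inventory):
        print(command)
```

With the example inventory, only the first adapter is selected: the second already has VMQ off, and the 10 Gbps adapter is left alone because VMQ may be worth having there.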
What about OEMs turning VMQ back on after you perform a driver update? The only solution here is to either waste time re-disabling it or to implement some kind of desired state configuration (DSC) management to automatically return VMQ to a disabled state on 1 GbE physical NICs.
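To make the desired-state idea concrete, here is a minimal Python sketch of an idempotent enforcement pass. It is not DSC itself, only the same pattern: compare the current state with the desired state and correct any drift. The dictionary shape, the function name `enforce_vmq_disabled`, and the `disable` callback are assumptions for illustration. On a real host the callback would wrap something like `Disable-NetAdapterVmq`; here it is simulated.

```python
from typing import Callable, Dict, List

# Assumed threshold: VMQ should be off on links at or below 1 Gbps.
ONE_GBPS = 1_000_000_000


def enforce_vmq_disabled(
    adapters: Dict[str, dict],
    disable: Callable[[str], None],
) -> List[str]:
    """Disable VMQ on every 1 GbE adapter that has drifted back to enabled.

    `adapters` maps adapter name -> {"speed_bps": int, "vmq": bool}
    (an assumed shape). Returns the names that were corrected; an empty
    list means the host already matches the desired state.
    """
    corrected = []
    for name, info in sorted(adapters.items()):
        if info["speed_bps"] <= ONE_GBPS and info["vmq"]:
            disable(name)
            corrected.append(name)
    return corrected


if __name__ == "__main__":
    # Simulated host state after a driver update re-enabled VMQ.
    state = {
        "NIC1": {"speed_bps": 1_000_000_000, "vmq": True},
        "NIC2": {"speed_bps": 10_000_000_000, "vmq": True},
    }

    def fake_disable(name: str) -> None:
        # Stand-in for the real action on the host.
        state[name]["vmq"] = False

    print(enforce_vmq_disabled(state, fake_disable))  # first pass corrects drift
    print(enforce_vmq_disabled(state, fake_disable))  # second pass has nothing to do
```

Because the pass only acts on adapters that have drifted, it is safe to run on a schedule or after every driver update.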
For a long-term solution, I want Microsoft to do three things:
- Turn off all hardware optimizations by default in the OS/hypervisor. We should only use these features if we know what they are and understand their effects and risks.
- Create a super-HCL for virtualization. The current testing of hardware, drivers, and firmware is clearly insufficient. Many of the NICs that I have heard are causing problems are certified by Microsoft for Windows Server 2012 R2. Obviously the testing is neither checking the stability of virtualization features nor checking compliance with Microsoft’s guidance on feature enablement.
- Kick offending products out of the HCL. Those vendors need Microsoft more than Microsoft needs them, and it is Windows Server and Hyper-V that get blamed when a virtual machine goes offline.