Windows Server and Hyper-V include support for a number of hardware offloads to improve the performance of networking. These offloads reduce the resource consumption of a server, host, or even a virtual machine, and they make it possible for these computers to scale up their workloads. In this article I will introduce the significant hardware offloads and explain what they offer. You can then use this information to steer your host or server designs.
Windows Server and Hyper-V can function without networking hardware offloads. However, as you scale up your workloads you reach a point where the requirements of growing services place an increasing demand on the hardware. This can reduce performance and even limit scalability.
For example, you might wish to use very dense Hyper-V hosts because your cost-of-ownership calculations determine that this is the best way forward. This means that there will be a huge flow of networking traffic into the host in question. Without assistance, a significant percentage of the host’s resources will be consumed by that networking traffic, thus reducing the virtual machine density that is possible on the host.
The normal flow of traffic through a Type 1 hypervisor such as Hyper-V or vSphere introduces a tiny amount of latency. You might have services that require lower network latency than an unassisted hypervisor can offer. Once again, without an offload, you might have no choice but to deploy these services on expensive, inflexible, difficult-to-manage, and non-cloud physical servers.
Hardware offloads for networking can improve performance of services and reduce the resource utilization of the physical servers that they are enabled on. This can increase virtualization density, positively impact service responsiveness, and make virtualization acceptable in some niche cases.
You cannot just blindly install and use hardware offloads. Some offloads will require very specialized hardware, while others have very limited support. Also, there may be compatibility issues between closely related features.
RDMA is actually an old hardware offload that was mostly forgotten about until the initial announcement of Windows Server 8, which was later released as Windows Server 2012 (WS2012). The purpose of RDMA is to allow a server to process huge amounts of traffic while minimizing physical processor utilization. The traffic bypasses much of the networking stack in Windows Server and is processed by a NIC that offers RDMA support. In fact, the NIC probably has a processor that looks like something from the motherboard of an older PC.
RDMA is supported by WS2012 and Windows Server 2012 R2 (WS2012 R2) on the following kinds of NIC:
- iWARP
- RoCE (RDMA over Converged Ethernet)
- InfiniBand
RDMA is used by WS2012 and later to accelerate SMB 3.0 traffic, a capability known as SMB Direct. This was originally used just for storage traffic, but it is also used for:
- Redirected I/O between the nodes of a cluster that uses Cluster Shared Volumes (CSV)
- Live Migration over SMB (added in WS2012 R2)
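You can quickly verify RDMA support from PowerShell. A minimal sketch, where the adapter name "SMB1" is a placeholder:

# List the NICs in the server that expose RDMA, and whether it is enabled
Get-NetAdapterRdma
# Enable RDMA on a specific adapter ("SMB1" is a placeholder name)
Enable-NetAdapterRdma -Name "SMB1"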
Most people won’t know this, but the processing of inbound networking in a machine is limited to core 0 (not logical processor 0) of processor 0 in that machine. This can lead to the following happening: core 0 hits 100% utilization while the remaining cores sit mostly idle, and the server cannot process any more inbound traffic, no matter how much NIC bandwidth or spare processor capacity it has.
Processor utilization with and without RSS.
With RSS enabled, a base processor is selected. This is done automatically by Windows, but you can override the configuration using PowerShell. This base processor could be core 1, or logical processor 2 if Hyper-Threading is enabled. RSS will then use this core or logical processor as the starting point for processing inbound traffic. If the load increases, RSS will scale the work out across the available cores or logical processors, thus increasing the capacity of the server/services without the limitations that were otherwise imposed by a single core.
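As a rough sketch of what that override looks like, the following PowerShell inspects and adjusts the RSS settings of a physical NIC; the adapter name "SMB1" and the processor numbers are placeholders:

# Inspect the current RSS settings of the adapter
Get-NetAdapterRss -Name "SMB1"
# Move the base processor off core 0; processor number 2 is core 1 when Hyper-Threading is enabled
Set-NetAdapterRss -Name "SMB1" -BaseProcessorNumber 2 -MaxProcessors 4
# Make sure RSS is enabled on the adapter
Enable-NetAdapterRss -Name "SMB1"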
One of the services that makes use of RSS is SMB 3.0; this gives us one of the features of SMB Multichannel, where a single RSS-capable NIC can carry multiple parallel streams of communication, thus making better use of the bandwidth that is available.
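If you want to check what SMB thinks of your NICs, something along these lines will do; these cmdlets are available on WS2012 and later:

# Does SMB consider the client's NICs to be RSS and/or RDMA capable?
Get-SmbClientNetworkInterface
# The same view from the file server side
Get-SmbServerNetworkInterface
# While a transfer is running, show the Multichannel connections that are in use
Get-SmbMultichannelConnection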
RSS is considered an offload for physical networking; in other words, it enhances the scalability of non-virtual NIC traffic. Dynamic Virtual Machine Queue (dVMQ) is the cousin of RSS: it enhances the scalability of traffic into a host that is destined for virtual NICs. Without dVMQ, core 0 is once again the bottleneck. With dVMQ enabled, a host is capable of handling much more traffic that is destined for virtual NICs.
Processor utilization with and without dVMQ.
dVMQ and RSS actually use the same queues or circuitry on a NIC. That means that you cannot use dVMQ and RSS at the same time on a single NIC. You will notice in the two illustrated examples that I have shown two pairs of NICs in a Hyper-V host. The first pair of NICs has RSS enabled and is used to accelerate SMB traffic to the host from an SMB 3.0 file server. The second pair of NICs is used primarily for the virtual machines; dVMQ is enabled to increase host networking capacity.
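As with RSS, VMQ can be inspected and tuned from PowerShell. A minimal sketch, with "VM1" as a placeholder name for one of the NICs behind the virtual switch:

# See which adapters support VMQ and whether it is enabled
Get-NetAdapterVmq
# Enable VMQ on the adapter used by the virtual switch
Enable-NetAdapterVmq -Name "VM1"
# Optionally steer VMQ processing away from core 0, just like RSS
Set-NetAdapterVmq -Name "VM1" -BaseProcessorNumber 2 -MaxProcessors 4
# Show how queues have been allocated to virtual NICs
Get-NetAdapterVmqQueue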
You need to keep the RSS versus dVMQ balance in mind when designing converged networks for Hyper-V hosts, especially when you are restricted to two NICs. In that case, you will use dVMQ (the traffic passes through a virtual switch) and you will not get the benefits of RSS for non-virtual traffic.
WS2012 R2 adds a new feature, virtual RSS (vRSS), that brings the benefits of RSS to the guest operating system of virtual machines. With dVMQ enabled, you can enable vRSS in the advanced settings of a virtual NIC in the guest OS of a multi-virtual-processor virtual machine. This allows the processing of inbound traffic to that virtual machine to scale beyond virtual CPU 0 in the guest OS, and therefore allows the networking services of that virtual machine to scale out beyond the limitations of a single logical processor in the host.
To use vRSS, the virtual machine must route traffic through a virtual switch that is connected to one or more dVMQ-enabled physical NICs.
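Enabling vRSS inside the guest OS looks just like enabling RSS on a physical server. A sketch, assuming the guest's virtual NIC is named "Ethernet" and the virtual machine has more than one virtual processor:

# Run inside the guest OS of the virtual machine
# Confirm that the virtual NIC reports RSS support
Get-NetAdapterRss -Name "Ethernet"
# Enable RSS on the virtual NIC; with dVMQ enabled on the host, this gives you vRSS
Enable-NetAdapterRss -Name "Ethernet"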
Virtual processor utilization with and without vRSS.
There is widespread support by NIC vendors for RSS and dVMQ. Consult manufacturer documentation for implementation guidance.
Some organizations may have a requirement to enable encryption of network traffic. IPsec allows Windows administrators to define policies to encrypt traffic based on predefined rules. This can be useful for regulatory compliance or for secrecy in a public or hosted private cloud. The problem with IPsec is that it uses physical processor resources. This sacrifice might have been acceptable on traditional physical servers, but with ever-increasing virtual machine density, a significant percentage of capacity could be lost to encryption and decryption computation.
WS2012 Hyper-V added support for IPsec Task Offload (IPsecTO). This allows a compatible NIC to perform the encryption and decryption processing on behalf of the host. This reduces the demands that IPsec rules within the guest OSs of the virtual machines place on the host’s processors, making that now-unused capacity available to other services or additional virtual machines.
Note that at this time very few NICs offer support for IPsecTO.
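If you do have one of those NICs, the offload can be checked and enabled with PowerShell. A sketch, where "VM1" is a placeholder adapter name and "VM01" a placeholder virtual machine name:

# On the host: see which NICs expose IPsec task offload and whether it is enabled
Get-NetAdapterIPsecOffload
Enable-NetAdapterIPsecOffload -Name "VM1"
# Per virtual NIC: the number of security associations that may be offloaded is configurable
# (setting it to 0 disables the offload for that virtual NIC)
Set-VMNetworkAdapter -VMName "VM01" -IPsecOffloadMaximumSecurityAssociation 512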
A networking packet makes a “long” journey when it travels from the physical switch into a host on the way to a virtual machine:
- It arrives at the physical NIC (possibly via a NIC team).
- It is processed by the virtual switch in the management OS.
- It crosses the VMBus to the virtual NIC of the virtual machine.
- It is finally handled by the networking stack of the guest OS.
At first this might sound like quite a road trip, but every Type 1 hypervisor has something like this, and the latency created for the packet isn’t actually that much. Most networking services will never have a problem. However, this form of networking does have an impact.
SR-IOV (Single Root I/O Virtualization), added in WS2012 Hyper-V, does something strange in virtualization: it allows a virtual machine to bypass the virtual switch and the VMBus altogether. A compatible physical NIC exposes virtual functions, and a virtual function is mapped directly to the virtual NIC of a virtual machine, giving that virtual machine near-physical network latency while consuming very little of the host’s processing resources.
SR-IOV generated quite a bit of talk at the launch of WS2012, but it is a niche feature. In fact, some server manufacturers only support this feature in their top-end models. In reality, very few organizations will have a legitimate need to use this hardware offload.
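If your hardware does support it, SR-IOV is enabled in two places: on the virtual switch (it can only be enabled when the switch is created) and on the virtual NIC of the virtual machine. A sketch, with "External1", "VM1", and "VM01" as placeholder names:

# Check whether the host and NIC support SR-IOV
Get-VMHost | Format-List IovSupport, IovSupportReasons
Get-NetAdapterSriov
# SR-IOV must be enabled at the moment the virtual switch is created
New-VMSwitch -Name "External1" -NetAdapterName "VM1" -EnableIov $true
# Then enable SR-IOV on the virtual NIC of the virtual machine
Set-VMNetworkAdapter -VMName "VM01" -IovWeight 100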