Dynamic VMQ

After working in the hypervisor team for few years, during Windows 8 time frame, I decided to move back to networking as a lead, to lead the increased investments in networking. We built a lot of features such as SR-IOV support, Dynamic VMQ, Extensible Virtual Switch etc.

In this post I would talk about a feature we built called dynamic VMQ, a feature designed to provide¬†optimal processor utilization across changing workload that was not possible with static processor allocation for VMQ as done in Windows 7 (or Windows Server 2008 R2 release).¬†However, before we dig deeper into dynamic VMQ, let me recap the processor utilization for no VMQ and static VMQ cases. (more…)

Read More

Virtual Machine Queue (VMQ)

In my last post, I talked about how various NIC offloads are supported in VMSWITCH to provide high performance network device virtualization. In this post, I would talk about another networking performance technique called virtual machine queues (or VMQ).


In Windows networking stack, to utilize multiple processors in a machine, a feature called RSS (or receive side scaling) is used. This feature was co-developed by Microsoft working with hardware partners. It provides two main features:

  • It allows incoming traffic to be put on different queues that get processed on different processors based on TCP/UDP stream information i.e. source and destination IP and ports.
  • It allows sent traffic to be put on specific queues and completion for sent traffic to be handled on a specific processor based on the TCP/UDP stream information.


Read More

Virtual Switch Performance using Offloads

In my last post, I talked about the architecture of Hyper-V Virtual Switch (VMSWITCH), that powers some of the largest data centers in the world, including but not limited to Windows Azure. In this post I would talk about how it is able to meet the networking performance requirements of the demanding workloads that runs in these data centers.

VMSWITCH provides an extremely high performance packet processing pipeline by using various techniques such as lock free data path, using pre-allocated memory buffers, batch packet processing etc. In addition, it leverages the packet processing offloads provided by underlying physical NIC hardware. These offloads do some of the packet processing in NIC hardware, thereby reducing the overall CPU usage and providing a high performance networking. If you are unfamiliar with NIC offloads, you may want to first read about them here and here. (more…)

Read More