Life of Interrupts: Background

In this series of posts, I am going to talk about interrupt handling in a virtualized environment (specifically Hyper-V). This discussion would also include interrupt handling on systems that have IOMMU based interrupt remapping support. Interrupt remapping is required in Hyper-V for supporting SR-IOV enabled devices and device assignment to virtual machines.

The series is mostly written from software perspective and hardware details are only provided where necessary for understanding of the concepts. The series is focused on x86/x64 based architectures though the concepts described here can be applied to other architectures. (more…)

Read More

Dynamic VMQ

After working in the hypervisor team for few years, during Windows 8 time frame, I decided to move back to networking as a lead, to lead the increased investments in networking. We built a lot of features such as SR-IOV support, Dynamic VMQ, Extensible Virtual Switch etc.

In this post I would talk about a feature we built called dynamic VMQ, a feature designed to provide optimal processor utilization across changing workload that was not possible with static processor allocation for VMQ as done in Windows 7 (or Windows Server 2008 R2 release). However, before we dig deeper into dynamic VMQ, let me recap the processor utilization for no VMQ and static VMQ cases. (more…)

Read More

Virtual Machine Queue (VMQ)

In my last post, I talked about how various NIC offloads are supported in VMSWITCH to provide high performance network device virtualization. In this post, I would talk about another networking performance technique called virtual machine queues (or VMQ).


In Windows networking stack, to utilize multiple processors in a machine, a feature called RSS (or receive side scaling) is used. This feature was co-developed by Microsoft working with hardware partners. It provides two main features:

  • It allows incoming traffic to be put on different queues that get processed on different processors based on TCP/UDP stream information i.e. source and destination IP and ports.
  • It allows sent traffic to be put on specific queues and completion for sent traffic to be handled on a specific processor based on the TCP/UDP stream information.


Read More

Virtual Switch Performance using Offloads

In my last post, I talked about the architecture of Hyper-V Virtual Switch (VMSWITCH), that powers some of the largest data centers in the world, including but not limited to Windows Azure. In this post I would talk about how it is able to meet the networking performance requirements of the demanding workloads that runs in these data centers.

VMSWITCH provides an extremely high performance packet processing pipeline by using various techniques such as lock free data path, using pre-allocated memory buffers, batch packet processing etc. In addition, it leverages the packet processing offloads provided by underlying physical NIC hardware. These offloads do some of the packet processing in NIC hardware, thereby reducing the overall CPU usage and providing a high performance networking. If you are unfamiliar with NIC offloads, you may want to first read about them here and here. (more…)

Read More

Architecture of Hyper-V Virtual Switch

Hyper-V Virtual Switch (referred also as VMSWITCH) is the foundational component of network device virtualization in Hyper-V. It powers some of the largest data centers in the world, including Windows Azure. In this post, I would talk about its high level architecture.

Standards Compliant Virtual Switch

As I mentioned in my previous post, the main goal in building VMSWITCH was to build a standards compliant, high performance virtual switch. There are many ways to interpret standards compliant because there are many standards. So to be specific, our goal was to build packet forwarding based on 802.1q and mimic physical network semantics (such as link up/down) as much as possible. This was to make sure that whatever works in a physical network, works in the virtual environment as well. We defined three main objects in VMSWITCH, vSwitch, vPort and NIC. A vSwitch is an instance of a virtual switch that provides packet forwarding and various other features provided by a switch such as QoS, ACL etc. A vPort is analogous to a physical switch port and has configuration associated with it for various features. And finally, a NIC objects, that acts as the endpoint connecting to a vPort. This is similar to the physical network, where a host has a physical NIC that connects to a physical port on a physical switch. (more…)

Read More

Hyper-V Inception

Back in 2003, I joined the team that was venturing into the world of server virtualization, and little did I know that it would take me on journey, that remains as exciting today as it was back then. This was the time, when leaders in our team were contemplating building a brand new hypervisor based virtualization solution. Who knew, back then, that one day it would become the defining feature of our server and cloud solutions. I remember, new to the team, wondering what role I would play in this product, which sounded like rocket science at that time. There were intense architectural meetings, long discussions on finding a code name for the project and both excitement and nervousness to see what I would get to do. Eventually Viridian was born and I, along with one of my colleague Jeffrey, was given the charter to build network device virtualization for Viridian. (more…)

Read More

Windows Memory Management

[Moved an old article that I wrote in 2004 to my new blog]


Windows on 32 bit x86 systems can access up to 4GB of physical memory. This is due to the fact that the processor’s address bus which is 32 lines or 32 bits can only access address range from 0x00000000 to 0xFFFFFFFF which is 4GB. Windows also allows each process to have its own 4GB logical address space. The lower 2GB of this address space is available for the user mode process and upper 2GB is reserved for Windows Kernel mode code. How does Windows give 4GB address space each to multiple processes when the total memory it can access is also limited to 4GB. To achieve this Windows uses a feature of x86 processor (386 and above) known as paging. Paging allows the software to use a different memory address (known as logical address) than the physical memory address. The Processor’s paging unit translates this logical address to the physical address transparently. This allows every process in the system to have its own 4GB logical address space. To understand this in more details, let us first take a look at how the paging in x86 works. (more…)

Read More

Syntactical twists of C (*p != *p)

[Moved an old post from 2006 to my new blog]

Few days back one of my colleague asked me to debug a problem. She wrote a program and it was crashing in strcpy. I looked at the the code and it looked just fine to me. I thought lets debug it to see whats going on. I started the debug session, variables were pointing to the right data, the stack was fine and she was copying a fixed string to a big enough buffer. I stepped over strcpy and bammm…access violation. Weird huh…For a second i thought how can a simple code like this crash. It was time to dig into the disassembly to see what exactly is going on. But before we do that, lets take a look at two C functions below: (more…)

Read More

Calling conventions in Windows on x86

[Moved an old post from 2006 to my new blog]

It is 2 AM in the night and i don’t feel like sleeping so i thought why not i start my blog and here i am with my first blog entry ever.

People who do programming on Windows in C/C++, might wonder sometime, what is the __cdecl or __stdcall in front of a function declaration? These compiler specific prefixes are basically a way to tell the compiler, how to push the function arguments on the stack and how to pop them off the stack. These prefix defines the contract between Caller (the one who calls a function) and Callee (the called function) for argument passing. This contact is known as Calling convention. Usually we should need only one calling convention for argument passing but Windows compilers provide more than one convention because of historical and performance reasons. The three calling conventions available on windows are:

  1. __cdecl (more…)

Read More