In early versions of Hyper-V, we started the hypervisor (HV) after the NT kernel in the host had already loaded. At that point the APIC had already been initialized by NT and was delivering interrupts to the NT kernel. Before starting HV, we disabled interrupts, copied the APIC state to the virtual APIC, and then started the hypervisor. From that point onwards, HV owned the physical APIC and the NT kernel in the host OS managed only virtual APIC state. HV had to provide an identity mapping for interrupts, because NT had already programmed various devices to generate specific interrupts and HV had no way to reprogram those devices. So even though the hypervisor intercepted each interrupt, it simply delivered the interrupt to NT. For its own use, HV relied on NT to reserve interrupt vectors, or it used NMIs. For example, we used NMIs for inter-processor synchronization. HV kept track of whether it had requested an NMI, and if it received an NMI it had not requested (a phantom NMI), it delivered that NMI to the NT kernel for processing. This ensured that an NMI raised by a fault or other critical event that did not belong to HV still reached NT and was handled correctly.
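The NMI bookkeeping described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual Hyper-V code: the class name `NmiTracker`, its methods, and the per-CPU boolean flag are all assumptions made for the example.

```python
# Hypothetical sketch: names and structure are illustrative, not Hyper-V internals.
class NmiTracker:
    """Remembers, per CPU, whether the hypervisor itself requested an NMI."""

    def __init__(self, cpu_count):
        # True while a hypervisor-requested NMI is outstanding on that CPU.
        self.requested = [False] * cpu_count

    def request_nmi(self, cpu):
        # Called just before the hypervisor sends an NMI to `cpu`,
        # for example for inter-processor synchronization.
        self.requested[cpu] = True

    def on_nmi(self, cpu):
        # Decide who should handle an NMI that arrived on `cpu`.
        if self.requested[cpu]:
            self.requested[cpu] = False
            return "hypervisor"
        # Phantom NMI: the hypervisor did not ask for it, so forward it to NT.
        return "nt_kernel"


tracker = NmiTracker(cpu_count=2)
tracker.request_nmi(1)
first = tracker.on_nmi(1)    # requested by the hypervisor
second = tracker.on_nmi(1)   # nothing outstanding, so this one goes to NT
```

A flag rather than a counter is used here because the text only says HV tracked whether an NMI was requested or not.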
Starting with Windows 8 (or Windows Server 2012), we added two key features: early launch and interrupt remapping. Early launch allowed us to start the hypervisor before the NT kernel started. This gave HV complete control over how interrupts in the physical APIC are allocated. Interrupt remapping created a mechanism for maintaining a mapping between physical interrupts and virtual interrupts. With interrupt remapping, HV took ownership of the physical APIC and provided a hypercall that allowed NT in the host to request a mapping between virtual APIC vectors and physical APIC vectors. Devices were programmed with the remapped interrupt entries, and HV routed interrupts based on the remapping tables.
In my last post, I talked about interrupt virtualization in Hyper-V. In this post, I will talk about interrupt remapping and how it is used to provide safe device access to virtual machines.
An IOMMU (I/O Memory Management Unit) is a hardware component designed to provide address translation for DMA requests (referred to as DMA remapping). It also provides translation for interrupt requests, both those that originate from an IOxAPIC and those delivered as MSIs. This is referred to as interrupt remapping. In Hyper-V, we use both DMA and interrupt remapping to provide safe device assignment to virtual machines. HV manages the translation tables so that if a malicious VM programs a device to generate an invalid or spoofed DMA or interrupt request, that request is blocked.
There are other use cases for an IOMMU, such as allowing devices that cannot address the full memory range to DMA to any region of memory, but this discussion focuses only on interrupt remapping.
Interrupt remapping is required for safe assignment of physical devices, or virtual functions of physical SR-IOV devices, to virtual machines. In a traditional system without an IOMMU or interrupt remapping, if you assign a physical device to a virtual machine, software inside the virtual machine can program the device to spoof interrupt requests, which can be used to attack the hypervisor or the host partition (domain 0 in Xen terms).
In Hyper-V, the design of interrupt remapping involves maintaining two levels of mapping. The first is maintained in software and maps interrupt vectors on the physical APIC to interrupt vectors on a vAPIC. These are called the software interrupt remapping tables. Their purpose is to let HV manage interrupts on the physical APIC and route each interrupt request to the right vAPIC of the right vCPU of a virtual machine (or partition). The table below shows an example mapping that would be maintained by the hypervisor.
| Source Interrupt Vector | Destination Partition Id | Destination Interrupt Vector |
Internally, a per-CPU mapping table is maintained so that a direct index into the table finds the destination information and routes the interrupt request to its destination. Other parameters of the interrupt request (not shown in the table) are also maintained, such as whether the interrupt is level or edge triggered.
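To make the direct-index lookup concrete, here is a minimal sketch of what such a per-CPU table could look like. It assumes the table is indexed by the physical (source) vector and has 256 slots, one per x86 APIC vector. The names `SoftwareRemapEntry` and `CpuRemapTable` and the sample vector and partition values are made up for illustration and are not the real Hyper-V data structures.

```python
from dataclasses import dataclass
from typing import List, Optional

VECTOR_COUNT = 256  # an x86 local APIC has 256 interrupt vectors


@dataclass
class SoftwareRemapEntry:
    partition_id: int              # destination partition (VM)
    virtual_vector: int            # vector to deliver on the destination vAPIC
    level_triggered: bool = False  # False means edge triggered


class CpuRemapTable:
    """Hypothetical per-CPU table, indexed directly by physical vector."""

    def __init__(self):
        self.entries: List[Optional[SoftwareRemapEntry]] = [None] * VECTOR_COUNT

    def map_vector(self, physical_vector, entry):
        self.entries[physical_vector] = entry

    def route(self, physical_vector):
        # Direct index, no search. None means nothing is mapped to this vector.
        return self.entries[physical_vector]


# Illustrative values only: physical vector 0x51 goes to partition 3 as vector 0x30.
cpu0 = CpuRemapTable()
cpu0.map_vector(0x51, SoftwareRemapEntry(partition_id=3, virtual_vector=0x30))
target = cpu0.route(0x51)
unmapped = cpu0.route(0x52)
```

Because the lookup is a plain array index, routing cost stays constant regardless of how many vectors are mapped.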
Software interrupt remapping cannot guard against devices generating spoofed interrupt requests. For example, if a device is assigned to a virtual machine, the virtual machine may program the device to generate an interrupt request that raises an NMI and crashes the host. To protect against such attacks, the interrupt remapping feature of the IOMMU is used. This is referred to as hardware interrupt remapping. With hardware interrupt remapping, interrupt requests from devices are validated against a mapping table, and only valid requests from a given device are allowed. HV programs the mapping table and ensures that only valid interrupt requests are allowed for each device. A conceptual mapping table for hardware remapping is shown below as an example.
| Source Device Id | Source Interrupt Index | Destination Interrupt Vector |
The device Id is calculated from the PCI bus, device, and function numbers. The interrupt index is calculated from either the IOAPIC RTE or the MSI address/data pair that generated the interrupt. If a device generates an interrupt that results in an index not mapped to that device, the interrupt request is dropped and a fault is recorded in the IOMMU. This ensures not only that invalid interrupts are blocked, but also that a trail is maintained so that corrective action can be taken, such as removing the device from the VM or taking the VM offline.
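The validation step can be modeled conceptually like this. The device Id packing below is the standard 16-bit PCI requester ID layout (8-bit bus, 5-bit device, 3-bit function). Everything else is an assumption for the sake of the example: the `HardwareRemapTable` class, the dictionary lookup, and the fault list stand in for what the IOMMU hardware does with its own table format, and the sample bus, index, and vector values are invented.

```python
def device_id(bus, device, function):
    # Standard PCI requester ID: bus in bits 15..8, device in 7..3, function in 2..0.
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (device << 3) | function


class HardwareRemapTable:
    """Hypothetical software model of the checks the IOMMU performs."""

    def __init__(self):
        self.entries = {}  # (device id, interrupt index) -> destination vector
        self.faults = []   # record of blocked requests, for corrective action

    def allow(self, dev_id, index, vector):
        # The hypervisor programs the only valid entries for each device.
        self.entries[(dev_id, index)] = vector

    def translate(self, dev_id, index):
        vector = self.entries.get((dev_id, index))
        if vector is None:
            # Index is not mapped to this device: drop the request, log a fault.
            self.faults.append((dev_id, index))
        return vector


# Illustrative values only.
nic = device_id(bus=3, device=0, function=1)
iommu = HardwareRemapTable()
iommu.allow(nic, index=5, vector=0x51)
delivered = iommu.translate(nic, 5)  # valid request, translated
blocked = iommu.translate(nic, 6)    # spoofed index, dropped and recorded
```

Keying the lookup on both the device Id and the index is what stops one device from using an entry that was programmed for another.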
I created a presentation a few years back to describe the overall support for IOMMU in Hyper-V. You can go through it below for more details.