Interrupt handling is one of the critical subsystems in the hypervisor (HV). It is critical from both a correctness and a performance perspective. Hyper-V supports multiprocessor systems and uses many optimizations to improve the performance of interrupt virtualization.
Interrupt handling is done via APIC emulation. For each vCPU, a vAPIC is created in the hypervisor. These vAPICs behave like the APIC in a physical system and provide interrupt support to virtual machines. To support operating systems that are not APIC aware, PIC emulation is provided using virtual wire mode as specified in the Intel MultiProcessor Specification.
In this post, I will talk about how interrupts are virtualized in a Hyper-V environment and discuss some of the performance optimizations.
Before we discuss PIC and APIC emulation, I will describe intercepts and event handling in Hyper-V, as that will make the other topics easier to understand.
Hyper-V relies on hardware-assisted virtualization, namely VMX from Intel and SVM from AMD. Hardware-assisted virtualization provides many of the features required to make virtualization efficient and easier to implement. The two key features for this discussion are intercepts and event injection. Intercepts are the mechanism by which a physical CPU that is executing virtual machine code traps into the hypervisor (also referred to as VM exits). This allows the hypervisor to take control of certain critical operations. Let us take inter-processor interrupts as an example. Consider a VM that is running with 2 vCPUs, where one vCPU sends an interrupt to the other. The target vCPU may not actually be running: it may be context switched out due to CPU quota, or it may be idle, i.e. it executed a HLT instruction. The interrupt request is intercepted by the hypervisor. The hypervisor first sets the event injection information in the target vCPU and then changes its state to runnable. Once the target vCPU runs, the interrupt is delivered on it. Event injection is a mechanism provided by the hardware to deliver interrupts and exceptions to a vCPU. It works by setting hardware-specific fields for the vCPU and then running that vCPU on a selected physical CPU. Using these mechanisms, the hypervisor can intercept critical operations, emulate them, and send interrupts to vCPUs.
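The IPI example above can be sketched as a small Python model. This is only an illustration of the sequence described in the text, not Hyper-V code: the names `Vcpu`, `VcpuState`, `on_ipi_intercept` and `run_vcpu` are hypothetical, and `pending_vector` merely stands in for the hardware-specific event injection field.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class VcpuState(Enum):
    RUNNING = auto()
    IDLE = auto()       # guest executed HLT
    RUNNABLE = auto()   # waiting to be scheduled on a physical CPU


@dataclass
class Vcpu:
    index: int
    state: VcpuState = VcpuState.RUNNING
    # Stand-in for the hardware event injection field (assumption).
    pending_vector: Optional[int] = None
    delivered: List[int] = field(default_factory=list)


def on_ipi_intercept(target: Vcpu, vector: int) -> None:
    """Hypervisor side: runs when the sender's IPI request is intercepted."""
    # Step 1: record the event injection information for the target.
    target.pending_vector = vector
    # Step 2: make an idle target runnable so the scheduler picks it up.
    if target.state is VcpuState.IDLE:
        target.state = VcpuState.RUNNABLE


def run_vcpu(vcpu: Vcpu) -> None:
    """Scheduler side: the vCPU is placed on a physical CPU."""
    vcpu.state = VcpuState.RUNNING
    if vcpu.pending_vector is not None:
        # On real hardware the CPU delivers the event on VM entry.
        vcpu.delivered.append(vcpu.pending_vector)
        vcpu.pending_vector = None
```

In this model, a call to `on_ipi_intercept` on an idle vCPU followed by `run_vcpu` leaves the vector in `delivered`, which mirrors the order of steps in the paragraph above.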
The hypervisor emulates an 8259 PIC and a local APIC. The PIC is programmed through I/O ports, while the APIC is programmed through MMIO. For PIC emulation, the hypervisor installs intercepts for the I/O port operations used by the PIC. Upon such an intercept, the hypervisor first checks the port number. If the port belongs to the PIC, the intercept is sent to a user mode component (running in the root partition, or host OS) for emulation. The emulation code decodes the instruction to determine the exact operation being executed and then emulates it by manipulating the virtual PIC state.
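As a rough sketch of that port check and hand-off, the model below uses the standard 8259 port numbers (0x20/0x21 for the master PIC, 0xA0/0xA1 for the slave). Everything else is an assumption made for illustration: `VirtualPic`, `on_io_intercept` and the return values are hypothetical, only the interrupt mask register is modeled, and the initialization command sequence that a real 8259 also accepts on the data port is ignored.

```python
# Standard 8259 command/data ports for the master and slave PIC.
PIC_PORTS = frozenset({0x20, 0x21, 0xA0, 0xA1})
PIC_DATA_PORTS = frozenset({0x21, 0xA1})


class VirtualPic:
    """Hypothetical user mode PIC model; tracks only the mask registers."""

    def __init__(self):
        # Initial mask value chosen arbitrarily for this sketch.
        self.mask = {0x21: 0x00, 0xA1: 0x00}

    def emulate(self, port, is_write, value=0):
        if port in PIC_DATA_PORTS:
            if is_write:
                self.mask[port] = value & 0xFF   # update interrupt mask
                return None
            return self.mask[port]               # read back interrupt mask
        return None  # command port handling omitted in this sketch


def on_io_intercept(pic, port, is_write, value=0):
    """Hypervisor side: decide who handles an intercepted port access."""
    if port not in PIC_PORTS:
        return ("not-pic", None)
    # In the flow described above, this is where the intercept would be
    # forwarded to the user mode component in the root partition.
    return ("pic", pic.emulate(port, is_write, value))
```

The round trip through a user mode component on every port access is the reason this path is comparatively slow.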
For APIC emulation, the hypervisor uses a page fault intercept on the APIC MMIO page. As most modern operating systems support the APIC and use it as the primary mechanism for handling interrupts, this is handled completely inside the hypervisor for performance reasons (as opposed to using a user mode process in the host OS). When the hypervisor detects that a page fault is for the APIC page, it decodes the faulting instruction and emulates it by manipulating the virtual APIC state. For example, if a processor requests a self IPI, i.e. an interrupt to itself, the hypervisor intercepts the request and sets the required bits in the vAPIC. The processor then receives the interrupt via event injection when possible (i.e. based on the vAPIC state, the current interrupt priority level, etc.).
Decoding instructions to emulate the APIC, while faster than PIC emulation, is still cumbersome. A set of optimizations was built to further improve interrupt virtualization performance.
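The self IPI case can be sketched as follows. The register offsets (ICR low at 0x300, TPR at 0x080), the default APIC base address and the "self" destination shorthand encoding come from the x86 local APIC definition. The `VApic` class and its methods are hypothetical, instruction decoding is assumed to have already produced the faulting address and the value being written, and the priority check is reduced to comparing priority classes.

```python
APIC_BASE = 0xFEE00000        # default local APIC MMIO base
APIC_TPR = 0x080              # task priority register offset
APIC_ICR_LOW = 0x300          # interrupt command register (low dword)
ICR_SHORTHAND_SELF = 0b01     # destination shorthand, ICR bits 19:18


class VApic:
    """Hypothetical per-vCPU virtual APIC holding only IRR, ISR and TPR."""

    def __init__(self):
        self.irr = set()   # requested vectors
        self.isr = set()   # in-service vectors
        self.tpr = 0

    def ppr(self):
        # Processor priority class: the higher of TPR and the highest
        # in-service vector, keeping only the upper nibble.
        highest_isr = max(self.isr, default=0)
        return max(self.tpr & 0xF0, highest_isr & 0xF0)

    def mmio_write(self, fault_addr, value):
        """Emulate a decoded write to the APIC page."""
        offset = fault_addr - APIC_BASE
        if offset == APIC_ICR_LOW:
            shorthand = (value >> 18) & 0b11
            if shorthand == ICR_SHORTHAND_SELF:
                self.irr.add(value & 0xFF)   # mark the vector as requested
        elif offset == APIC_TPR:
            self.tpr = value & 0xFF

    def next_injectable(self):
        """Return a vector to inject now, or None if it must wait."""
        if not self.irr:
            return None
        vector = max(self.irr)
        if (vector & 0xF0) > self.ppr():
            self.irr.discard(vector)
            self.isr.add(vector)
            return vector
        return None
```

With the task priority raised above the vector's priority class, `next_injectable` returns `None` and the request stays in `irr`. Once the priority is lowered, the same call returns the vector, which corresponds to "when possible" in the text.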
- Synthetic MSR – A set of virtual MSRs was defined to allow an enlightened OS (an OS that is aware that it is running on the Hyper-V hypervisor) to send interrupt requests by writing to an MSR. This removes the overhead of instruction decoding for the vAPIC and gives VMs a faster way to request interrupts. It is much faster for the HV to check the index of the MSR being written and carry out the requested operation than to do complex instruction decoding.
- Auto EOI – Each interrupt must be concluded by the OS with a command called EOI (end of interrupt) so that lower priority interrupts can be delivered. This is done by writing to the EOI register in the APIC. Effectively you need as many EOIs as there are interrupts, so the hypervisor sees intercepts at 2x the interrupt rate. Auto EOI is an optimization that an enlightened OS can use for specific interrupts: these interrupts are EOI’ed as soon as they are delivered to the VM.
- Lazy EOI – Auto EOI is useful when you can make the ISR aware of (and work well with) the interrupt being EOI’ed in the APIC automatically. This is not always possible, because the source code for the interrupt handler may not be available or it may not be possible to modify it. This is typically the case for interrupt handlers written for physical or emulated devices. Lazy EOI solves this problem by creating a shared bit between the vCPU and the hypervisor. The hypervisor sets this bit when it injects an interrupt on the vCPU in the VM and there are no other lower priority interrupt requests that need to be processed. The hypervisor clears the bit if there are lower priority interrupts, because it then needs an intercept (upon EOI) to deliver them. The interrupt handler in the VM does a bit-test-and-reset on the bit; if the bit was set, it knows that the hypervisor doesn’t need to intercept the EOI and skips the EOI. This also significantly reduces the number of intercepts in the hypervisor.
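The Lazy EOI handshake can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not the actual Hyper-V interface: `SharedPage`, `Hypervisor`, `guest_end_of_interrupt` and the `no_eoi_required` flag are hypothetical names, priority is approximated by comparing vector numbers, and the bit-test-and-reset is written as two plain statements rather than an atomic instruction.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SharedPage:
    # The bit shared between the vCPU and the hypervisor (hypothetical name).
    no_eoi_required: bool = False


@dataclass
class Hypervisor:
    pending: Set[int] = field(default_factory=set)
    in_service: Optional[int] = None
    eoi_intercepts: int = 0

    def request(self, vector: int, page: SharedPage) -> None:
        self.pending.add(vector)
        # A lower priority request arrived while the guest may still be in
        # its handler: the EOI must now be intercepted, so clear the bit.
        if (page.no_eoi_required and self.in_service is not None
                and vector < self.in_service):
            page.no_eoi_required = False

    def inject(self, page: SharedPage) -> Optional[int]:
        if not self.pending:
            return None
        vector = max(self.pending)
        self.pending.remove(vector)
        self.in_service = vector
        # Set the bit only if nothing of lower priority is still waiting.
        page.no_eoi_required = not self.pending
        return vector

    def on_eoi_intercept(self) -> None:
        self.eoi_intercepts += 1
        self.in_service = None


def guest_end_of_interrupt(page: SharedPage, hv: Hypervisor) -> None:
    """Guest side: test and reset the bit, skip the EOI if it was set."""
    was_set = page.no_eoi_required
    page.no_eoi_required = False
    if not was_set:
        hv.on_eoi_intercept()   # stands in for the trapping EOI write
```

In this model, a lone interrupt completes without any EOI intercept, while an interrupt that has a lower priority request queued behind it still costs one. That is the saving the bullet above describes.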