13 March 2009

design tradeoffs in hardware virtualization

I mentioned that the Power ISA version 2.06 was published recently, which added a model for hardware virtualization on "Book E" embedded processors. (The model for hardware virtualization in server processors, such as POWER4, has been around for years.)

The only reason to add hardware support at all (for anything, not just virtualization) is to improve performance. You can do pretty much anything in software; it's just faster to do it in hardware. For example, people have run emulators and JVMs for years, and that gives you a virtual machine without hardware support. We've even demonstrated virtualization without hardware support with KVM on PowerPC 440.

So the goal for virtualization hardware support is to allow the guest kernel to access as much hardware state as possible without host intervention. In an ideal world we could just duplicate all processor state and allow the guest free access to its own copy... but hardware costs power, heat, and die space.

As a compromise, the Book E virtualization architecture duplicates only the state accessed in the fast path of normal guest operation. So there are guest-writable copies of registers like SRR0-1, SPRG0-3, ESR, and DEAR, which are heavily used by the guest in its interrupt vectors. However, registers which are only used for hardware initialization are not duplicated: when the guest tries to access these registers, a privilege fault occurs and the host/hypervisor emulates the instruction. Slow, but (hopefully) only for operations that don't need to be fast. Similarly, some interrupts (such as alignment interrupts) are only delivered to the host, and at that point it is software's responsibility to implement interrupt delivery to the guest.
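
To make the fast-path/slow-path split concrete, here is a toy model in Python. It is only a sketch under my own assumptions, not how any real hypervisor is written: the class names are made up, the duplicated register set is just the one listed above, and "INIT_REG" is a hypothetical stand-in for a register that is only touched during hardware initialization.

```python
# Toy model: registers with a guest-owned copy are written directly;
# anything else raises a fault that the hypervisor emulates in software.

# Register names taken from the text above; the set is illustrative only.
GUEST_DUPLICATED = {"SRR0", "SRR1", "SPRG0", "SPRG1", "SPRG2", "SPRG3",
                    "ESR", "DEAR"}


class PrivilegeFault(Exception):
    """Raised when the guest touches a register it has no copy of."""

    def __init__(self, reg, value):
        super().__init__(reg)
        self.reg = reg
        self.value = value


class ToyCPU:
    def __init__(self):
        # "Hardware" copies the guest may access without host intervention.
        self.guest_regs = {name: 0 for name in GUEST_DUPLICATED}

    def guest_write(self, reg, value):
        if reg in self.guest_regs:
            self.guest_regs[reg] = value      # fast path: no trap
        else:
            raise PrivilegeFault(reg, value)  # slow path: trap to the host


class ToyHypervisor:
    def __init__(self, cpu):
        self.cpu = cpu
        self.shadow = {}  # software-emulated state for trapped registers
        self.traps = 0    # how many times the host had to step in

    def run_guest_write(self, reg, value):
        try:
            self.cpu.guest_write(reg, value)
        except PrivilegeFault as fault:
            # Emulate the faulting instruction on the guest's behalf.
            self.traps += 1
            self.shadow[fault.reg] = fault.value


cpu = ToyCPU()
hv = ToyHypervisor(cpu)
hv.run_guest_write("SRR0", 0x1000)      # duplicated register, no trap
hv.run_guest_write("INIT_REG", 0xABCD)  # hypothetical init-only register, traps
```

The tradeoff shows up in the trap counter: every register that gets a hardware copy is one less reason to bounce into the host, and every register that doesn't is a little more work for the hypervisor.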

In contrast, the virtualization architecture for x86 doesn't duplicate register state, but rather provides instructions for atomically transferring a boatload of register state to and from memory. This definitely does not fit with the RISCish philosophy of the Power architecture, where memory accesses are performed only by load and store instructions. I'm not a hardware person, but I can guess that implementing the x86 technique is rather difficult... and I guess that's the whole point of CISC and RISC. :) I can say that I really appreciate the flexibility when hardware provides simple tools and software can use them how it likes.
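
For comparison, the x86-style approach can be sketched the same way. Again this is a toy under my own assumptions: the register names, `StateBlock`, `vm_entry`, and `vm_exit` are all hypothetical, and real hardware moves far more state than three registers. The point is only that there is one set of registers, and the whole set is swapped through memory on each transition.

```python
from dataclasses import dataclass, field


@dataclass
class StateBlock:
    """A memory-resident block holding one context's register state."""
    regs: dict = field(default_factory=dict)


class ToySwitchingCPU:
    def __init__(self):
        # A single register file shared by host and guest.
        self.regs = {"ip": 0, "sp": 0, "flags": 0}

    def vm_entry(self, host_block, guest_block):
        # One operation: save all host state, load all guest state.
        host_block.regs = dict(self.regs)
        self.regs = dict(guest_block.regs)

    def vm_exit(self, host_block, guest_block):
        # The reverse: save all guest state, restore all host state.
        guest_block.regs = dict(self.regs)
        self.regs = dict(host_block.regs)


switching_cpu = ToySwitchingCPU()
switching_cpu.regs.update(ip=0x100, sp=0x8000)
host = StateBlock()
guest = StateBlock(regs={"ip": 0x4000, "sp": 0x9000, "flags": 0})

switching_cpu.vm_entry(host, guest)  # CPU now holds guest state
switching_cpu.regs["ip"] += 4        # guest runs an instruction
switching_cpu.vm_exit(host, guest)   # CPU holds host state again
```

Nothing is duplicated in the register file here; the cost moves into the transition itself, which has to touch memory for every register it carries.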

Anyways, I can't judge one way as better than the other because I don't understand the hardware implications, but that's really the point I'm trying to make: implementing functionality like this is all about design tradeoffs between hardware and software.
