27 March 2009

Wind River Linux 3.0 adds KVM

Wind River recently released Wind River Linux 3.0, including KVM support (on x86 systems of course).

Wind River is better known for their VxWorks embedded RTOS, which traditionally has been one of the dominant operating systems in the embedded industry, and still is today. After criticizing Linux and the GPL (as VxWorks competition) for years, in 2003 the company gave in and started moving towards Linux support, including its own Linux distribution. Today Wind River Linux is appearing in more and more places in the embedded Linux market. I think it's considered #2 after MontaVista, though I admit I don't know the relative market shares there.

In some ways, KVM support in Wind River Linux isn't a big surprise, because we already know that Wind River believes in embedded virtualization so much they're writing their own hypervisor.

In other ways, it is a surprise, because KVM is a hypervisor too, and as such might compete with their own hypervisor. I suppose they will have lots of internal conversations about market positioning and how to convince the world they're not competing with themselves, but I guess every sufficiently large company has the exact same issue.

Anyways, the one big takeaway from all this is that Wind River seems to be saying that KVM is good enough for embedded systems. Since I've been saying the same thing for a while to a sometimes-skeptical audience, I'll take it. ;)

13 March 2009

design tradeoffs in hardware virtualization

I mentioned that the Power ISA version 2.06 was published recently, which added a model for hardware virtualization on "Book E" embedded processors. (The model for hardware virtualization in server processors, such as POWER4, has been around for years.)

The only reason to add hardware support at all (for anything, not just virtualization) is to improve performance. You can do pretty much anything in software; it's just faster to do it in hardware. For example, people have run emulators and JVMs for years, and that gives you a virtual machine without hardware support. We've even demonstrated virtualization without hardware support with KVM on PowerPC 440.

So the goal for virtualization hardware support is to allow the guest kernel to access as much hardware state as possible without host intervention. In an ideal world we could just duplicate all processor state and allow the guest free access to its own copy... but hardware costs power, heat, and die space.

As a compromise, the Book E virtualization architecture duplicates only the state accessed in the fast path of normal guest operation. So there are guest-writable copies of registers like SRR0-1, SPRG0-3, ESR, and DEAR, which are heavily used by the guest in its interrupt vectors. However, registers which are only used for hardware initialization are not duplicated: when the guest tries to access these registers, a privilege fault occurs and the host/hypervisor emulates the instruction. Slow, but (hopefully) only for operations that don't need to be fast. Similarly, some interrupts (such as alignment interrupts) are only delivered to the host, and at that point it is software's responsibility to implement interrupt delivery to the guest.
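To make the fast-path/slow-path split concrete, here is a toy model in Python. It is only a sketch: the class names, the `HW_INIT_REG` register, and the `shadow` dictionary are all made up for illustration, and none of this reflects how real hardware or KVM is structured. The only thing taken from the architecture is the list of duplicated registers mentioned above.

```python
# Registers with guest-writable copies, per the list above.
GUEST_DUPLICATED = {"SRR0", "SRR1", "SPRG0", "SPRG1", "SPRG2", "SPRG3", "ESR", "DEAR"}


class PrivilegeFault(Exception):
    """Raised when the guest touches a register it has no copy of."""

    def __init__(self, spr, value):
        super().__init__(spr)
        self.spr = spr
        self.value = value


class Cpu:
    def __init__(self):
        # The guest's own copies of the fast-path registers.
        self.guest_sprs = {name: 0 for name in GUEST_DUPLICATED}

    def guest_mtspr(self, spr, value):
        if spr in self.guest_sprs:
            # Fast path: the write lands in the guest copy, no host involved.
            self.guest_sprs[spr] = value
        else:
            # Slow path: anything else traps to the host.
            raise PrivilegeFault(spr, value)


class Hypervisor:
    def __init__(self):
        self.shadow = {}  # hypothetical software-emulated register state
        self.traps = 0

    def run_guest_write(self, cpu, spr, value):
        try:
            cpu.guest_mtspr(spr, value)
        except PrivilegeFault as fault:
            # Emulate the faulting instruction in software.
            self.traps += 1
            self.shadow[fault.spr] = fault.value


cpu, hv = Cpu(), Hypervisor()
hv.run_guest_write(cpu, "SRR0", 0x100)     # handled entirely "in hardware"
hv.run_guest_write(cpu, "HW_INIT_REG", 1)  # hypothetical init-only register: traps
print(hv.traps)
```

The point of the split is that the trap counter only moves for the second write, which is the kind of access a guest performs rarely.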

In contrast, the virtualization architecture for x86 doesn't duplicate register state, but rather provides instructions for atomically transferring a boatload of register state to and from memory. This definitely does not fit with the RISCish philosophy of the Power architecture, where memory accesses are performed only by load and store instructions. I'm not a hardware person, but I can guess that implementing the x86 technique is rather difficult... and I guess that's the whole point of CISC and RISC. :) I can say that I really appreciate the flexibility when hardware provides simple tools and software can use them how it likes.
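A toy comparison of the two styles, again in Python and again only a sketch. The register names and both classes are invented for illustration, and the "banked" model is deliberately exaggerated: it duplicates everything, whereas Book E duplicates only selected state.

```python
REGS = ("r0", "r1", "pc", "msr")  # hypothetical register set


class BankedCpu:
    """Duplicated-state style: host and guest each have their own copy."""

    def __init__(self):
        self.banks = {
            "host": dict.fromkeys(REGS, 0),
            "guest": dict.fromkeys(REGS, 0),
        }
        self.active = "host"

    @property
    def regs(self):
        return self.banks[self.active]

    def enter_guest(self):
        self.active = "guest"  # nothing is copied, only the selection changes

    def exit_guest(self):
        self.active = "host"


class BlockSwitchCpu:
    """Memory-transfer style: one register file, swapped through memory."""

    def __init__(self):
        self.regs = dict.fromkeys(REGS, 0)
        self.host_area = {}                        # memory holding host state
        self.guest_area = dict.fromkeys(REGS, 0)   # memory holding guest state

    def enter_guest(self):
        self.host_area = dict(self.regs)   # store all host state to memory
        self.regs = dict(self.guest_area)  # load all guest state from memory

    def exit_guest(self):
        self.guest_area = dict(self.regs)
        self.regs = dict(self.host_area)


for cpu in (BankedCpu(), BlockSwitchCpu()):
    cpu.regs["pc"] = 0x1000   # host state
    cpu.enter_guest()
    cpu.regs["pc"] = 0x2000   # guest state
    cpu.exit_guest()
    print(type(cpu).__name__, hex(cpu.regs["pc"]))
```

Both models preserve host and guest state across a switch. The difference is where the cost lands: extra storage in the first, a bulk copy on every transition in the second.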

Anyways, I can't judge one way as better than the other because I don't understand the hardware implications, but that's really the point I'm trying to make: implementing functionality like this is all about design tradeoffs between hardware and software.

06 March 2009

VirtualLogix virtualization on Atom

Right now there are two embedded cores from Intel called "Atom": Silverthorne, which implements VT virtualization support, and Diamondville, which doesn't.

VirtualLogix just announced a port of VLX to Atom Z530, which is a Silverthorne core, though I have no firsthand knowledge of whether they use VT (too technical for a press release I guess). I would assume they do, since that's the only way to virtualize Windows, which they advertise elsewhere on their site.

Interestingly, Intel reported at KVM Forum 2008 that they had run Xen and KVM on Atom (Silverthorne) without problem. (I guess that's the value of a common instruction set...) The biggest issue they faced was at the system level: some Atom systems just don't have enough RAM to run more than one or two guests.

05 March 2009

real-time hypervisors

Chuck Yoo from Korea University presented at the Xen Summit last week about Real-time Xen for Embedded Devices (a video of the presentation is also available). He seems to be particularly interested in mobile phones, so his motivation was running the radio stack separate from the user interface.

One of his observations is that interrupts interfere with predictability. To mitigate this, one can disable interrupts completely (poll), or defer interrupt handling so it occurs mostly in schedulable tasks (that way the RT scheduler can prioritize RT tasks over interrupt handling).
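The deferral idea can be sketched with a tiny priority scheduler in Python. This is my own illustration, not code from the presentation: the `Scheduler` class, the priority values, and the task names are all hypothetical.

```python
import heapq

RT_PRIORITY = 0         # lower number runs first
IRQ_WORK_PRIORITY = 10  # deferred interrupt work ranks below RT tasks


class Scheduler:
    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker so equal priorities run in FIFO order

    def submit(self, priority, name, fn):
        heapq.heappush(self._queue, (priority, self._seq, name, fn))
        self._seq += 1

    def run(self):
        """Run all queued tasks in priority order; return the order of names."""
        order = []
        while self._queue:
            _, _, name, fn = heapq.heappop(self._queue)
            fn()
            order.append(name)
        return order


def on_interrupt(sched, device, log):
    # The handler itself does almost nothing: it only queues the real work
    # as an ordinary schedulable task.
    sched.submit(IRQ_WORK_PRIORITY, f"irq-work:{device}", lambda: log.append(device))


sched, log = Scheduler(), []
on_interrupt(sched, "nic", log)                          # interrupt arrives first
sched.submit(RT_PRIORITY, "radio-stack", lambda: None)   # RT task queued later
print(sched.run())  # ['radio-stack', 'irq-work:nic']
```

Even though the interrupt arrived first, its handling runs after the real-time task, because the scheduler rather than the interrupt controller decides the order.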

I guess that makes sense if your primary objective is scheduling. I'm not really a real-time person, but the real-time metric I hear about the most is "interrupt latency" -- the time between interrupt delivery and running the interrupt handler (lower is better). In this regard, the fact that Xen has no "hypervisor threads" is terrible, because it means that a Xen code path cannot be paused to service an interrupt.

In contrast, because KVM is based on the Linux kernel, which can preempt kernel threads (to defer interrupt handlers), this is not an issue.

On the subject of scheduling determinism, Chuck makes the key point that the host scheduler must also have visibility into the real-time characteristics of guest tasks, and he suggests "aggregation" as a way for the host scheduler to account for real-time tasks in multiple guests.

Chuck later observes that Xen's IO model (asynchronous non-deterministic communication with dom0) is also a huge obstacle. As a workaround, he proposes giving RT domains direct access to dedicated hardware.

In contrast, KVM schedules guests and qemu (which provides the virtual IO services) as a single unit. When the guest needs IO services, that means qemu is "already running", and the scheduler doesn't get involved. (I will admit things are becoming a little more fuzzy with qemu's adoption of IO threads, but qemu is shared between Xen and KVM anyways.)

At any rate, a very interesting presentation, though I believe that by starting with Xen, Chuck and his students are needlessly handicapping themselves. Unfortunately I think many embedded systems people are lured to Xen by the promise of a "tiny hypervisor kernel," without realizing the depth of dependencies on the Linux dom0...