07 October 2009

ARM virtualization issue

I will be the first to admit I don't know a lot about the ARM instruction set. However, I do know that instructions which behave differently at different privilege levels (and don't cause an interrupt) are a huge problem for virtualization.

The well-known "Formal Requirements for Virtualizable Third Generation Architectures" paper identified 17 problematic instructions on x86, the best example being POPF. The Intel Architecture Developer's Manual makes this understated observation:
The effect of the POPF/POPFD instructions on the EFLAGS register changes slightly, depending on the mode of operation of the processor.
In other words, you won't get a trap when trying to modify supervisor state (like the interrupt-enable flag) with the POPF instruction in user mode; the change is just silently ignored.
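
To make that concrete, here's a tiny user-space test (my own sketch, not from the Intel manual) that tries to clear the interrupt-enable flag the way a kernel would. At user privilege, nothing happens: no fault, and no change.

    /* Minimal sketch of why POPF breaks classical trap-and-emulate: at user
     * privilege it silently ignores the interrupt-enable (IF) bit instead of
     * faulting, so a hypervisor running a guest kernel in user mode never
     * gets a chance to emulate the instruction. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long flags;

        __asm__ volatile ("pushf; pop %0" : "=r" (flags));
        printf("EFLAGS before: %#lx (IF=%lu)\n", flags, (flags >> 9) & 1);

        /* Try to clear IF (bit 9) the way a kernel would. */
        __asm__ volatile ("push %0; popf" : : "r" (flags & ~(1UL << 9)) : "cc");

        __asm__ volatile ("pushf; pop %0" : "=r" (flags));
        /* At user privilege (CPL 3, IOPL < 3), IF is unchanged and no
         * exception was raised. */
        printf("EFLAGS after:  %#lx (IF=%lu)\n", flags, (flags >> 9) & 1);
        return 0;
    }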

In the PowerPC KVM implementation, we relied on the fact that a privileged instruction would trap. This enabled us to execute the vast majority of guest kernel instructions natively in user mode, since we would get a trap and could emulate any supervisor-only instructions. Ultimately, even without hardware support, we didn't need a complicated dynamic instruction translation engine (see VMware). Hardware support became a question of acceleration, rather than a requirement.
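
For the curious, the trap-and-emulate loop is conceptually very simple. Here's a sketch I wrote purely for illustration (not actual KVM code; the decoding and register handling are simplified):

    /* The guest kernel runs in user mode, so every privileged instruction it
     * executes traps to the host, which emulates it against shadow state and
     * resumes the guest. */
    #include <stdint.h>
    #include <stdbool.h>

    struct vcpu {
        uint32_t pc;
        uint32_t shadow_msr;    /* the guest's view of the privileged MSR */
        uint32_t gpr[32];
    };

    /* Called from the host's privilege-fault handler. Returns false if the
     * instruction is genuinely illegal and should be reflected to the guest. */
    bool emulate_privileged(struct vcpu *vcpu, uint32_t inst)
    {
        int rt = (inst >> 21) & 0x1f;                 /* rT/rS field */

        if ((inst & 0xfc0007fe) == 0x7c0000a6)        /* mfmsr rT */
            vcpu->gpr[rt] = vcpu->shadow_msr;
        else if ((inst & 0xfc0007fe) == 0x7c000124)   /* mtmsr rS */
            vcpu->shadow_msr = vcpu->gpr[rt];
        else
            return false;

        vcpu->pc += 4;          /* step past the emulated instruction */
        return true;
    }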

A colleague recently mentioned that ARM has a similar problem with the CPS instruction. Sure enough, from the ARM Architecture Reference Manual:
Exceptions: None.

Notes
User mode: CPS has no effect in User mode.
That's disappointing, because I had assumed that ARM, following similar RISCish principles to PowerPC, would have ended up with the same behavior. It took Intel years to add the necessary architecture changes for virtualization (VMX), and there is still no solution other than VMware's for the non-VMX processors.

From what I can tell, ARM TrustZone doesn't solve this problem... can anybody confirm?

19 June 2009

Solaris: the awesomest virtualization ever?

There's an advocacy piece, in Forbes of all places, that offers a pretty unbalanced perspective on Solaris. (I hesitate to even link to it, because it feels like sensationalism just to generate huge numbers of outraged readers.) The author, Dan Woods, doesn't mention any negative points to Solaris at all, which should raise the suspicions of any reader with critical thinking skills, but I wanted to debunk some of the virtualization statements in particular:

Dan claims that Linux virtualization is the result of un-coordinated development from a number of companies, and Solaris virtualization is better because it's engineered "top to bottom" at a single company. Seamless integration can certainly offer advantages (see Apple), but I take issue with both his observations about the ecosystem and the conclusions he draws from them.

Aside from containers, Solaris uses hypervisors from Xen (marketed as xVM) and VirtualBox (from the innotek acquisition). Neither of those solutions was designed for Solaris; they were adopted years later to fill gaps in Sun's offerings. However, they are currently developed by Sun, so you still have the "single company" argument. About that...

Where I come from, being completely dependent on a single company is a bad thing, and I'm not even talking about the freedom of open source. It's called "vendor lock-in," and it's bad because there's no competition and customers are at the mercy of that single company's roadmap. Companies invest lots of money developing and supporting 3rd party ecosystems because it's critically important to their customers. Anyways, looking at it from another angle, isn't it disturbing that virtualization ISVs don't consider Solaris important enough to target? Sun had to buy them outright or develop solutions in-house.

Dan claims Solaris containers cause a 2% performance degradation, vs. "about 20% for a hypervisor." While it's true that Forbes isn't a good forum for presenting performance analyses, offering these numbers without even a hint about where they came from is ridiculous. It's often true that you can pick a specific benchmark and environment to support any argument, but Dan didn't even pretend.

Finally, I thought Dan's most interesting claim was the one for which he didn't offer any supporting arguments at all: that Solaris is now safe. Even if he's right, and Solaris is indeed the most awesome OS ever seen, that still doesn't guarantee it a slot on the Oracle roadmap.

16 June 2009

Wind River Hypervisor release

Yesterday Wind River Systems released the hypervisor they announced last summer, complete with YouTube videos.
Things I find interesting:
  • From the demo video, we can see that the x86 port uses VT, but the PowerPC core doesn't have equivalent functionality (e500v2 is pre-ISA 2.06). That must mean they've got a paravirtual interface, likely the "virtual board interface" mentioned.
  • EETimes quote: "We map I/O into the guest environment directly so it can directly talk to data registers to get higher performance." Given that they're releasing VxWorks MILS Platform 2.0 today for the hypervisor, which presumably requires high levels of isolation, I'm willing to bet that direct IO access is an option rather than a design assumption.
  • Mark's demo videos had an emphasis on their Workbench IDE. I don't know if this was a conscious decision or not, but it does nicely reinforce the notion of the hypervisor as part of a solution, not a standalone product.
  • They advertise their "MIPC" shared-memory inter-guest communication mechanism (a rough sketch of the general idea follows this list). I hope they just put a fancy name on virtio, but I doubt it. If they aren't using virtio, they're developing yet another set of virtual IO drivers for Linux. :(
  • Their Linux patch isn't mentioned, but I hope they will be publishing it and merging it upstream, instead of the usual RTOS vendor approach to Linux...
  • They consistently advertise the performance impact as "2-3% at worst." That's a nice marketing bullet, but it's never the right answer to a vague performance question. The right answer is "it depends." In this case, it depends on the processor model, the workload, the partitioning configuration, and more. For example, VT-enabled x86 virtualization will have very different performance characteristics than paravirtualized PowerPC.
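
About MIPC: I have no idea what its internals look like, but mechanisms in this space generally boil down to a ring buffer in memory shared between the guests, plus some way to kick the peer. Here's a made-up sketch of that core idea (not MIPC and not virtio, just the general shape):

    /* A single-producer/single-consumer ring in a page shared by two guests.
     * The layout and names are invented for illustration. */
    #include <stdint.h>
    #include <string.h>

    #define RING_SLOTS 64
    #define SLOT_SIZE  256

    struct ipc_ring {                  /* lives in the shared page */
        volatile uint32_t head;        /* written only by the sender */
        volatile uint32_t tail;        /* written only by the receiver */
        uint8_t slots[RING_SLOTS][SLOT_SIZE];
    };

    int ring_send(struct ipc_ring *r, const void *msg, size_t len)
    {
        uint32_t head = r->head;

        if (len > SLOT_SIZE || head - r->tail == RING_SLOTS)
            return -1;                 /* message too big, or ring full */

        memcpy(r->slots[head % RING_SLOTS], msg, len);
        __sync_synchronize();          /* publish the data before the index */
        r->head = head + 1;
        /* A real implementation would now notify the peer, e.g. through a
         * hypervisor doorbell or a virtio-style virtqueue kick. */
        return 0;
    }
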
All in all, it's a significant event when a dominant embedded software company enters a new area like virtualization. As time goes by, it will be interesting to see how well they play with competitors' software... who is going to port Integrity to the Wind River virtual board interface? ;)

29 May 2009

VMs on netbooks

OK, this post is about virtual machines in the JVM sense, not in the hypervisor sense. (The lines are getting a little blurry these days though.)

Back in the day, I once installed Windows NT on an RS/6000 (PowerPC) just to play with it. (It's funny how obsolete/impractical technology seemed so interesting back then, and these days it boggles my mind how anybody could care about Haiku, Amiga, etc.) So Windows NT: it installed OK, I started it up, and ran IE 2.0. That sucked (even at the time), but there were no updates for it. I ran the bundled Pinball game. That was the end of the story, because there was no ISV support. Just porting the kernel wasn't enough: an x86 WinNT application still couldn't run on PowerPC WinNT. The same fate could befall ARM netbooks (ahem "smartbooks").

This post suggests that .NET could be the answer. It starts by assuming that ARM netbooks will be common (a question on which the jury is still out), and then assumes Microsoft will somehow want to participate in that ecosystem (probably a safe bet: look at Windows Mobile on phones). Port some Windows kernel to ARM netbooks, provide a .NET runtime, and then just run .NET applications -- never worry about needing native ARM Windows binaries.

That has an existing parallel of course: J2ME on mobile phones. As a consumer, I'd call that a success. I love that I can download a random Java application and not worry about whether the creator has built it for N phone vendors x M models. I'm sure J2ME has its limits, but it has made my life better.

And of course Google is walking down the same path with Dalvik. The cool thing about Java/Dalvik/.NET is that it might just allow another processor architecture to compete with Intel without the legacy software issue. It will be interesting to see if Intel eventually enables Java on Moblin.

With Intel investing so many resources not just into the Linux kernel, but now into a full mobile Linux distribution (complete with UI), maybe Microsoft will annoy them right back by enabling ARM netbooks. You know both of them have to be looking to embedded systems for growth.

Anyways, I'd only buy a netbook that runs Linux. ;) I've heard good things about the ARM instruction set...

15 May 2009

Oracle buys another hypervisor?!

Oracle made headlines for two acquisitions in the past month, Sun and Virtual Iron. By my count, that makes them the proud owners of no fewer than four x86 hypervisors: Oracle VM, Sun's xVM Server, Sun's xVM Desktop (a.k.a. VirtualBox), and Virtual Iron. (I've never really understood why Sun had two.) All but VirtualBox are based on Xen.

Even with this surprising success in the game of "collect the Xen implementations," that still leaves at least Red Hat, Novell, and of course Citrix itself offering competing Xen solutions. I'll admit the Unix wars predate me, but that seems like an impressive degree of fragmentation. Still, Red Hat has announced plans to abandon Xen for KVM, and even Novell included KVM as a tech preview in SLES11. There's no question that the number of Xen-based products is about to significantly shrink.

I've even seen one person speculate that the cost of maintaining Xen is so high that, with Red Hat pulling out, Oracle must have felt the need to strengthen its Xen development capabilities.

What's worse though is that each of those Oracle hypervisors has its own management stack. Systems management is one of those areas that sounds really easy, but in practice never is. Management software contains so many layers that it's hard to find anybody who actually understands the end-to-end details of a particular code path. You need translation layers for components that are too old, too new, fit at a different layer, or were written for a different management stack. In this case, you might find one management stack built on xm, another built on libvirt, another built on CIM (and "enterprise frameworks" are a whole other world of complexity). Do they use OVF for image files? Should they? Every design question has tradeoffs and requires serious consideration.
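
To give a flavor of just one of those layers: "built on libvirt" means the management stack drives guests through calls roughly like these (a trivial example of my own, with a made-up domain name and no error handling), while the xm- and CIM-based stacks expose the same concepts through completely different interfaces.

    /* A minimal libvirt client: connect, find a domain, read its state. */
    #include <stdio.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpenReadOnly("xen:///");
        virDomainPtr dom = virDomainLookupByName(conn, "guest01");
        virDomainInfo info;

        virDomainGetInfo(dom, &info);
        printf("guest01: state=%d vcpus=%d memory=%lu kB\n",
               info.state, info.nrVirtCpu, info.memory);

        virDomainFree(dom);
        virConnectClose(conn);
        return 0;
    }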

Speaking from experience at a large company, I expect there will be at least 6 months of churn while architects furiously scribe presentations, rearrange block diagrams, create hypothetical development sizings, establish migration plans for legacy customers, escalate issues to management (which will be sorting itself out too), find and get input from related organizations ("how does this affect our relationship with VMware?"), and in general figure out what they're doing. After all the dust has settled, they'll still need to write the code.

Will the eventual result of this consolidation be a stronger Xen ecosystem a few years from now? To be honest, I couldn't care less... but it could be worth the cost of a bag of popcorn.

30 April 2009

WAN optimization

I've wondered how "WAN optimization" magic works, and I just came across a page that explains it in (a little) more detail than marketing. I hadn't heard of it until a couple years ago, when some Cisco people mentioned that their WAAS routers embed KVM for virtualization.

Why would they do that? Because if your branch office uses an Active Directory server in your centralized data center, and your WAN link dies, work at the branch office ceases. From what I understand, Cisco's WAAS routers run an Active Directory server inside a virtual machine on the router itself, to mitigate that problem. A little googling reveals that similar approaches may be taken by their competition, 3Com and Riverbed.

In general, I expect we'll see much more virtualization in this area in the future. For example, today Cisco's Application Extension Platform (AXP) products are physical x86 cards you stick into a router to run server workloads. It would be plain silly not to take advantage of the well-known consolidation benefits of virtualization to accomplish the same thing. (That's pure speculation, but as I said... silly.)

20 April 2009

package management and embedded Linux distributions

A while back, when it looked like the Kuro Box would actually go somewhere, I bought the original model. Among other things, this consists of a 200MHz Freescale e300 core (similar to PowerPC 603e) and 64MB of RAM. No serial port even, but through u-boot it has netconsole and netboot support. I had a decrepit version of Debian installed, and Debian's package management tools frustrate me to the point that I was helpless to get the thing upgraded (something about packages mysteriously "being held back").

For a system like this, the real value of a Linux distribution is the frequency of its security updates. There are some embedded distributions I could have messed with, but I have no idea how reliable their updates are, and I really don't want to be responsible for that myself. I started looking for a "normal" Linux distribution to install. (Unfortunately, the number of mainstream Linux distributions with PowerPC support is shrinking...)

I'm really a Fedora guy. Back in university I did some packaging for LinuxPPC and Yellow Dog Linux (both of which were Red Hat variants), and I'm comfortable enough with RPM to bootstrap a system from just about nothing. So I tried the Fedora installer, and when anaconda failed miserably with netconsole, I painfully installed it package by package.

Long story short, yum (the Fedora software installer) requires an absurd amount of RAM... way more than the 64MB my little Kuro has. The simplest yum operation was taking hours, and I could see I was well into swap. So after all that work, the box was still useless. I ran out of play time, unplugged the thing (since I couldn't install any security updates for it), and there it has sat for another 6 months.

Today I ran across a blog post comparing the speed and memory consumption of yum and zypper (openSUSE's software installer). I don't know much about openSUSE, but I will be trying it next...

15 April 2009

leveraging Linux for virtualization: the dark side

I work on KVM, which is a relatively small kernel module that transforms the Linux kernel into a hypervisor. A hypervisor really is a kernel: it contains a scheduler (at least the good ones do ;), device drivers (at least interrupt controller, probably console, maybe more), memory management, interrupt handlers, bootstrap code, etc.

This is the key observation behind KVM's design. "Hmm, we need a kernel... and hey, we've already got one!" We just need to add some code to make it schedule kernels instead of userspace tasks. In fact, one of the major technical faults of the Xen project was that it needed to duplicate — often copy outright — Linux code, for features such as power management, NUMA support, an ACPI interpreter, PIC drivers, etc. By integrating with Linux, KVM gets all that for free.

There is a drawback to leveraging Linux though.

Pro                                 Con
use Linux's scheduler               stuck with Linux's scheduler
use Linux's large page support      stuck with Linux's large page support
get lots of fancy Linux features    stuck with the footprint of Linux's fancy features

Seeing a theme here? Let me share a little anecdote:

My team had been doing early development on KVM for PowerPC 440, and we were scheduled to do a demo at the Power.org Developer's Conference back in 2007. Unfortunately we weren't able to get Linux booting as a guest in time, but we had a simple standalone application we used instead. So when I say "early development" I mean "barely working."

A friend of mine walked up to the demo station and asked "Does nice work?" Now remember, basic functionality was missing. We couldn't even boot Linux. The only IO was a serial console. We had never touched a line of scheduler code, and certainly hadn't tested scheduling priorities. Despite all that, nice just worked because we were leveraging the Linux scheduler.

There's a down-side though. The Linux scheduler is famously tricky, and almost nobody wants to touch it because even slight tweaks can cause disastrous regressions for other workloads. The Linux scheduler does not support gang scheduling, where all threads of a particular task must be scheduled at once (or not at all).

Gang scheduling is very interesting for SMP guests using spinlocks. One virtual CPU could take a spinlock and then be de-scheduled by the host. Unaware of this important information, all the other virtual CPUs could spin waiting for the lock to be released, resulting in a lot of wasted CPU time. Gang scheduling is one way to avoid this problem by scheduling all virtual CPUs at once.
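
Here's a toy version of that situation (my own illustration, not real guest code): the lock is an ordinary test-and-set spinlock, and the host scheduler has no idea that descheduling the lock holder condemns every waiter to burn its entire timeslice.

    #include <stdatomic.h>

    static atomic_flag guest_lock = ATOMIC_FLAG_INIT;

    void guest_spin_lock(void)
    {
        /* From the host's point of view this vCPU is just a busy thread; it
         * has no way to know it is waiting on another vCPU that the host
         * chose not to run. */
        while (atomic_flag_test_and_set_explicit(&guest_lock,
                                                 memory_order_acquire))
            ;   /* pure wasted CPU time while the lock holder is descheduled */
    }

    void guest_spin_unlock(void)
    {
        atomic_flag_clear_explicit(&guest_lock, memory_order_release);
    }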

Since Linux doesn't support gang scheduling, and only a handful of people in the world have the technical skill and reputation to change that, that's basically a closed door.

This is just one example, but I think you can see that re-purposing Linux for virtualization is a tradeoff between functionality and control. If one were to write a new scheduler for a hypervisor, they'd need to implement nice themselves... but they would also be free to implement gang scheduling.

27 March 2009

Wind River Linux 3.0 adds KVM

Wind River recently released Wind River Linux 3.0, including KVM support (on x86 systems of course).

Wind River is better known for their VxWorks embedded RTOS, which traditionally has been one of the dominant operating systems in the embedded industry, and still is today. After criticizing Linux and the GPL (as VxWorks competition) for years, in 2003 the company gave in and started moving towards Linux support, including its own Linux distribution. Today Wind River Linux is appearing in more and more places in the embedded Linux market. I think it's considered #2 after MontaVista, though I admit I don't know the relative market shares there.

In some ways, KVM support in Wind River Linux isn't a big surprise, because we already know that Wind River believes in embedded virtualization so much they're writing their own hypervisor.

In other ways, it is a surprise, because KVM is a hypervisor too, and as such might compete with their own hypervisor. I suppose they will have lots of internal conversations about market positioning and how to convince the world they're not competing with themselves, but I guess every sufficiently large company has the exact same issue.

Anyways, the one big takeaway from all this is that Wind River seems to be saying that KVM is good enough for embedded systems. Since I've been saying the same thing for a while to a sometimes-skeptical audience, I'll take it. ;)

13 March 2009

design tradeoffs in hardware virtualization

I mentioned that the Power ISA version 2.06 was published recently, which added a model for hardware virtualization on "Book E" embedded processors. (The model for hardware virtualization in server processors, such as POWER4, has been around for years.)

The only reason to add hardware support at all (for anything, not just virtualization) is to improve performance. You can do pretty much anything in software; it's just faster to do it in hardware. For example, people have run emulators and JVMs for years, and that gives you a virtual machine without hardware support. We've even demonstrated virtualization without hardware support with KVM on PowerPC 440.

So the goal for virtualization hardware support is to allow the guest kernel to access as much hardware state as possible without host intervention. In an ideal world we could just duplicate all processor state and allow the guest free access to its own copy... but hardware costs power, heat, and die space.

As a compromise, the Book E virtualization architecture duplicates only the state accessed in the fast path of normal guest operation. So there are guest-writable copies of registers like SRR0-1, SPRG0-3, ESR, and DEAR, which are heavily used by the guest in its interrupt vectors. However, registers which are only used for hardware initialization are not duplicated: when the guest tries to access these registers, a privilege fault occurs and the host/hypervisor emulates the instruction. Slow, but (hopefully) only for operations that don't need to be fast. Similarly, some interrupt vectors (such as alignment interrupts) are only delivered to the host, and at that point it is software's responsibility to implement interrupt delivery to the guest.
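
That last part, reflecting an interrupt into the guest in software, amounts to doing by hand what the hardware would do for a guest-handled interrupt. Roughly (my own sketch; the structure, field names, and MSR handling are all simplified):

    #include <stdint.h>

    struct guest_state {
        uint32_t pc;
        uint32_t msr;
        uint32_t srr0, srr1;     /* guest-visible save/restore registers */
        uint32_t ivpr;           /* guest's interrupt vector prefix */
        uint32_t ivor_align;     /* guest's alignment interrupt offset */
    };

    /* Called by the host after it takes an alignment interrupt that really
     * belongs to the guest. */
    void reflect_alignment_interrupt(struct guest_state *g)
    {
        g->srr0 = g->pc;                    /* where the guest should resume */
        g->srr1 = g->msr;                   /* the guest context at the time */
        g->msr &= ~0x8000u;                 /* simplified: clear MSR[EE] */
        g->pc = g->ivpr | g->ivor_align;    /* enter the guest's handler */
    }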

In contrast, the virtualization architecture for x86 doesn't duplicate register state, but rather provides instructions for atomically transferring a boatload of register state to and from memory. This definitely does not fit with the RISCish philosophy of the Power architecture, where memory accesses are performed only by load and store instructions. I'm not a hardware person, but I can guess that implementing the x86 technique is rather difficult... and I guess that's the whole point of CISC and RISC. :) I can say that I really appreciate the flexibility when hardware provides simple tools and software can use them how it likes.

Anyways, I can't judge one way as better than the other because I don't understand the hardware implications, but that's really the point I'm trying to make: implementing functionality like this is all about design tradeoffs between hardware and software.

06 March 2009

VirtualLogix virtualization on Atom

Right now there are two embedded cores from Intel called "Atom": Silverthorne, which implements VT virtualization support, and Diamondville, which doesn't.

VirtualLogix just announced a port of VLX to the Atom Z530, which is a Silverthorne core, though I have no firsthand knowledge of whether they use VT or not (too technical for a press release, I guess). I would assume they do, since that's the only way to virtualize Windows, which they advertise elsewhere on their site.

Interestingly, Intel reported at KVM Forum 2008 that they had run Xen and KVM on Atom (Silverthorne) without problem. (I guess that's the value of a common instruction set...) The biggest issue they faced was at the system level: some Atom systems just don't have enough RAM to run more than one or two guests.

05 March 2009

real-time hypervisors

Chuck Yoo from Korea University presented at the Xen Summit last week about Real-time Xen for Embedded Devices (a video of the presentation is also available). He seems to be particularly interested in mobile phones, so his motivation was running the radio stack separate from the user interface.

One of his observations is that interrupts interfere with predictability. To mitigate this, one can disable interrupts completely (poll), or defer interrupt handling so it occurs mostly in schedulable tasks (that way the RT scheduler can prioritize RT tasks over interrupt handling).

I guess that makes sense if your primary objective is scheduling. I'm not really a real-time person, but the real-time metric I hear about the most is "interrupt latency" -- the time between interrupt delivery and running the interrupt handler (lower is better). In this regard, the fact that Xen has no "hypervisor threads" is terrible, because it means a Xen code path cannot be paused partway through to handle an interrupt.

In contrast, because KVM is based on the Linux kernel, which can preempt kernel threads (to defer interrupt handlers), this is not an issue.
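
To make that concrete, this is roughly what deferring interrupt work into a schedulable task looks like on Linux, using a threaded interrupt handler (an illustrative fragment of my own; the device and names are made up):

    #include <linux/interrupt.h>

    /* Hard-IRQ context: do the bare minimum and hand off. */
    static irqreturn_t my_dev_quick_check(int irq, void *dev)
    {
        return IRQ_WAKE_THREAD;
    }

    /* Runs in a kernel thread, which the scheduler can preempt or
     * deprioritize below real-time tasks. */
    static irqreturn_t my_dev_thread_fn(int irq, void *dev)
    {
        /* ... the actual interrupt work goes here ... */
        return IRQ_HANDLED;
    }

    static int my_dev_setup_irq(unsigned int irq, void *dev)
    {
        return request_threaded_irq(irq, my_dev_quick_check, my_dev_thread_fn,
                                    0, "my_dev", dev);
    }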

On the subject of scheduling determinism, Chuck makes the key point that the host scheduler must also have visibility into the real-time characteristics of guest tasks, and he suggests "aggregation" as a way for the host scheduler to account for real-time tasks in multiple guests.

Chuck later observes that Xen's IO model (asynchronous non-deterministic communication with dom0) is also a huge obstacle. As a workaround, he proposes giving RT domains direct access to dedicated hardware.

In contrast, KVM schedules guests and qemu (which provides the virtual IO services) as a single unit. When the guest needs IO services, that means qemu is "already running", and the scheduler doesn't get involved. (I will admit things are becoming a little more fuzzy with qemu's adoption of IO threads, but qemu is shared between Xen and KVM anyways.)

At any rate, a very interesting presentation, though I believe that by starting with Xen, Chuck and his students are needlessly handicapping themselves. Unfortunately I think many embedded systems people are lured to Xen by the promise of a "tiny hypervisor kernel," without realizing the depth of dependencies on the Linux dom0...

27 February 2009

embedded VMware on the Nokia n800

Last year VMware acquired a French company called Trango, who had a tiny hypervisor written for ARM and MIPS. It's pure paravirtualization, i.e. it requires modifications to the guest kernels that run on top of it (including Linux, though no patches are publicly available that I've found). (Mind you, it's only a thin hypervisor, but that probably makes sense for these low-end processors.)

So that's where this WinCE+Android demo comes from. It's cool, but is it just a demo? Who actually wants virtualization in consumer electronics? Well, another embedded virtualization vendor, OK Labs, claims they're installed in 250 million cell phones.

The funny thing is that I expect virtualization to be adopted more quickly in networking (Cisco, Juniper, et al) than consumer electronics. Of course, you probably won't see flashy demos from there, just more robustness, more features, and maybe even a better profit margin from the vendors (because they don't have to rewrite as much software).

Anyways, if there is still doubt about virtualization in embedded systems, I think demos like this should help evaporate it.

Update: Reacting to the same story, Ars Technica opines that embedded virtualization is "inevitable."

26 February 2009

multicore and unicore virtualization

One of the factors driving interest in virtualization for embedded systems is the increasing penetration of multicore processors in the embedded world. Although multiprocessor systems have been around in the server world for decades, most traditional embedded systems have been strictly uniprocessor.

However, multicore processors are a big deal in some segments of the embedded world these days. I think most of the traditional RTOSs have been multicore-enabled by now, but what about all the other layers in the software stack?

One answer for dealing with single-threaded software in a multicore world is virtualization, or more specifically partitioning: carve up your multicore system into multiple single-core systems, and then you can run that legacy software without modification. In fact, you can still win on hardware costs, because you probably save money on space, power, cooling, etc.

But virtualization makes sense on unicore systems too! Consolidating multiple legacy systems onto a single new processor can save you space, cooling, and maybe even still improve performance if the legacy hardware is a lot slower than the new stuff.

So consolidation can make sense, but there's another really interesting benefit too. For some embedded system developers, software development resources are actually a major limiting factor. These folks want to roll out a product line spanning 1-, 2-, and 4-core systems, but they don't have enough software developers to re-engineer their stack for each system.

Enter virtualization: time-share the single-core system to make it appear to have four cores. Is it fast? Hell no, but that was never the point: this is the low-end model. The most important thing is that you've managed to stretch your software investment to cover the whole line.

This may sound like something you can do with threads in a normal time-sharing kernel, but that doesn't work if the software is actually kernel-level code. For example, in an asymmetric multiprocessing (AMP) networking system, you may have an RTOS (or less) handling the data plane (pushing packets), and Linux as the control plane. Instead of implementing a data plane surrogate, virtualization allows you to preserve the same software layers and go on to solve more interesting problems.

20 February 2009

embedded virtualization use case: hot upgrade

Mark Hermeling at Wind River has written about a valuable use case for virtualization in embedded systems: No Downtime Upgrade. I admit I often omit this from my presentations because it doesn't seem as sexy as other use cases, but I'm glad he wrote about it because to some systems it's absolutely critical.

Basically it goes like this: to upgrade the software running in some embedded device, you do the upgrade in a virtual machine clone of the active code. Then, when it's all patched and running, you switch over from the old code, which has been running the whole time. You can imagine how this applies to High Availability in general, outside the software upgrade case.

To me it doesn't sound that sexy, at least not compared to consolidating multiple hardware systems (e.g. networking control and data planes) onto a single system without impacting reliability. But consider this: some of these embedded devices handle lots of IO traffic all the time. Think about a RAID controller in a busy server, or a network backbone. If it takes you 10 seconds to upgrade your software, how many packets or IOps have you missed?

(Of course, unless you have built an extra core into your hardware design, this is something you can't really do if you're stuck with strict hardware isolation...)

17 February 2009

thin vs thick hypervisors

I made a presentation at the 2008 Linux Plumbers Conference about "thin" vs "thick" hypervisors, a subject very important to KVM in embedded systems. It's true Linux doesn't fit everywhere, and many embedded systems people have been dismayed by Linux's increasing memory footprint (which also hurts performance through cache and TLB pressure). Kevin Lawton has also written an article about this issue: it would increase KVM's appeal in embedded systems if we could first slim it down.

However, there's an important issue here that is sometimes obscured by memory footprint: functionality. Many of the proprietary "embedded virtualization" solutions (offered by vendors like OK Labs, Trango/VMware, VirtualLogix, et al) are thin precisely because they don't do a whole lot. In many cases, they are strictly about isolating hardware. That sounds good, right?

Strict hardware isolation is a double-edged sword, because while it allows you to minimize the virtualization layer (good for memory/flash footprint, security, etc), it doesn't let developers take advantage of most of virtualization's benefits. I'm not even talking about "frills" like live migration (which could still be considered critical for high-availability in network infrastructure); I'm talking about basic consolidation, which is virtualization's bread and butter (and yes, software consolidation still makes sense in embedded systems).

You have 2 cores? If you only have 1 network interface, with a thin hypervisor you can only have 1 partition on the network. But even if you have 2 network interfaces for your 2 cores, there are still pesky bits of hardware you must consider. How many interrupt controllers do you have? How many real-time clocks? How many UARTs, and can you isolate them with the MMU, or are they all in the same page? If your software stacks require nonvolatile storage, how many flash controllers do you have?

For very simple services like PIC control, you can get away with embedding that directly in the hypervisor itself. Note however that the PICs on some modern systems require rather complicated configuration code, and re-implementing that can actually be pretty tricky. (I'm thinking about x86 ACPI and PowerPC device tree parsing here.)

Regardless, the only alternative for these services with a "thin" hypervisor is a dom0-style service partition. ("dom0" is the Xen terminology for the all-powerful Linux partition that is allowed to muck with all hardware on the system; all other partitions depend on it for IO services.) Once you go that route, you still have a thin hypervisor, but you've completely lost the security and reliability benefits that may have brought you to virtualization in the first place.

So for very simple use cases on very low-resource systems, thin hypervisors can make sense. But once you start to need anything beyond strict hardware partitioning, you've already entered the world of "thick" hypervisors, and the larger footprint of KVM becomes much less of an obstacle.

13 February 2009

Virtualization hardware support for embedded PowerPC

The Power Instruction Set Architecture (ISA) version 2.06 is now available from Power.org. I'm pretty excited about this because it finally includes hardware virtualization support for embedded PowerPC (in contrast to server PowerPC, which has had it for years).

So why does this matter? After all, VirtualLogix, Sysgo, and others have already implemented virtualization layers for embedded PowerPC. The problem is that these solutions require loss of isolation, loss of performance, dramatic modifications to the guest kernels, or some combination of all three.

In KVM for PowerPC (currently supporting PowerPC 440 and e500v2 cores), we chose to maximize isolation and require no changes to the guest kernel; we do this by running the guest kernel in user mode. As a consequence, we suffer reduced performance, since every privileged instruction in the guest kernel traps into the host, and we must emulate it. (Of course, the vast majority of instructions execute natively in hardware.)

Of course, the most important question is when we'll actually see hardware implementing this version of the architecture (and even then, virtualization is an optional feature). IBM hasn't made any announcements about that, but then IBM hasn't delivered a new embedded core for a long time now. Freescale has already announced that their e500mc core (as found in their upcoming P4080 processor) will implement virtualization support, and they are even developing their own hypervisor for it.

(Updated Oct 2009 to fix KVM wiki URL.)