lots of little pieces: xen


19 June 2009

Solaris: the awesomest virtualization ever?

There's an advocacy piece, in Forbes of all places, that offers a pretty unbalanced perspective on Solaris. (I hesitate to even link to it, because it feels like sensationalism meant to generate huge numbers of outraged readers.) The author, Dan Woods, doesn't mention a single drawback of Solaris, which should raise the suspicions of any reader with critical thinking skills, but I wanted to debunk some of the virtualization statements in particular:

Dan claims that Linux virtualization is the result of un-coordinated development from a number of companies, and Solaris virtualization is better because it's engineered "top to bottom" at a single company. Seamless integration can certainly offer advantages (see Apple), but I take issue with both his observations about the ecosystem and the conclusions he draws from them.

Aside from containers, Solaris uses hypervisors from Xen (marketed as xVM) and VirtualBox (from the innotek acquisition). Neither of those solutions was designed for Solaris; they were adopted years later to fill gaps in Sun's offerings. However, they are currently developed by Sun, so you still have the "single company" argument. About that...

Where I come from, being completely dependent on a single company is a bad thing, and I'm not even talking about the freedom of open source. It's called "vendor lock-in," and it's bad because there's no competition and customers are at the mercy of that single company's roadmap. Companies invest lots of money developing and supporting third-party ecosystems because it's critically important to their customers. Anyways, looking at it from another angle, isn't it disturbing that virtualization ISVs don't consider Solaris important enough to target? Sun had to buy them outright or develop solutions in-house.

Dan claims Solaris containers cause a 2% performance degradation, vs. "about 20% for a hypervisor." While it's true that Forbes isn't a good forum for presenting performance analyses, offering these numbers without even a hint about where they came from is ridiculous. It's often true that you can pick a specific benchmark and environment to support any argument, but Dan didn't even pretend.

Finally, I thought Dan's most interesting claim was the one for which he didn't offer any supporting arguments at all: that Solaris is now safe. Even if he's right, and Solaris is indeed the most awesome OS ever seen, that still doesn't guarantee it a slot on the Oracle roadmap.

15 May 2009

Oracle buys another hypervisor?!

Oracle made headlines for two acquisitions in the past month: Sun and Virtual Iron. By my count, that makes them the proud owners of no fewer than four x86 hypervisors: Oracle VM, Sun's xVM Server, Sun's xVM Desktop (a.k.a. VirtualBox), and Virtual Iron. (I've never really understood why Sun had two.) All but VirtualBox are based on Xen.

Even with this surprising success in the game of "collect the Xen implementations," that still leaves at least Red Hat, Novell, and of course Citrix itself offering competing Xen solutions. I'll admit the Unix wars predate me, but that seems like an impressive degree of fragmentation. Still, Red Hat has announced plans to abandon Xen for KVM, and even Novell included KVM as a tech preview in SLES11. There's no question that the number of Xen-based products is about to shrink significantly.

I've even seen one person speculate that the cost of maintaining Xen is so high that, with Red Hat pulling out, Oracle must have been worried about strengthening their Xen development capabilities.

What's worse, though, is that each of those Oracle hypervisors has its own management stack. Systems management is one of those areas that sounds really easy, but in practice never is. Management software contains so many layers that it's hard to find anybody who actually understands the end-to-end details of a particular code path. You need translation layers for components that are too old, too new, fit at a different layer, or were written for a different management stack. In this case, you might find one management stack built on xm, another built on libvirt, another built on CIM (and "enterprise frameworks" are a whole other world of complexity). Do they use OVF for image files? Should they? Every design question has tradeoffs and requires serious consideration.

Speaking from experience at a large company, I expect there will be at least 6 months of churn while architects furiously scribe presentations, rearrange block diagrams, create hypothetical development sizings, establish migration plans for legacy customers, escalate issues to management (which will be sorting itself out too), find and get input from related organizations ("how does this affect our relationship with VMware?"), and in general figure out what they're doing. After all the dust has settled, they'll still need to write the code.

Will the eventual result of this consolidation be a stronger Xen ecosystem a few years from now? To be honest, I couldn't care less... but it could be worth the cost of a bag of popcorn.

15 April 2009

leveraging Linux for virtualization: the dark side

I work on KVM, which is a relatively small kernel module that transforms the Linux kernel into a hypervisor. A hypervisor really is a kernel: it contains a scheduler (at least the good ones do ;), device drivers (at least interrupt controller, probably console, maybe more), memory management, interrupt handlers, bootstrap code, etc.

This is the key observation behind KVM's design. "Hmm, we need a kernel... and hey, we've already got one!" We just need to add some code to make it schedule kernels instead of userspace tasks. In fact, one of the major technical faults of the Xen project was that it needed to duplicate — often copy outright — Linux code, for features such as power management, NUMA support, an ACPI interpreter, PIC drivers, etc. By integrating with Linux, KVM gets all that for free.

There is a drawback to leveraging Linux though.

pro	con
use Linux's scheduler	stuck with Linux's scheduler
use Linux's large page support	stuck with Linux's large page support
get lots of fancy Linux features	stuck with the footprint of Linux's fancy features

Seeing a theme here? Let me share a little anecdote:

My team had been doing early development on KVM for PowerPC 440, and we were scheduled to do a demo at the Power.org Developer's Conference back in 2007. Unfortunately we weren't able to get Linux booting as a guest in time, but we had a simple standalone application we used instead. So when I say "early development" I mean "barely working."

A friend of mine walked up to the demo station and asked "Does nice work?" Now remember, basic functionality was missing. We couldn't even boot Linux. The only IO was a serial console. We had never touched a line of scheduler code, and certainly hadn't tested scheduling priorities. Despite all that, nice just worked because we were leveraging the Linux scheduler.
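To make the point concrete: from the host's point of view a KVM guest is just another Linux process, so the ordinary priority machinery applies to it without any hypervisor-specific code. Here is a minimal sketch in Python (Unix only). The helper name `run_niced` is made up for illustration, and the child process is only a stand-in for whatever process hosts the guest, not a real qemu command line.

```python
import os
import subprocess
import sys


def run_niced(argv, increment=10):
    """Run argv as a child process with its niceness raised by `increment`."""
    return subprocess.run(
        argv,
        # Runs in the child just before exec, so only the child is affected.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in for a guest: a child that simply reports its own niceness.
child = [sys.executable, "-c", "import os; print(os.nice(0))"]
result = run_niced(child, increment=5)
child_niceness = int(result.stdout)
```

The child reports a niceness 5 higher than the parent's (capped at 19), and the Linux scheduler takes it from there.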

There's a downside, though. The Linux scheduler is famously tricky, and almost nobody wants to touch it because even slight tweaks can cause disastrous regressions for other workloads. One thing it does not support is gang scheduling, where all threads of a particular task must be scheduled at once (or not at all).

Gang scheduling is very interesting for SMP guests using spinlocks. One virtual CPU could take a spinlock and then be de-scheduled by the host. Unaware of this important information, all the other virtual CPUs could spin waiting for the lock to be released, resulting in a lot of wasted CPU time. Gang scheduling is one way to avoid this problem by scheduling all virtual CPUs at once.
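The effect can be sketched with a toy model. This is not how any real scheduler works; the function name, the slice-based schedules, and the numbers are all invented for illustration. Each schedule lists which virtual CPUs the host runs in each time slice, and vCPU 0 is assumed to hold the lock until it has received two slices of run time.

```python
def wasted_spin_slices(schedule, holder=0, hold_slices=2):
    """Count slices burned by vCPUs spinning while the lock holder is descheduled.

    schedule: list of sets, one per time slice, naming the vCPUs that run.
    The holder releases the lock after `hold_slices` slices of run time.
    """
    remaining = hold_slices
    wasted = 0
    for running in schedule:
        if remaining == 0:
            break  # lock released; nobody needs to spin any more
        if holder in running:
            remaining -= 1  # holder makes progress toward releasing the lock
        else:
            # Every vCPU running now spins on a lock whose holder can't run.
            wasted += len(running - {holder})
    return wasted


# Host schedules the vCPUs independently: vCPU 1 runs while vCPU 0 is off-CPU.
independent = [{0, 1}, {1}, {1}, {0, 1}]
# Gang scheduling: the guest's vCPUs run together or not at all.
gang = [{0, 1}, set(), set(), {0, 1}]

independent_waste = wasted_spin_slices(independent)  # 2 slices wasted
gang_waste = wasted_spin_slices(gang)                # 0 slices wasted
```

In both schedules the lock holder gets exactly the same run time; the only difference is whether the other vCPU is allowed to burn slices while the holder is off the CPU.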

Since Linux doesn't support gang scheduling, and only a handful of people in the world have the technical skill and reputation to change that, that's basically a closed door.

This is just one example, but I think you can see that re-purposing Linux for virtualization is a tradeoff between functionality and control. If one were to write a new scheduler for a hypervisor, they'd need to implement nice themselves... but they would also be free to implement gang scheduling.

05 March 2009

real-time hypervisors

Chuck Yoo from Korea University presented at the Xen Summit last week about Real-time Xen for Embedded Devices (a video of the presentation is also available). He seems to be particularly interested in mobile phones, so his motivation was running the radio stack separate from the user interface.

One of his observations is that interrupts interfere with predictability. To mitigate this, one can disable interrupts completely (poll), or defer interrupt handling so it occurs mostly in schedulable tasks (that way the RT scheduler can prioritize RT tasks over interrupt handling).
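Here is a toy model of why deferral helps predictability. It is my own illustration rather than anything from Chuck's slides, and the function name and tick-based timing are assumptions. A real-time task needs a fixed number of ticks of CPU; interrupts arrive at given ticks and each costs a fixed number of ticks to handle. If handlers run immediately they steal ticks from the RT task; if they are deferred, they wait until the RT task is done.

```python
def rt_finish_time(rt_work, irq_arrivals, irq_cost, defer):
    """Return the tick at which a real-time task needing `rt_work` ticks finishes.

    irq_arrivals: ticks at which an interrupt arrives (distinct values assumed).
    irq_cost: ticks of handler work per interrupt.
    defer: if True, handler work is postponed until the RT task completes.
    """
    arrivals = set(irq_arrivals)
    pending_irq = 0
    t = 0
    while rt_work > 0:
        if t in arrivals:
            pending_irq += irq_cost
        if pending_irq > 0 and not defer:
            pending_irq -= 1  # handler preempts the RT task for this tick
        else:
            rt_work -= 1      # RT task gets the tick
        t += 1
    return t


immediate = rt_finish_time(rt_work=5, irq_arrivals=[1, 3], irq_cost=2, defer=False)  # 9
deferred = rt_finish_time(rt_work=5, irq_arrivals=[1, 3], irq_cost=2, defer=True)    # 5
```

With deferral, the RT task's finish time no longer depends on when interrupts happen to arrive, which is the predictability being asked for. The price is that the interrupt work itself waits longer.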

I guess that makes sense if your primary objective is scheduling. I'm not really a real-time person, but the real-time metric I hear about the most is "interrupt latency" -- the time between interrupt delivery and running the interrupt handler (lower is better). In this regard, the fact that Xen has no "hypervisor threads" is terrible, because it means that a Xen code path cannot be paused to handle an interrupt.

In contrast, because KVM is based on the Linux kernel, which can preempt kernel threads (to defer interrupt handlers), this is not an issue.

On the subject of scheduling determinism, Chuck makes the key point that the host scheduler must also have visibility into the real-time characteristics of guest tasks, and he suggests "aggregation" as a way for the host scheduler to account for real-time tasks in multiple guests.

Chuck later observes that Xen's IO model (asynchronous non-deterministic communication with dom0) is also a huge obstacle. As a workaround, he proposes giving RT domains direct access to dedicated hardware.

In contrast, KVM schedules guests and qemu (which provides the virtual IO services) as a single unit. When the guest needs IO services, that means qemu is "already running", and the scheduler doesn't get involved. (I will admit things are becoming a little more fuzzy with qemu's adoption of IO threads, but qemu is shared between Xen and KVM anyways.)

At any rate, a very interesting presentation, though I believe that by starting with Xen, Chuck and his students are needlessly handicapping themselves. Unfortunately I think many embedded systems people are lured to Xen by the promise of a "tiny hypervisor kernel," without realizing the depth of dependencies on the Linux dom0...

17 February 2009

thin vs thick hypervisors

I made a presentation at the 2008 Linux Plumbers Conference about "thin" vs "thick" hypervisors, a subject very important to KVM in embedded systems. It's true Linux doesn't fit everywhere, and many embedded systems people have been dismayed by Linux's increasing memory footprint (which also hurts performance through cache and TLB pressure). Kevin Lawton has also written an article about this issue: it would increase KVM's appeal in embedded systems if we could first slim it down.

However, there's an important issue here that is sometimes obscured by memory footprint: functionality. Many of the proprietary "embedded virtualization" solutions (offered by vendors like OK Labs, Trango/VMware, VirtualLogix, et al) are thin precisely because they don't do a whole lot. In many cases, they are strictly about isolating hardware. That sounds good, right?

Strict hardware isolation is a double-edged sword, because while it allows you to minimize the virtualization layer (good for memory/flash footprint, security, etc), it doesn't let developers take advantage of most of virtualization's benefits. I'm not even talking about "frills" like live migration (which could still be considered critical for high-availability in network infrastructure); I'm talking about basic consolidation, which is virtualization's bread and butter (and yes, software consolidation still makes sense in embedded systems).

You have 2 cores? If you only have 1 network interface, with a thin hypervisor you can only have 1 partition on the network. But even if you have 2 network interfaces for your 2 cores, there are still pesky bits of hardware you must consider. How many interrupt controllers do you have? How many real-time clocks? How many UARTs, and can you isolate them with the MMU, or are they all in the same page? If your software stacks require nonvolatile storage, how many flash controllers do you have?
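The arithmetic behind those questions is simple: with strict hardware partitioning, the number of partitions you can run is limited by the scarcest device that every partition needs. A small sketch in Python, where the function name and the hardware inventory are hypothetical examples rather than any real board:

```python
def max_isolated_partitions(device_counts, required):
    """Upper bound on partitions when each one needs its own copy of every
    device in `required` and nothing can be shared or virtualized.

    `required` is assumed to be non-empty.
    """
    return min(device_counts.get(dev, 0) for dev in required)


# Hypothetical board: 2 cores, but only one NIC and one flash controller.
board = {"core": 2, "nic": 1, "uart": 2, "flash": 1}

networked = max_isolated_partitions(board, ["core", "nic"])  # 1
console_only = max_isolated_partitions(board, ["core", "uart"])  # 2
```

The second core is only useful to a partition that needs nothing the board has just one of, which is exactly the limitation that pushes people toward sharing devices through a thicker layer.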

For very simple services like PIC control, you can get away with embedding that directly in the hypervisor itself. Note however that the PICs on some modern systems require rather complicated configuration code, and re-implementing that can actually be pretty tricky. (I'm thinking about x86 ACPI and PowerPC device tree parsing here.)

Regardless, the only alternative for these services with a "thin" hypervisor is a dom0-style service partition. ("dom0" is the Xen terminology for the all-powerful Linux partition that is allowed to muck with all hardware on the system; all other partitions depend on it for IO services.) Once you go that route, you still have a thin hypervisor, but you've completely lost the security and reliability benefits that may have brought you to virtualization in the first place.

So for very simple use cases on very low-resource systems, thin hypervisors can make sense. But once you start to need anything beyond strict hardware partitioning, you've already entered the world of "thick" hypervisors, and the larger footprint of KVM becomes much less of an obstacle.
