Showing posts with label kvm. Show all posts

15 May 2009

Oracle buys another hypervisor?!

Oracle made headlines for two acquisitions in the past month: Sun and Virtual Iron. By my count, that makes them the proud owners of no fewer than four x86 hypervisors: Oracle VM, Sun's xVM Server, Sun's xVM Desktop (a.k.a. VirtualBox), and Virtual Iron. (I've never really understood why Sun had two.) All but VirtualBox are based on Xen.

Even with this surprising success in the game of "collect the Xen implementations," that still leaves at least Red Hat, Novell, and of course Citrix itself offering competing Xen solutions. I'll admit the Unix wars predate me, but that seems like an impressive degree of fragmentation. Still, Red Hat has announced plans to abandon Xen for KVM, and even Novell included KVM as a tech preview in SLES11. There's no question that the number of Xen-based products is about to shrink significantly.

I've even seen one person speculate that the cost of maintaining Xen is so high that, with Red Hat pulling out, Oracle must have been worried about strengthening their Xen development capabilities.

What's worse, though, is that each of those Oracle hypervisors has its own management stack. Systems management is one of those areas that sounds really easy, but in practice never is. Management software contains so many layers that it's hard to find anybody who actually understands the end-to-end details of a particular code path. You need translation layers for components that are too old, too new, fit at a different layer, or were written for a different management stack. In this case, you might find one management stack built on xm, another built on libvirt, another built on CIM (and "enterprise frameworks" are a whole other world of complexity). Do they use OVF for image files? Should they? Every design question has tradeoffs and requires serious consideration.

Speaking from experience at a large company, I expect there will be at least 6 months of churn while architects furiously scribe presentations, rearrange block diagrams, create hypothetical development sizings, establish migration plans for legacy customers, escalate issues to management (which will be sorting itself out too), find and get input from related organizations ("how does this affect our relationship with VMware?"), and in general figure out what they're doing. After all the dust has settled, they'll still need to write the code.

Will the eventual result of this consolidation be a stronger Xen ecosystem a few years from now? To be honest, I couldn't care less... but it could be worth the cost of a bag of popcorn.

30 April 2009

WAN optimization

I've wondered how "WAN optimization" magic works, and I just came across a page that explains it in (a little) more detail than marketing. I hadn't heard of it until a couple of years ago, when some Cisco people mentioned that their WAAS routers embed KVM for virtualization.

Why would they do that? Because if your branch office uses an Active Directory server in your centralized data center, and your WAN link dies, work at the branch office ceases. From what I understand, Cisco's WAAS routers run an Active Directory server inside a virtual machine on the router itself, to mitigate that problem. A little googling reveals that similar approaches may be taken by their competition, 3Com and Riverbed.

In general, I expect we'll see much more virtualization in this area in the future. For example, today Cisco's Application Extension Platform (AXP) products are physical x86 cards you stick into a router to run server workloads. It would be plain silly not to take advantage of the well-known consolidation benefits of virtualization to accomplish the same thing. (That's pure speculation, but as I said... silly.)

15 April 2009

leveraging Linux for virtualization: the dark side

I work on KVM, which is a relatively small kernel module that transforms the Linux kernel into a hypervisor. A hypervisor really is a kernel: it contains a scheduler (at least the good ones do ;), device drivers (at least interrupt controller, probably console, maybe more), memory management, interrupt handlers, bootstrap code, etc.

This is the key observation behind KVM's design. "Hmm, we need a kernel... and hey, we've already got one!" We just need to add some code to make it schedule kernels instead of userspace tasks. In fact, one of the major technical faults of the Xen project was that it needed to duplicate — often copy outright — Linux code, for features such as power management, NUMA support, an ACPI interpreter, PIC drivers, etc. By integrating with Linux, KVM gets all that for free.

There is a drawback to leveraging Linux though.

| pro | con |
| --- | --- |
| use Linux's scheduler | stuck with Linux's scheduler |
| use Linux's large page support | stuck with Linux's large page support |
| get lots of fancy Linux features | stuck with the footprint of Linux's fancy features |

Seeing a theme here? Let me share a little anecdote:

My team had been doing early development on KVM for PowerPC 440, and we were scheduled to do a demo at the Power.org Developer's Conference back in 2007. Unfortunately we weren't able to get Linux booting as a guest in time, but we had a simple standalone application we used instead. So when I say "early development" I mean "barely working."

A friend of mine walked up to the demo station and asked "Does nice work?" Now remember, basic functionality was missing. We couldn't even boot Linux. The only IO was a serial console. We had never touched a line of scheduler code, and certainly hadn't tested scheduling priorities. Despite all that, nice just worked because we were leveraging the Linux scheduler.
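To make the point concrete: since a KVM guest is just another Linux task, the ordinary priority interface applies to it with no hypervisor-specific code. Here is a minimal Python sketch of that idea. The child process is only a stand-in for a guest (an assumption for illustration; a real guest would be a qemu process), and `renice` and `demo` are hypothetical helper names.

```python
import os
import subprocess
import sys

def renice(pid: int, niceness: int) -> int:
    """Set a process's nice value and return what the kernel now reports."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)

def demo(niceness: int) -> int:
    # Stand-in for a guest: any long-running child process will do.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        return renice(child.pid, niceness)
    finally:
        # Clean up the stand-in process whether or not renice succeeded.
        child.kill()
        child.wait()
```

Raising the nice value (lowering priority) needs no special privileges, which is why the sketch only ever moves it upward.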

There's a downside though. The Linux scheduler is famously tricky, and almost nobody wants to touch it because even slight tweaks can cause disastrous regressions for other workloads. For example, the Linux scheduler does not support gang scheduling, where all threads of a particular task must be scheduled at once (or not at all).

Gang scheduling is very interesting for SMP guests using spinlocks. One virtual CPU could take a spinlock and then be de-scheduled by the host. Unaware of this important information, all the other virtual CPUs could spin waiting for the lock to be released, resulting in a lot of wasted CPU time. Gang scheduling is one way to avoid this problem by scheduling all virtual CPUs at once.
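A toy model shows the effect. This is only a sketch under simplifying assumptions: time is divided into slots, a schedule is a list of the sets of virtual CPUs running in each slot, and every virtual CPU other than the lock holder wants the lock. The function name and the schedules are made up for illustration.

```python
def wasted_spin_slots(schedule, holder, hold_time):
    """Count vCPU time slots spent spinning while the lock holder is not running.

    schedule:  list of sets of vCPU ids that run in each time slot
    holder:    id of the vCPU that holds the spinlock
    hold_time: number of slots the holder must run before releasing the lock
    """
    remaining = hold_time
    wasted = 0
    for running in schedule:
        if remaining == 0:
            break  # lock released, nobody spins any more
        if holder in running:
            remaining -= 1  # the holder makes progress toward releasing
        else:
            # Every vCPU running now spins with no chance of getting the lock.
            wasted += len(running)
    return wasted

# Four vCPUs sharing the host with other work, holder is vCPU 0.
uncoordinated = [{0, 1}, {2, 3}, {2, 3}, {0, 1}]
gang = [{0, 1, 2, 3}, set(), set(), {0, 1, 2, 3}]
```

In the uncoordinated schedule, vCPUs 2 and 3 burn two full slots each while the holder is off the CPU. In the gang schedule, the guest gets the same total CPU time but never spins behind a de-scheduled holder.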

Since Linux doesn't support gang scheduling, and only a handful of people in the world have the technical skill and reputation to change that, that's basically a closed door.

This is just one example, but I think you can see that re-purposing Linux for virtualization is a tradeoff between functionality and control. If one were to write a new scheduler for a hypervisor, they'd need to implement nice themselves... but they would also be free to implement gang scheduling.

27 March 2009

Wind River Linux 3.0 adds KVM

Wind River recently released Wind River Linux 3.0, including KVM support (on x86 systems of course).

Wind River is better known for their VxWorks embedded RTOS, which traditionally has been one of the dominant operating systems in the embedded industry, and still is today. After criticizing Linux and the GPL (as VxWorks competition) for years, in 2003 the company gave in and started moving towards Linux support, including its own Linux distribution. Today Wind River Linux is appearing in more and more places in the embedded Linux market. I think it's considered #2 after MontaVista, though I admit I don't know the relative market shares there.

In some ways, KVM support in Wind River Linux isn't a big surprise, because we already know that Wind River believes in embedded virtualization so much they're writing their own hypervisor.

In other ways, it is a surprise, because KVM is a hypervisor too, and as such might compete with their own hypervisor. I suppose they will have lots of internal conversations about market positioning and how to convince the world they're not competing with themselves, but I guess every sufficiently large company has the exact same issue.

Anyways, the one big takeaway from all this is that Wind River seems to be saying that KVM is good enough for embedded systems. Since I've been saying the same thing for a while to a sometimes-skeptical audience, I'll take it. ;)

06 March 2009

VirtualLogix virtualization on Atom

Right now there are two embedded cores from Intel called "Atom": Silverthorne, which implements VT virtualization support, and Diamondville, which doesn't.
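If you have a Linux system running on one of these cores, a quick way to tell which kind you have is to look for the `vmx` flag, which is how Linux reports Intel VT in `/proc/cpuinfo`. A minimal sketch (the function name is my own, and it takes the file contents as a string so it can be tried on any text):

```python
def has_intel_vt(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists vmx."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Compare whole flag names so a longer flag containing "vmx" can't match.
        if key.strip() == "flags" and "vmx" in value.split():
            return True
    return False
```

On a real machine you would call it as `has_intel_vt(open("/proc/cpuinfo").read())`.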

VirtualLogix just announced a port of VLX to Atom Z530, which is a Silverthorne core, though I have no firsthand knowledge of whether they use VT (too technical for a press release I guess). I would assume they do, since that's the only way to virtualize Windows, which they advertise elsewhere on their site.

Interestingly, Intel reported at KVM Forum 2008 that they had run Xen and KVM on Atom (Silverthorne) without problem. (I guess that's the value of a common instruction set...) The biggest issue they faced was at the system level: some Atom systems just don't have enough RAM to run more than one or two guests.

05 March 2009

real-time hypervisors

Chuck Yoo from Korea University presented at the Xen Summit last week about Real-time Xen for Embedded Devices (a video of the presentation is also available). He seems to be particularly interested in mobile phones, so his motivation was running the radio stack separate from the user interface.

One of his observations is that interrupts interfere with predictability. To mitigate this, one can disable interrupts completely (poll), or defer interrupt handling so it occurs mostly in schedulable tasks (that way the RT scheduler can prioritize RT tasks over interrupt handling).
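The deferral idea can be sketched in a few lines. This is my own toy model, not code from the presentation: the interrupt path does nothing but queue a handler task, and a priority scheduler then decides when that handler actually runs. The class, the priority values, and the task names are all assumptions for illustration (lower number means higher priority).

```python
import heapq

class Scheduler:
    """Toy priority scheduler: lower priority number runs first."""

    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker keeps FIFO order within one priority

    def add(self, priority, name):
        heapq.heappush(self._queue, (priority, self._seq, name))
        self._seq += 1

    def on_interrupt(self, irq):
        # Minimal interrupt path: do no real work here, just queue the
        # handler as an ordinary schedulable task at a low priority.
        self.add(priority=10, name=f"irq{irq}-handler")

    def run(self):
        """Drain the queue and return task names in execution order."""
        order = []
        while self._queue:
            _, _, name = heapq.heappop(self._queue)
            order.append(name)
        return order
```

Even if the interrupt arrives before the real-time task becomes runnable, the scheduler still runs the real-time task first, because the handler is now just another task it can prioritize.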

I guess that makes sense if your primary objective is scheduling. I'm not really a real-time person, but the real-time metric I hear about the most is "interrupt latency" -- the time between interrupt delivery and running the interrupt handler (lower is better). In this regard, the fact that Xen has no "hypervisor threads" is terrible, because it means that a Xen code path cannot be paused to service an interrupt.

In contrast, because KVM is based on the Linux kernel, which can preempt kernel threads (to defer interrupt handlers), this is not an issue.

On the subject of scheduling determinism, Chuck makes the key point that the host scheduler must also have visibility into the real-time characteristics of guest tasks, and he suggests "aggregation" as a way for the host scheduler to account for real-time tasks in multiple guests.

Chuck later observes that Xen's IO model (asynchronous non-deterministic communication with dom0) is also a huge obstacle. As a workaround, he proposes giving RT domains direct access to dedicated hardware.

In contrast, KVM schedules guests and qemu (which provides the virtual IO services) as a single unit. When the guest needs IO services, that means qemu is "already running", and the scheduler doesn't get involved. (I will admit things are becoming a little more fuzzy with qemu's adoption of IO threads, but qemu is shared between Xen and KVM anyways.)

At any rate, a very interesting presentation, though I believe that by starting with Xen, Chuck and his students are needlessly handicapping themselves. Unfortunately I think many embedded systems people are lured to Xen by the promise of a "tiny hypervisor kernel," without realizing the depth of dependencies on the Linux dom0...

13 February 2009

Virtualization hardware support for embedded PowerPC

The Power Instruction Set Architecture (ISA) version 2.06 is now available from Power.org. I'm pretty excited about this because it finally includes hardware virtualization support for embedded PowerPC (in contrast to server PowerPC, which has had it for years).

So why does this matter? After all, VirtualLogix, Sysgo, and others have already implemented virtualization layers for embedded PowerPC. The problem is that these solutions require loss of isolation, loss of performance, dramatic modifications to the guest kernels, or some combination of the three.

In KVM for PowerPC (currently supporting PowerPC 440 and e500v2 cores), we chose to maximize isolation and require no changes to the guest kernel; we do this by running the guest kernel in user mode. As a consequence, we suffer reduced performance, since every privileged instruction in the guest kernel traps into the host, and we must emulate it. (Of course, the vast majority of instructions execute natively in hardware.)
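The trap-and-emulate pattern described above looks roughly like this. It is a toy model and not the actual KVM code: the instruction list is an illustrative subset of PowerPC privileged instructions, and `VCpu`, `emulate`, and `run` are hypothetical names.

```python
from dataclasses import dataclass

# Illustrative subset of instructions that trap when run in user mode.
PRIVILEGED = {"mtmsr", "mfmsr", "tlbwe", "rfi"}

@dataclass
class VCpu:
    shadow_msr: int = 0  # the guest's view of the MSR, maintained by the host
    traps: int = 0       # instructions that trapped and were emulated
    native: int = 0      # instructions that ran directly on hardware

def emulate(vcpu: VCpu, op: str, arg: int) -> None:
    """Host-side handler for one trapped privileged instruction."""
    vcpu.traps += 1
    if op == "mtmsr":
        # The guest believes it wrote the real MSR; only the shadow copy changes.
        vcpu.shadow_msr = arg
    # A real implementation would handle each remaining opcode here.

def run(vcpu: VCpu, program) -> VCpu:
    for op, arg in program:
        if op in PRIVILEGED:
            emulate(vcpu, op, arg)  # trap into the host
        else:
            vcpu.native += 1        # executes natively, no host involvement
    return vcpu
```

The cost model is visible in the counters: each trap is a round trip through the host, so a guest kernel path dense with privileged instructions is exactly where the slowdown shows up.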

Of course, the most important question is when we'll actually see hardware implementing this version of the architecture (and even then, virtualization is an optional feature). IBM hasn't made any announcements about that, but IBM hasn't delivered a new embedded core for a long time now. Freescale has already announced that their e500mc core (as found in their upcoming P4080 processor) will implement virtualization support, and they are even developing their own hypervisor for it.

(Updated Oct 2009 to fix KVM wiki URL.)