30 April 2009

WAN optimization

I've wondered how "WAN optimization" magic works, and I just came across a page that explains it in (a little) more detail than the marketing does. I hadn't heard of it until a couple of years ago, when some Cisco people mentioned that their WAAS routers embed KVM for virtualization.

Why would they do that? Because if your branch office uses an Active Directory server in your centralized data center, and your WAN link dies, work at the branch office ceases. From what I understand, Cisco's WAAS routers run an Active Directory server inside a virtual machine on the router itself, to mitigate that problem. A little googling reveals that similar approaches may be taken by their competition, 3Com and Riverbed.

In general, I expect we'll see much more virtualization in this area in the future. For example, today Cisco's Application Extension Platform (AXP) products are physical x86 cards you stick into a router to run server workloads. It would be plain silly not to take advantage of the well-known consolidation benefits of virtualization to accomplish the same thing. (That's pure speculation, but as I said... silly.)

20 April 2009

package management and embedded Linux distributions

A while back, when it looked like the Kuro Box would actually go somewhere, I bought the original model. Among other things, this consists of a 200MHz Freescale e300 core (similar to PowerPC 603e) and 64MB of RAM. No serial port even, but through u-boot it has netconsole and netboot support. I had a decrepit version of Debian installed, and Debian's package management tools frustrate me to the point that I was helpless to get the thing upgraded (something about packages mysteriously "being held back").

For a system like this, the real value of a Linux distribution is the frequency of its security updates. There are some embedded distributions I could have messed with, but I have no idea how reliable their updates are, and I really don't want to be responsible for that myself. I started looking for a "normal" Linux distribution to install. (Unfortunately, the number of mainstream Linux distributions with PowerPC support is shrinking...)

I'm really a Fedora guy. Back in university I did some packaging for LinuxPPC and Yellow Dog Linux (both of which were Red Hat variants), and I'm comfortable enough with RPM to bootstrap a system from just about nothing. So I tried the Fedora installer, and when anaconda failed miserably with netconsole, I painfully installed it package by package.

Long story short, yum (the Fedora software installer) requires an absurd amount of RAM... way more than the 64MB my little Kuro has. The simplest yum operation was taking hours, and I could see I was well into swap. So after all that work, the box was still useless. I ran out of play time, unplugged the thing (since I couldn't install any security updates for it), and there it has sat for another 6 months.

Today I ran across a blog post comparing the speed and memory consumption of yum and zypper (OpenSUSE's package manager). I don't know much about OpenSUSE, but I will be trying it next...
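For anyone who wants to make the same kind of comparison on their own hardware, here is a minimal sketch of how one might measure the wall-clock time and peak memory of a single command. It is not the method used in that blog post. The measure() helper is my own invention, it assumes Python 3.9 or later on a Unix-like system, and on Linux ru_maxrss is reported in kilobytes (other systems may use different units).

```python
import os
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd; return (elapsed seconds, peak RSS as reported by the OS, exit code)."""
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # wait4() reaps this specific child and returns its own resource usage,
    # so the figure is not mixed up with other children of this script.
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.monotonic() - start
    # We reaped the child ourselves, so record the result on the Popen object.
    proc.returncode = os.waitstatus_to_exitcode(status)
    return elapsed, usage.ru_maxrss, proc.returncode


if __name__ == "__main__":
    # Stand-in command; substitute the package manager invocation to compare.
    elapsed, peak_rss, code = measure([sys.executable, "-c", "pass"])
    print(f"exit={code} time={elapsed:.2f}s peak_rss={peak_rss}")
```

On a 64MB box the peak RSS figure is the interesting one: anything close to physical RAM means the tool will end up in swap.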

15 April 2009

leveraging Linux for virtualization: the dark side

I work on KVM, which is a relatively small kernel module that transforms the Linux kernel into a hypervisor. A hypervisor really is a kernel: it contains a scheduler (at least the good ones do ;), device drivers (at least interrupt controller, probably console, maybe more), memory management, interrupt handlers, bootstrap code, etc.

This is the key observation behind KVM's design. "Hmm, we need a kernel... and hey, we've already got one!" We just need to add some code to make it schedule kernels instead of userspace tasks. In fact, one of the major technical faults of the Xen project was that it needed to duplicate — often copy outright — Linux code, for features such as power management, NUMA support, an ACPI interpreter, PIC drivers, etc. By integrating with Linux, KVM gets all that for free.

There is a drawback to leveraging Linux though.

pro | con
use Linux's scheduler | stuck with Linux's scheduler
use Linux's large page support | stuck with Linux's large page support
get lots of fancy Linux features | stuck with the footprint of Linux's fancy features

Seeing a theme here? Let me share a little anecdote:

My team had been doing early development on KVM for PowerPC 440, and we were scheduled to do a demo at the Power.org Developer's Conference back in 2007. Unfortunately we weren't able to get Linux booting as a guest in time, but we had a simple standalone application we used instead. So when I say "early development" I mean "barely working."

A friend of mine walked up to the demo station and asked "Does nice work?" Now remember, basic functionality was missing. We couldn't even boot Linux. The only IO was a serial console. We had never touched a line of scheduler code, and certainly hadn't tested scheduling priorities. Despite all that, nice just worked because we were leveraging the Linux scheduler.
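To make the point concrete: since the host scheduler treats a guest like any other task, the ordinary niceness machinery applies to it unchanged. The sketch below is only an illustration of that idea. It starts a stand-in child process (not a real guest) with a raised niceness and has the child report the value it ends up with. run_with_niceness() and STAND_IN_GUEST are names I made up for the example.

```python
import os
import subprocess
import sys


def run_with_niceness(cmd, increment):
    """Run cmd with its niceness raised by `increment`; return its stdout."""
    result = subprocess.run(
        cmd,
        # Runs in the child between fork and exec, like the nice(1) command.
        # (preexec_fn is not safe in multi-threaded programs.)
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in for a guest: a child process that prints its own niceness.
STAND_IN_GUEST = [sys.executable, "-c", "import os; print(os.nice(0))"]

if __name__ == "__main__":
    print("parent niceness:", os.nice(0))
    print("child niceness: ", run_with_niceness(STAND_IN_GUEST, 10).strip())
```

Nothing here knows or cares what the child is doing, which is exactly why it worked for us without any scheduler code of our own.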

There's a downside though. The Linux scheduler is famously tricky, and almost nobody wants to touch it because even slight tweaks can cause disastrous regressions for other workloads. The Linux scheduler does not support gang scheduling, where all threads of a particular task must be scheduled at once (or not at all).

Gang scheduling is very interesting for SMP guests using spinlocks. One virtual CPU could take a spinlock and then be de-scheduled by the host. Unaware of this important information, all the other virtual CPUs could spin waiting for the lock to be released, resulting in a lot of wasted CPU time. Gang scheduling is one way to avoid this problem by scheduling all virtual CPUs at once.
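The effect is easy to see in a toy model. The following is a deliberately simplified simulation, not a model of the real Linux scheduler or of KVM: two virtual CPUs each get half of the time slices, alternate between private work and a short critical section under one shared spinlock, and the only difference between the two runs is whether they are scheduled in alternating slices or together. All names and parameter values are assumptions chosen for illustration.

```python
def simulate(gang, ticks=200, slice_len=10, work=5, hold=3, vcpus=2):
    """Return how many ticks vCPUs spend spinning on a lock someone else holds."""
    work_left = [work] * vcpus  # private work remaining before the next lock
    holder = None               # vCPU currently holding the spinlock, if any
    hold_left = 0               # critical-section ticks remaining for holder
    spin = 0

    for t in range(ticks):
        slice_no = t // slice_len
        # vCPUs are stepped in order within a tick, so a lock released by an
        # earlier vCPU can be taken by a later one in the same tick.
        for v in range(vcpus):
            if gang:
                running = slice_no % 2 == 0        # all vCPUs or none
            else:
                running = (slice_no + v) % 2 == 0  # vCPUs alternate slices
            if not running:
                continue  # a descheduled holder keeps the lock
            if work_left[v] > 0:
                work_left[v] -= 1
                continue
            if holder is None:
                holder, hold_left = v, hold
            if holder == v:
                hold_left -= 1
                if hold_left == 0:
                    holder = None
                    work_left[v] = work
            else:
                spin += 1  # lock is held elsewhere: wasted tick
    return spin


if __name__ == "__main__":
    print("spin ticks, independent:", simulate(gang=False))
    print("spin ticks, gang:       ", simulate(gang=True))
```

In the independent case, a vCPU that is descheduled in the middle of its critical section leaves the other one spinning for most of its slice. In the gang case the holder and the waiter always pause and resume together, so the waiter only ever spins for the short time the lock is genuinely in use.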

Since Linux doesn't support gang scheduling, and only a handful of people in the world have the technical skill and reputation to change that, that's basically a closed door.

This is just one example, but I think you can see that re-purposing Linux for virtualization is a tradeoff between functionality and control. If one were to write a new scheduler for a hypervisor, they'd need to implement nice themselves... but they would also be free to implement gang scheduling.