lots of little pieces: real-time hypervisors

05 March 2009

real-time hypervisors

Chuck Yoo from Korea University presented at the Xen Summit last week about Real-time Xen for Embedded Devices (a video of the presentation is also available). He seems to be particularly interested in mobile phones, so his motivation was running the radio stack separately from the user interface.

One of his observations is that interrupts interfere with predictability. To mitigate this, one can disable interrupts completely (poll), or defer interrupt handling so it occurs mostly in schedulable tasks (that way the RT scheduler can prioritize RT tasks over interrupt handling).
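Here is a toy sketch of the second option. It is not Xen or Linux code, and all names are hypothetical. The interrupt "top half" only queues its work as an ordinary task. The scheduler orders tasks purely by priority, so a real-time task queued after the interrupt still runs before the deferred handler. The convention that a lower number means higher priority is an assumption of this sketch.

```python
import heapq
from dataclasses import dataclass, field
from typing import List

RT_PRIORITY = 0          # assumption: lower number = higher priority
IRQ_THREAD_PRIORITY = 10


@dataclass(order=True)
class Task:
    priority: int
    seq: int                          # FIFO tie-break among equal priorities
    name: str = field(compare=False)


class Scheduler:
    """Toy priority scheduler in which deferred interrupt work is just a task."""

    def __init__(self) -> None:
        self._runqueue: List[Task] = []
        self._seq = 0

    def enqueue(self, name: str, priority: int) -> None:
        heapq.heappush(self._runqueue, Task(priority, self._seq, name))
        self._seq += 1

    def raise_interrupt(self, irq: int) -> None:
        # Minimal "top half": don't run the handler inline, just queue it
        # so the scheduler decides when it actually runs.
        self.enqueue(f"irq{irq}-handler", IRQ_THREAD_PRIORITY)

    def run(self) -> List[str]:
        """Drain the run queue in priority order; return the task names run."""
        order = []
        while self._runqueue:
            order.append(heapq.heappop(self._runqueue).name)
        return order


sched = Scheduler()
sched.raise_interrupt(5)                    # the interrupt arrives first...
sched.enqueue("radio-stack", RT_PRIORITY)   # ...but the RT task still wins
print(sched.run())  # ['radio-stack', 'irq5-handler']
```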

I guess that makes sense if your primary objective is scheduling. I'm not really a real-time person, but the real-time metric I hear about the most is "interrupt latency" -- the time between interrupt delivery and running the interrupt handler (lower is better). In this regard, the fact that Xen has no "hypervisor threads" is terrible, because it means that a Xen code path cannot be paused due to interrupt.

In contrast, because KVM is based on the Linux kernel, which can preempt kernel threads (to defer interrupt handlers), this is not an issue.

On the subject of scheduling determinism, Chuck makes the key point that the host scheduler must also have visibility into the real-time characteristics of guest tasks, and he suggests "aggregation" as a way for the host scheduler to account for real-time tasks in multiple guests.
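The talk's exact aggregation mechanism isn't described here, so what follows is only one plausible interpretation, stated as an assumption. Each guest reports its real-time tasks as (budget, period) pairs. The host then sums their utilizations into a single demand figure that it can admission-test. The test is the classic EDF bound for implicit-deadline periodic tasks on one CPU (total utilization ≤ 1). It ignores virtualization overhead, and the guest names and numbers are made up.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

RtTask = Tuple[int, int]  # (budget, period) in microseconds


def guest_utilization(tasks: List[RtTask]) -> Fraction:
    """CPU share one guest's real-time tasks need: sum of budget/period."""
    return sum((Fraction(c, t) for c, t in tasks), Fraction(0))


def aggregate(guests: Dict[str, List[RtTask]]) -> Fraction:
    """Host-side view: total real-time demand across all guests."""
    return sum((guest_utilization(ts) for ts in guests.values()), Fraction(0))


def edf_schedulable(guests: Dict[str, List[RtTask]]) -> bool:
    # Liu & Layland: on one CPU, independent periodic tasks with deadlines
    # equal to their periods are EDF-schedulable iff total utilization <= 1.
    return aggregate(guests) <= 1


guests = {
    "radio": [(2000, 10000), (1000, 5000)],  # 1/5 + 1/5
    "ui": [(5000, 20000)],                   # 1/4
}
print(aggregate(guests), edf_schedulable(guests))  # 13/20 True
```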

Chuck later observes that Xen's IO model (asynchronous non-deterministic communication with dom0) is also a huge obstacle. As a workaround, he proposes giving RT domains direct access to dedicated hardware.

In contrast, KVM schedules guests and qemu (which provides the virtual IO services) as a single unit. When the guest needs IO services, that means qemu is "already running", and the scheduler doesn't get involved. (I will admit things are becoming a little more fuzzy with qemu's adoption of IO threads, but qemu is shared between Xen and KVM anyway.)

At any rate, a very interesting presentation, though I believe that by starting with Xen, Chuck and his students are needlessly handicapping themselves. Unfortunately I think many embedded systems people are lured to Xen by the promise of a "tiny hypervisor kernel," without realizing the depth of dependencies on the Linux dom0...

6 comments:

edison, 12 March, 2009 20:05
"In this regard, the fact that Xen has no "hypervisor threads" is terrible, because it means that a Xen code path cannot be paused due to interrupt." ---- It's true, but one can argue that a Xen code path is not as long as a Linux one... due to its "tiny hypervisor".

"In contrast, KVM schedules guests and qemu (which provides the virtual IO services) as a single unit. When the guest needs IO services, that means qemu is "already running", and the scheduler doesn't get involved." ---- I am confused. As I understand it, the Linux kernel scheduler can get involved in your case. For example, when the guest accesses virtual IO devices emulated by qemu, it traps into the Linux kernel (KVM), and the kernel then returns to userspace because qemu is in userspace. During this window, the Linux kernel can pick another user application instead of the currently running qemu process when returning to userspace. At least, that's true for x86; I'm not sure about PPC.
So the latency of the I/O model is still non-deterministic, but maybe you can set the qemu process to the highest priority :)
Hollis Blanchard, 13 March, 2009 07:04
Yes exactly (the high priority comment).

What I was trying to emphasize is that the context switch from guest to qemu doesn't involve the host scheduler: it's implemented in KVM code.

The scheduler invocation you're thinking about is when the timer tick fires while qemu happens to be running. This is a larger problem, and must be solved the same as you'd solve it for any Linux real-time process.

This is a theme you'll hear over and over with KVM: you solve the problem just like you'd solve it for any Linux process. On the flip side, Linux itself must be extended for cases not already handled, and that improvement often benefits all Linux users. Good example: dynamic memory sharing.
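Here is a minimal sketch of solving it "like any Linux real-time process": move a qemu process to the SCHED_FIFO policy. It uses Python's `os.sched_*` wrappers, which are available on Linux. The helper names are hypothetical, and changing the policy requires root or CAP_SYS_NICE.

```python
import os


def fifo_param() -> os.sched_param:
    """Scheduling parameter for the highest SCHED_FIFO priority (99 on Linux)."""
    return os.sched_param(os.sched_get_priority_max(os.SCHED_FIFO))


def make_realtime(pid: int) -> bool:
    """Try to switch `pid` (e.g. a qemu process) to SCHED_FIFO at top priority.

    Returns False instead of raising if we lack the privilege to do so.
    """
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, fifo_param())
    except PermissionError:
        return False
    return True
```

On Linux the scheduling policy is per thread. For a qemu process with separate vCPU or IO threads, each thread ID would need the same treatment. The `chrt` utility does the same thing from the shell, e.g. `chrt -f -p 99 <pid>`.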
edison, 16 March, 2009 07:06
Thanks for your reply :) I agree that KVM's huge advantage is that it can leverage whatever facilities Linux provides. But in terms of real time, Linux may not be good at it (I just have that feeling; I know there are out-of-tree RT-preempt patches, though. Correct me if I am wrong). If that's true, how can KVM get better RT capability when it is built on a Linux that doesn't perform well for RT? Yes, if Linux is a good RT OS candidate, then KVM can leverage it, no problem with that.
For Xen, its advantage is a small code base compared with Linux-KVM (I am not talking about dom0... :) A small code base means it's easy to measure/analyse the maximum latency in the worst case. The problem is Xen's I/O model, which depends heavily on dom0. It's evil... I am thinking about using a driver domain instead. If the driver domain is made independent of dom0 (which may require writing separate drivers for this domain, not an easy task), then the RT guest and its corresponding driver domain can be scheduled together with high priority.
Glad to see your input:)
Hollis Blanchard, 16 March, 2009 09:20
As I understand it, depending on the hardware platform, the Linux-RT patches are currently providing worst-case interrupt latency of 40-50 µs. This is worse than some proprietary RTOSs which claim <10 µs latency, but better than a non-RT kernel.

Not all of the Linux RT patches have been merged into the main Linux kernel yet, but as a Xen user, that shouldn't upset you at all. ;)

There are standard benchmarks used to measure and analyze RT performance on Linux, such as cyclictest. Some of them have even been integrated into the LTP project. There's even a top-level script in the LTP directory to run the RT tests for you; I'm told it's quite simple.
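The idea behind cyclictest can be sketched in a few lines. Repeatedly sleep until an absolute deadline and record how late the thread actually woke up. The worst case, not the average, is the figure of interest. This Python version is only illustrative, because interpreter overhead swamps microsecond effects. The real cyclictest is written in C, sleeps with clock_nanosleep() on absolute deadlines, and is normally run at a real-time priority.

```python
import time
from typing import List


def measure_wakeup_latency(interval_us: int = 1000, loops: int = 200) -> List[int]:
    """Sleep until successive absolute deadlines and return how late each
    wakeup was, in microseconds (a rough, cyclictest-like measurement)."""
    interval_ns = interval_us * 1000
    latencies = []
    deadline = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        # Time past the deadline is the wakeup latency (clamped at zero).
        late_ns = max(time.monotonic_ns() - deadline, 0)
        latencies.append(late_ns // 1000)
        deadline += interval_ns  # absolute schedule, so errors don't accumulate
    return latencies


if __name__ == "__main__":
    lat = sorted(measure_wakeup_latency())
    print(f"max {lat[-1]} us, median {lat[len(lat) // 2]} us")
```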
edison, 16 March, 2009 20:02
Thanks for the information about Linux-RT. One more question: let's say Linux-RT provides a worst-case interrupt latency of 40-50 µs, and, as you said, some proprietary RTOSs claim <10 µs. Then I can't create an RT guest (with a proprietary RTOS in it) that gets <10 µs latency as it does natively. Linux-RT sets a lower bound on worst-case latency at 40-50 µs, am I right?
So my point is that if you use Linux as an RT hypervisor, you inherit all the pros and cons of Linux.
But with Xen, you can bypass dom0 to some extent for an RT guest, for example by passing a device through to the RT guest. Then determinism and predictability can be achieved, and maybe a proprietary RTOS guest can have the same RT capability as it does natively.
In a word, Linux as an RT hypervisor has its "RT overhead" at 40-50 µs, as you said.
Xen as an RT hypervisor, hmm, maybe less, maybe more... :)
Hollis Blanchard, 17 March, 2009 07:02
You've hit it exactly: KVM inherits both the pros and cons of Linux.

As for Xen, don't forget the comment that started this whole thread: Xen has no "hypervisor threads" (think about kernel threads).

Let's say a Xen domain causes a page fault, and the Xen hypervisor begins to service it. An instant later, a device raises an interrupt. Regardless of whether Xen's interrupt vector is actually triggered at that time, there's nothing it can do: there is no way to suspend the page fault code path and schedule an RT guest!

Xen's workaround for this is called "continuations" -- these are manual checks inserted into the code paths that are expected to take a long time (like zeroing pages). If some event occurs in the meantime, the continuation check will eventually see it and basically return EBUSY to the caller. This does reduce latency, but only in certain hand-selected code paths.
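The toy model below follows the description above. It shows why continuation checks bound latency only in the code paths where they are inserted. Time is measured in "pages zeroed". An interrupt that arrives mid-operation is noticed only at the next check, or at the very end if the path has no checks. The function names and the 64-page check interval are hypothetical. Xen's real mechanism is C code inside the hypervisor and differs in detail.

```python
from typing import Callable, Optional, Tuple

PAGE_COUNT = 1024  # hypothetical size of a long-running operation, in pages
CHECK_EVERY = 64   # hypothetical continuation-check interval, in pages


def scrub_pages(start: int, end: int,
                event_pending: Callable[[int], bool],
                check_every: Optional[int]) -> Tuple[str, int]:
    """Zero pages [start, end), one page per time unit.

    If `check_every` is set, poll for a pending event every `check_every`
    pages and stop early, returning ("EBUSY", resume_page). Without checks,
    a pending event is not noticed until the whole loop has finished.
    """
    for page in range(start, end):
        if check_every and page > start and (page - start) % check_every == 0:
            if event_pending(page):
                return ("EBUSY", page)  # caller re-issues the work from `page`
        # (zeroing one page would happen here)
    return ("DONE", end)


def latency(irq_at: int, check_every: Optional[int]) -> int:
    """Time units between the interrupt arriving and the code path noticing it."""
    _, stopped_at = scrub_pages(0, PAGE_COUNT, lambda now: now >= irq_at,
                                check_every)
    return stopped_at - irq_at
```

Suppose the operation covers 1024 pages and an interrupt arrives at page 100. The unchecked path notices it 924 pages later, while checking every 64 pages cuts that to 28. In general the delay is bounded by the check interval. A caller that sees "EBUSY" would re-issue the operation from the returned page.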

Ultimately, it is Xen that is completely unmeasured, and we already know it suffers at least one serious design flaw. Maybe the reason you think Xen is a good research topic is that you know there's so much work you'll need to do... but if you have real-world requirements, I think it makes more sense to improve something that's already working.