SIGILLatum


Petr Tesarik's Linux blog


4 December 2023

Kernel Sandbox Mode

by Petr Tesařík

Why?

This must always be the first question. Why would anyone want to impose artificial limitations on their code? The answer is self-protection.

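This is nothing new. However, if you go and read the Kernel Self-Protection article, you’ll learn that successful self-protection systems are effective and on by default, require no opt-in by developers, have no performance impact, do not impede kernel debugging, and have tests.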

By these measures, my Sandbox Mode idea should be a spectacular failure.

Let’s start with the good news. Sandbox Mode is very effective, preventing even actively malicious kernel code from modifying anything outside a pre-defined output memory area. If you accept that information leaks are out of scope, the isolation is just as good as the isolation between kernel mode and user mode.

There are currently no tests. Writing a test suite takes a lot of effort, which would be wasted if the kernel community rejected the whole sandbox idea straight away. Of course, tests will be included in the initial submission if the feedback is promising.

Next, Sandbox Mode is on by default. But only if the corresponding config option is enabled. And that one is much more likely to stay off by default.

From here on, things start to look dire.

Sandbox Mode definitely requires opt-in by developers. In fact, the core idea is to split a single big program into multiple smaller ones. I wonder whether anyone could automate such a task.

Sandbox Mode does have a performance impact, although it is mostly limited to entering and exiting sandbox mode. On the positive side, there is no performance penalty at all while no sandbox code is running.

Last but not least, the initial implementation makes it harder to debug kernel code running in a sandbox. In particular, sandbox code cannot be instrumented, and even in-kernel stack unwinding is broken. On the other hand, none of these limitations is inherent to the concept of Sandbox Mode; debugging of sandbox code can be enabled with reasonable effort.

Anyway, with so many drawbacks, why do I even try?

Although the goal of Sandbox Mode is self-protection, it does not try to compete with generic mechanisms such as the stack protector or read-only sensitive variables.

Closest to Sandbox Mode are user mode helpers. If your code is too complex to run in the kernel and you are considering writing a user space helper, Sandbox Mode may be a better choice:

How?

Sandbox Mode runs code in the unprivileged CPU mode, i.e. the mode normally used to run user space rather than kernel code. The sandbox has its own address space, which is a subset of the kernel address space, possibly with modified permission bits. This makes it possible, for example, to map read-write kernel data into the sandbox as read-only, so kernel code running in the sandbox can read such data but not modify it.

When a CPU interrupt arrives, a trampoline handler enters kernel mode and invokes the original kernel handler. Note that this is also the only way to preempt code running in Sandbox Mode.
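
The interrupt path can be modelled in the same spirit. This sketch only tracks the CPU mode; struct cpu_state, sbm_irq_trampoline(), model_schedule() and timer_tick_handler() are hypothetical names. The assumption that the trampoline checks a need_resched flag after the original handler returns is made for illustration only, to show why an interrupt is the point where sandbox code can be preempted.

```c
#include <assert.h>
#include <stdbool.h>

enum cpu_mode { MODE_KERNEL, MODE_SANDBOX };

/* Hypothetical per-CPU state for this model. */
struct cpu_state {
	enum cpu_mode mode;
	bool need_resched;  /* set by an interrupt handler to request preemption */
	int schedule_calls; /* how often the scheduler ran from the trampoline */
};

typedef void (*irq_handler_fn)(struct cpu_state *cpu, int irq);

/* Stand-in for the scheduler; it may only run in kernel mode. */
static void model_schedule(struct cpu_state *cpu)
{
	assert(cpu->mode == MODE_KERNEL);
	cpu->need_resched = false;
	cpu->schedule_calls++;
}

/*
 * Interrupt taken while sandbox code runs: enter kernel mode, run the
 * original handler, give the scheduler a chance to preempt the sandbox,
 * then resume the sandbox in unprivileged mode.
 */
static void sbm_irq_trampoline(struct cpu_state *cpu, int irq,
			       irq_handler_fn handler)
{
	assert(cpu->mode == MODE_SANDBOX);
	cpu->mode = MODE_KERNEL;
	handler(cpu, irq);
	if (cpu->need_resched)
		model_schedule(cpu);
	cpu->mode = MODE_SANDBOX;
}

/* Example handler: a timer tick that asks for rescheduling. */
static int last_irq = -1;

static void timer_tick_handler(struct cpu_state *cpu, int irq)
{
	assert(cpu->mode == MODE_KERNEL); /* handlers always run privileged */
	last_irq = irq;
	cpu->need_resched = true;
}
```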

A page fault exception raised by sandbox code is intercepted and terminates the sandbox.
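
A rough user-space analogy of this behaviour can be built from POSIX primitives: run the untrusted function in a forked child, revoke write access to its input page with mprotect(), and treat a SIGSEGV as termination of the sandbox. This is not how Sandbox Mode works internally (it runs within a subset of the kernel address space, without a separate process), and unlike a real sandbox the child can still write to its own private memory; only the input page is protected. The names run_sandboxed(), map_shared_page(), copy_upper() and tamper() are illustrative.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum run_result { RUN_OK, RUN_FAULT, RUN_ERROR };

typedef void (*sandbox_fn)(const char *in, char *out);

/* One anonymous page shared between parent and child after fork(). */
static char *map_shared_page(void)
{
	void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Run fn in a child process. The child first drops write access to the
 * input page (which must come from map_shared_page()); the output page
 * stays writable. A write to the input page raises SIGSEGV and kills
 * the child, i.e. the "sandbox", while the parent's data stays intact.
 */
static enum run_result run_sandboxed(sandbox_fn fn, char *in, char *out)
{
	int status;
	pid_t pid;

	assert(fn != NULL && in != NULL && out != NULL);
	pid = fork();
	if (pid < 0)
		return RUN_ERROR;
	if (pid == 0) {
		/* Child: this changes only the child's mapping, not the parent's. */
		if (mprotect(in, (size_t)sysconf(_SC_PAGESIZE), PROT_READ) != 0)
			_exit(2);
		fn(in, out);
		_exit(0);
	}
	while (waitpid(pid, &status, 0) < 0)
		if (errno != EINTR)
			return RUN_ERROR;
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
		return RUN_FAULT; /* the fault terminated the "sandbox" */
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return RUN_OK;
	return RUN_ERROR;
}

/* Well-behaved: reads the input, writes only to the output page. */
static void copy_upper(const char *in, char *out)
{
	size_t i;

	for (i = 0; in[i] != '\0'; i++)
		out[i] = (char)toupper((unsigned char)in[i]);
	out[i] = '\0';
}

/* Misbehaving: casts away const and tries to modify the input page. */
static void tamper(const char *in, char *out)
{
	(void)out;
	*(volatile char *)in = 'X';
}
```

Both pages are MAP_SHARED, so without the mprotect() call the write in tamper() would be visible to the parent; with it, the write never happens.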

When?

I have written an initial implementation, and it seems to work fine. However, the API for calling a function in Sandbox Mode needs more thought. I have identified a few potential users, and I am now trying to design an easy-to-use API that can accommodate this existing code.
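
Since the API is still open, the following is only one conceivable shape for it, not the interface of the implementation: the caller passes exactly one input buffer and one output area, and the output is copied back only if the function succeeds. sandbox_call(), the sandbox_fn signature, sum_ints() and failing_fn() are hypothetical, and this plain-C model uses private copies instead of remapping pages and switching to unprivileged mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature of a function meant to run in the sandbox. */
typedef int (*sandbox_fn)(const void *in, size_t in_size,
			  void *out, size_t out_size);

/*
 * Hypothetical entry point: the caller names the only data the function
 * may read (in) and the only area it may modify (out). The function works
 * on private copies; the output is copied back only if fn returns 0.
 */
static int sandbox_call(sandbox_fn fn, const void *in, size_t in_size,
			void *out, size_t out_size)
{
	void *in_copy, *out_copy;
	int ret;

	assert(fn != NULL);
	in_copy = malloc(in_size ? in_size : 1);
	out_copy = calloc(1, out_size ? out_size : 1);
	if (!in_copy || !out_copy) {
		free(in_copy);
		free(out_copy);
		return -ENOMEM;
	}
	if (in_size)
		memcpy(in_copy, in, in_size);
	ret = fn(in_copy, in_size, out_copy, out_size);
	if (ret == 0 && out_size)
		memcpy(out, out_copy, out_size); /* commit output on success only */
	free(in_copy);
	free(out_copy);
	return ret;
}

/* Example: sum an array of ints into a single int. */
static int sum_ints(const void *in, size_t in_size, void *out, size_t out_size)
{
	const int *vals = in;
	int sum = 0;
	size_t i;

	if (out_size < sizeof(int))
		return -EINVAL;
	for (i = 0; i < in_size / sizeof(int); i++)
		sum += vals[i];
	memcpy(out, &sum, sizeof(int));
	return 0;
}

/* Example: scribbles over its output and then reports failure. */
static int failing_fn(const void *in, size_t in_size, void *out, size_t out_size)
{
	(void)in;
	(void)in_size;
	memset(out, 0xff, out_size);
	return -EIO;
}
```

Discarding the output of a failed call means a misbehaving function cannot leave half-written results behind; whether that is the right default would depend on the actual users.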

tags: sandbox - kernel