October 2025
What if a kernel was just an event loop? No scheduler, no memory protection, no blocking calls - just an async runtime that happens to run on bare metal.
There's kernels, there's microkernels, there's nanokernels, there's unikernels, and then there's what I'm building, which is so tiny, so under-featured, so unsafe that I'm not even sure it qualifies as a kernel. But it does allow for logging, timers, random number generation, block storage, UDP/IP networking, and running freestanding C code targeting its minimal interface. So, I'm going to call it a "picokernel" and I'd like to tell you about it.
Why build another minimal kernel? There are learning kernels out there - xv6, OS/161, and others - but they're typically tied to a single architecture or focused on traditional Unix-style abstractions. There are unikernels like MirageOS, but that one is written in OCaml. Embedded RTOSes are portable but often proprietary or baroque, and they tend to lean into threads and the same Unix-style abstractions.
But, forget the reasons, I just thought it'd be fun and cute, and I was right!
I wanted to see what a minimal, portable kernel looks like. Minimal as in does the bare minimum for the functionality I want. Portable as in works across x86, ARM, RISC-V, 32-bit and 64-bit (and has a hope of running on 32-bit microcontrollers as well as hosted on other operating systems).
The basic structure is an async runtime. Everything is cooperative, event-driven, and non-blocking. There's no preemption, no scheduling, no memory protection. It's closer to embedded firmware than to traditional kernels. This simplicity makes the entire system easy to reason about: requests go in, completions come out, state machines tick forward.
Think of it as a portable async runtime that happens to run on bare metal, rather than a traditional kernel that has async I/O.
What does it do? Not much, but not nothing: logging, timers, random number generation, block storage, UDP/IP networking, and running freestanding C user code against its minimal interface.
What does it not do? Preemption, scheduling, memory protection, blocking calls - none of that.
At the moment, it's a work-in-progress. The structure is there but I'm still working on the finer points of racy interrupt handling and synchronization barriers around memory-mapped IO.
What kind of "machines" does it run on? Currently it runs on ARM, x86, RISC-V, both 32-bit and 64-bit, under QEMU, with virtio (PCI and MMIO) devices.
Here's the QEMU command for arm64, attaching virtio MMIO devices for an RNG, a block device (i.e. a hard drive), and a network device. The commands for the other architectures are similar.
qemu-system-aarch64 \
-machine virt -cpu cortex-a57 \
-m 128M -smp 1 \
-nographic -nodefaults -no-user-config -no-reboot \
-kernel build/arm64/kernel.elf \
-serial stdio \
-device virtio-rng-device \
-device virtio-blk-device,drive=hd0 \
-drive file=/tmp/drive.img,if=none,id=hd0,format=raw,cache=none \
-device virtio-net-device,netdev=net0 \
-netdev user,id=net0,hostfwd=udp::8888-10.0.2.15:8080
Portability hinges on a clean abstraction boundary. Each platform implementation (arm64, x86, etc.) must provide a minimal set of capabilities to the platform-agnostic kernel:
a boot path that lands in kmain() (the portable kernel entry point), platform_uart_write() for log output, and platform_wfi() to wait for an interrupt (or a timeout). That's it. No MMU setup, no memory allocator, no thread scheduler. The kernel lives in a single address space with no protection boundaries.
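Concretely, the contract might look something like this sketch - the exact signatures are my guesses, inferred from the event loop shown later, not the repo's verbatim API:

#include <stddef.h>
#include <stdint.h>

typedef struct platform platform_t;  // opaque per-platform state (assumed name)

// Provided by the portable kernel; the platform boots the machine, then calls:
void kmain(void *platform_boot_ctx);

// Provided by each platform (arm64, x86, ...); called by the portable kernel:
void platform_uart_write(const char *buf, size_t len);   // log/console output
uint64_t platform_wfi(platform_t *platform, uint64_t timeout_ms);
// ^ sleep until an interrupt fires or timeout_ms elapses; returns the current
//   time in milliseconds, which the event loop feeds back into its next tick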
If you want to try it yourself:
git clone https://github.com/rsepassi/vmos
cd vmos
make run PLATFORM=arm64 PORT=8888
You should see the kernel boot, initialize virtio devices, and run whatever's
in kmain_usermain() (kernel/user.c). The default user code requests random
numbers and prints them, runs a read/write test on a file-backed block device,
uses ARP to communicate to the QEMU gateway, and starts a UDP echo server.
Try sending a UDP packet to the echo server (in a separate terminal):
echo "hello" | nc -u localhost 8888
You should see your message echoed back in both the nc output and the kernel logs.
PLATFORM can be one of rv32, rv64, arm32, arm64, x32, x64.
To use PCI devices instead of MMIO:
make run PLATFORM=arm64 USE_PCI=1
All you need is make, clang, and qemu-system-*. clang handles the
cross-compilation.
Here's what it looks like to request some random numbers (timers, storage read/write/flush, and UDP send/recv are similar):
typedef struct {
  krng_req_t rng_req;
  uint8_t random_buf[32];
} kuser_t;

static kuser_t g_user;

static void on_random_ready(kwork_t *work);

void kmain_usermain(kernel_t* k) {
  kuser_t* ctx = &g_user;

  // Submission
  // 1. Configure the work item with a callback
  kwork_init(&ctx->rng_req.work, KWORK_OP_RNG_READ, ctx, on_random_ready, 0);

  // 2. Configure the request
  ctx->rng_req.buffer = ctx->random_buf;
  ctx->rng_req.length = 32;

  // 3. Submit to queue (non-blocking!)
  KASSERT(ksubmit(k, &ctx->rng_req.work) == KERR_OK);
}

// 4. Callback fires when ready
static void on_random_ready(kwork_t *work) {
  kuser_t *ctx = work->ctx;
  krng_req_t *req = KCONTAINER_OF(work, krng_req_t, work);

  KASSERT(work->result == KERR_OK);
  KASSERT(req->completed == 32);

  printk("Random bytes: ");
  for (size_t i = 0; i < 32; i++) {
    printk_hex8(ctx->random_buf[i]);
  }
  printk("\n");
}
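Here's a sketch of what a one-shot timer could look like under the same pattern. It's illustrative only: ktimer_req_t, KWORK_OP_TIMER, and timeout_ms are my guesses at names, not necessarily the actual API.

// Hypothetical one-shot timer, same submit/callback shape as the RNG request.
typedef struct {
  ktimer_req_t timer_req;   // assumed request type
} kuser_timer_t;

static kuser_timer_t g_timer;

static void on_timer_fired(kwork_t *work) {
  KASSERT(work->result == KERR_OK);
  printk("timer fired\n");
}

static void start_timer(kernel_t *k) {
  kuser_timer_t *ctx = &g_timer;
  // 1. Work item + callback, 2. request parameters, 3. non-blocking submit
  kwork_init(&ctx->timer_req.work, KWORK_OP_TIMER, ctx, on_timer_fired, 0);
  ctx->timer_req.timeout_ms = 1000;   // fire once, a second from now
  KASSERT(ksubmit(k, &ctx->timer_req.work) == KERR_OK);
}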
So the entire system is based on async completions, entirely cooperatively scheduled, and entirely in a single address space: no synchronous calls, no context switches, no preemption.
Where you see completions, there must be an event loop. There sure is. Here's "kmain", the kernel entry point:
static kernel_t g_kernel;

void kmain(void *platform_boot_ctx) {
  // Banner
  printk("\n\n=== KMAIN ===\n\n");

  // Initialize
  kernel_t *k = &g_kernel;
  kmain_init(k, platform_boot_ctx);

  // User kickoff
  kmain_usermain(k);

  // Event loop
  while (1) {
    // tick: Process completions, expire timers, run callbacks
    kmain_tick(k, k->current_time_ms);

    // next_delay: When's the next timer?
    uint64_t timeout = KMIN(kmain_next_delay(k), 2000);

    // wfi: Wait for interrupt (or timeout)
    k->current_time_ms = platform_wfi(&k->platform, timeout);
  }
}
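For intuition, a tick conceptually drains whatever the drivers and interrupt handlers have marked complete, expires due timers, and runs the user callbacks. The following is an illustrative sketch of that pattern, not the repo's actual internals - completion_queue_pop(), expire_timers(), and the callback field are all made up:

// Illustrative only - not the real kmain_tick().
static void example_tick(kernel_t *k, uint64_t now_ms) {
  kwork_t *work;
  // Drain work items that drivers/ISRs have completed since the last tick...
  while ((work = completion_queue_pop(k)) != NULL) {
    work->callback(work);             // e.g. on_random_ready()
  }
  // ...then fire any timers whose deadlines have passed.
  expire_timers(k, now_ms);
}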
How about some line counts?
Makefile     208
kernel/      1.5KLOC
driver/      1.5KLOC
platform/
  arm32/     1.5KLOC
  arm64/     1.5KLOC
  rv32/      1.5KLOC
  rv64/      1.5KLOC
  x32/       3KLOC
  x64/       3KLOC
It's all freestanding C11 code, no external source dependencies, and the tooling is make, sh, clang, qemu.
(In the repo, you'll see vendor/monocypher, which provides ChaCha20 for the
kernel's CSPRNG, seeded by hardware randomness, but it's not core to the
kernel).
The (stripped) kernel.elf binaries weigh in at:
arm32 128K
arm64 128K
rv32 64K
rv64 65K
x32 56K
x64 60K
Debugging is printk-driven, but crashes are straightforward: llvm-objdump on
the ELF shows exactly where the PC landed. The workflow is tight: edit, make run, see output, repeat. Most bugs are caught by assertions or manifest as
immediate crashes rather than subtle corruption, thanks to the single address
space and (current) lack of concurrency.
What's the security model? There isn't one.
No memory protection, no privilege levels, no isolation boundaries. Everything runs in a single address space with full hardware access. A bug in user code can corrupt kernel state. A stray pointer can overwrite interrupt handlers. There's no defense against malicious or buggy code.
VMOS is suitable for:
VMOS is not suitable for:
The lack of kernel-enforced safety doesn't mean applications must be unsafe. Several approaches could provide memory safety and concurrency safety at the application level:
Language-level safety: Compile from memory-safe languages (Rust, Pony) or checked C subsets to VMOS's C API. The application becomes safe even though the kernel isn't.
WebAssembly: Run all application code in a Wasm runtime. The kernel becomes just a capabilities provider to sandboxed code.
Verified code: Formally verify critical components, proving memory safety without runtime overhead.
These approaches let you choose your safety/complexity tradeoff rather than forcing one on everyone.
There's something fundamental about an async runtime that makes it feel like the "right" abstraction for this level of system programming. It's not just about performance or simplicity - it's about alignment with how hardware actually works.
Hardware doesn't manage itself via "threads". It has state and events. A disk controller doesn't "block" waiting for a sector to be read - it starts the operation, goes idle, and fires an interrupt when done. A network card doesn't sit in a loop polling for packets - it DMAs them into memory and signals completion. Even the CPU itself - when there's no work, it halts until an interrupt arrives.
Traditional kernels paper over this event-driven reality with blocking abstractions. They create threads, schedulers, and context switches to present a synchronous programming model. Our little picokernel just gives it to you straight: reality is asynchronous.
An async runtime embraces the hardware's native behavior: submit work, go idle, wake on interrupt, process completions. No context to save because we never left. No scheduler because we're cooperative. No lock contention because we're single-threaded (for now; and my multi-core plan is lockless). The result is code that maps almost directly to what the hardware is doing.
This reminds me of Leslie Lamport's paper Computation and State Machines, where he argues that state machines are a universal framework for representing and reasoning about computation. Async runtimes are a fitting execution plane for state machines. Messages/interrupts/completions come in, we process them (along with whatever timers expired) and update internal state, we send new messages/requests out, and then wait for the next round. This isn't just a programming pattern - it's a reflection of the hardware's fundamental operation.
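In code, that ends up as user state machines whose transitions are driven entirely by completions. Here's a sketch using the same RNG API as above - the only new assumptions are that a request can be resubmitted from its own callback and that we stash the kernel pointer in the context:

// Sketch: gather four batches of random bytes, one completion at a time.
// The "state" is just a counter; each completion updates it and either
// submits the next request or stops.
typedef struct {
  kernel_t  *k;             // stashed so the callback can resubmit
  int        batches_left;
  krng_req_t rng_req;
  uint8_t    buf[32];
} rng_loop_t;

static rng_loop_t g_loop;

static void rng_loop_step(kwork_t *work) {
  rng_loop_t *sm = work->ctx;
  KASSERT(work->result == KERR_OK);
  if (--sm->batches_left == 0) {
    printk("rng loop done\n");        // terminal state: nothing left to submit
    return;
  }
  // Not done: submit again (assumes requests are reusable); we'll be
  // called back here on the next completion.
  KASSERT(ksubmit(sm->k, &sm->rng_req.work) == KERR_OK);
}

static void rng_loop_start(kernel_t *k) {
  g_loop.k = k;
  g_loop.batches_left = 4;
  kwork_init(&g_loop.rng_req.work, KWORK_OP_RNG_READ, &g_loop, rng_loop_step, 0);
  g_loop.rng_req.buffer = g_loop.buf;
  g_loop.rng_req.length = 32;
  KASSERT(ksubmit(k, &g_loop.rng_req.work) == KERR_OK);
}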
The clarity this brings to reasoning about the system is remarkable. Want to know what the kernel does? Look at the event loop. Want to trace a network packet? Follow it from interrupt -> completion queue -> callback. No hidden preemption points, no mysterious wakeups, no "it depends on the scheduler."
It'd be great to get this running on microcontrollers too, where I think it's a great fit. It already supports 32-bit, it does no dynamic allocation, and it doesn't depend on any standard library (only the freestanding C headers).
I'd like for this same runtime and user API to work in a "hosted" form as well. That is, the same user code works on other operating systems too - iOS, Android, Linux, Windows, MacOS, BSD. (Maybe even the browser.) The platform expectations are quite minimal: log output, a clock, and a way to sleep until I/O completes or a timeout expires.
That's all quite doable, via kqueue (iOS, MacOS, BSD),
epoll (Linux, Android), io_uring (Linux), IOCP (Windows).
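For instance, on Linux the platform_wfi() piece maps almost directly onto epoll_wait(). A minimal sketch, assuming the hosted platform struct just carries an epoll file descriptor (my assumption, not the repo's layout):

#include <stdint.h>
#include <sys/epoll.h>
#include <time.h>

typedef struct { int epoll_fd; } platform_t;

static uint64_t now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

uint64_t platform_wfi(platform_t *p, uint64_t timeout_ms) {
  struct epoll_event events[16];
  // epoll_wait() is the hosted stand-in for "wait for interrupt": sleep until
  // an event source (tap fd, timerfd, ...) is ready or the timeout expires.
  // The event loop caps the timeout (e.g. at 2000 ms), so the int cast is safe.
  int n = epoll_wait(p->epoll_fd, events, 16, (int)timeout_ms);
  (void)n;  // a real implementation would hand the ready events to the drivers
  return now_ms();
}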
So the dream would be a single, totally event-driven codebase that works across bare metal and all these operating systems.
For cloud deployment, the goal is to test on providers that allow custom kernels. Additionally, I want to build a minimal Linux image where init is just a kvmtool launch of this kernel - that way, we go anywhere Linux goes, with the full security and isolation of a nested VM.
I've also been banging around on a new C-based state machine library that helps organize async code without language support, so I'll present that here soon.
Also, I think this runtime and maybe the state machine library would make for nice compilation targets. A C-like language with async/await, maybe something akin to Zig, could compile down to VMOS's simple async API.
Here's what's on my roadmap, though we'll see how far I decide to go - one item being QEMU with hardware acceleration (-accel hvf/kvm).

If you think this is neat and would like to chat about it, please drop me a line.