May 2026
We continue our jaunt up the bootstrap chain from hex0-seed to TCC. The
first post brought a portable
pseudo-ISA (P1) and a real macro layer on top of M0 (M1++); the
second delivered a Scheme
interpreter (scheme1) in ~5KLOC of P1++. The bet was that with a Scheme on
the floor, one C compiler would be enough — collapsing live-bootstrap's
ladder of three C-likes (subset C, M2-Planet, MesCC) into a single hop.
Today the bet pays out: cc.scm — a C compiler in scheme1, ~5KLOC — compiles
TCC 0.9.26 on aarch64, amd64, and riscv64. TCC then compiles itself to a
byte-identical fixed point. Five upstream projects in live-bootstrap's audit
path collapse to one (the vendored seed bytes). The per-arch swap set drops
from ~5KLOC to ~1KLOC.
Before going further, here is the whole chain, one row per stage. Each stage is a shell script in boot/; each one's outputs feed the next.
| Stage | Inputs | Outputs |
|---|---|---|
| boot0 | seed bytes (hex0-seed + 6 hex files) | hex2, catm, M0 |
| boot1 | boot0 + M1pp.P1, hex2pp.P1 |
M1pp, hex2pp |
| boot2 | boot1 + catm.P1pp, scheme1.P1pp |
catm, scheme1 |
| boot3 | boot1 + boot2 + cc.scm + tcc.flat.c + libc.flat.c |
tcc0 |
| boot4 | tcc0 + tcc.flat.c + mes libc.flat.c |
tcc1, tcc2, tcc3 (assert tcc2 == tcc3), mes libc.a, libtcc1.a |
| boot5 | tcc3 + musl-1.2.5 source | musl libc.a, crt{1,i,n}.o |
| boot6 | tcc3 + seed-kernel/ |
kernel.elf |
Posts 1 and 2 covered boot0 → boot2 (everything up through scheme1).
This post is boot3 → boot5 (the C compiler, the self-hosted TCC, then
musl). The next post will be boot6 plus the DRIVER=seed loop that
re-runs the whole chain inside the kernel boot6 produces.
cc.scm is a streaming pipeline: lexer → preprocessor → parser → codegen,
each pulling one token at a time from the previous stage. No materialized
token list, no AST. The parser emits P1++ assembly directly into a
fixed-capacity buffer as it walks the source. C99 names eight translation
phases; we collapse them into three:
NL tokens because the preprocessor needs them to delimit
directives.#define, #if,
#include-was-already-flattened (more on that below), macro expansion
with hide-sets, #if evaluated through the same
constant-expression machinery the parser uses.Three pieces of state thread through the pipeline. world holds the
scope and tag bindings, the string-literal pool, and the tentative-definition
list. pstate owns the token iterator and the current function context.
cg owns everything to do with emission — the per-section output buffers,
the value stack, the frame allocator, the label counter.
The no-AST, streaming, value-stack shape is borrowed straight from TCC
itself, which compiles in a single pass with a vtop-driven value stack
and emits machine code as the parser walks. The chain owes TCC more than
its source — its design is the model for cc.scm's codegen too. Every
expression operand becomes an opnd record (kind, type, lvalue?) pushed
on cc.scm's vstack; operators pop their inputs and push their results.
Most operands are register-resident at first; binary operators spill the
left operand to a fresh frame slot before evaluating the right, so
register pressure stays bounded by tree depth, not subexpression count.
It's the simplest codegen shape that compiles tcc.c without blowing up.
No optimization, no register allocator, no SSA. Predictable by eye, in
keeping with the rest of the chain.
Two things from the scheme1 post show up in production here. First, the
scratch heap: every top-level declaration runs inside
call-with-scratch-deep-copy, allocating tokens, expanded macros, and
per-decl parser state freely on the scratch heap and deep-copying just
the persistent symbol-table entry back to the main heap before reset.
Working memory stays bounded in the size of one declaration's worth of
junk. Second, pmatch: it ended up earning its keep in the parser, which
spends most of its time destructuring tokens by shape. The form classifier
from the prior post was not a coincidence; it was the spine of what was
coming.
Here is parse-stmt — exactly the destructure-and-dispatch from the prior
post, now driving statement parsing for real:
(define (parse-stmt ps)
(pmatch (peek ps)
(($ tok? (kind PUNCT) (value lbrace)) (parse-cstmt ps))
(($ tok? (kind KW) (value if)) (parse-if-stmt ps))
(($ tok? (kind KW) (value while)) (parse-while-stmt ps))
(($ tok? (kind KW) (value do)) (parse-do-stmt ps))
(($ tok? (kind KW) (value for)) (parse-for-stmt ps))
(($ tok? (kind KW) (value switch)) (parse-switch-stmt ps))
(($ tok? (kind KW) (value return)) (parse-return-stmt ps))
(($ tok? (kind KW) (value goto)) (parse-goto-stmt ps))
(($ tok? (kind KW) (value break))
(advance ps) (expect-punct ps 'semi) (do-break ps))
(($ tok? (kind KW) (value continue))
(advance ps) (expect-punct ps 'semi) (do-continue ps))
(($ tok? (kind KW) (value case)) (parse-case-stmt ps))
(($ tok? (kind KW) (value default)) (parse-default-stmt ps))
(($ tok? (kind IDENT))
(guard (and (eq? (tok-kind (peek2 ps)) 'PUNCT)
(eq? (tok-value (peek2 ps)) 'colon)))
(parse-labelled-stmt ps))
(else
(cond ((stmt-starts-decl? ps) (parse-local-decl ps))
(else (parse-expr-stmt ps))))))
The parse-if-stmt arm is the recursive heart in miniature — parse the
condition expression to leave its value on the vstack, then ask the codegen
for an if/else skeleton with the two arms supplied as thunks the codegen
calls between emitting the branches:
(define (parse-if-stmt ps)
(expect-kw ps 'if)
(expect-punct ps 'lparen)
(parse-expr ps) (rval! ps)
(expect-punct ps 'rparen)
(cg-ifelse (ps-cg ps)
(lambda () (parse-stmt ps))
(lambda ()
(cond ((at-kw? ps 'else)
(advance ps) (parse-stmt ps))
(else #t)))))
Parser supplies the what; codegen supplies the how (label allocation,
branch emission, frame-slot reuse). They share state through cg but neither
walks the other's data structures. The whole compiler is shaped like this:
small functions, value-stack discipline, recursion mirroring the grammar.
The target is just enough C to compile
tcc-0.9.26-1147-gee75a10c/tcc.c
with the same defines that live-bootstrap's tcc-mes step uses:
BOOTSTRAP=1, HAVE_LONG_LONG=1, ONE_SOURCE=1, TCC_TARGET_X86_64=1,
inline=, and the usual configuration paths. Notably not defined:
HAVE_FLOAT, HAVE_BITFIELD, HAVE_SETJMP. Each of those gates off entire
code paths in tcc.c; we don't have to compile any of it.
That sets the lower bound. The upper bound is what MesCC accepts, since
we're its replacement. Anything MesCC strips silently — const, inline,
__attribute__ — we strip silently too.
What's actually in: integer types and arithmetic, pointers and arrays,
struct/union with anonymous members, enums, typedefs, function pointers,
all the control-flow forms, designated initializers, sizeof, offsetof,
variadic functions, sret for struct returns >16 bytes, the C11 ternary
common-type rule for ?:. Single translation unit — the parser sees one
big bytestream, with #include resolved by a pre-flatten pass upstream.
P1-64 only (LP64) for now. Floats parse as types
for ABI sizing but every operation emits integer bitpattern arithmetic.
"Sufficient for tcc.c at this point in time" is the design.
The end-to-end flow, on one C function:
int sum(int n) {
int acc = 0;
for (int i = 1; i <= n; i++) acc += i;
return acc;
}
cc.scm emits a function body wrapped in libp1pp's %fn macro, which
opens a hex2++ scope, allocates the frame, and emits %enter / %eret.
The body is vstack-style spills and reloads — every operand goes through a
frame slot:
%fn(sum, 32, {
%st(a0, sp, 0) # spill incoming n
%li(t0, 0) # acc = 0
%st(t0, sp, 8)
%li(t0, 1) # i = 1
%st(t0, sp, 16)
.scope
:.top
%ld(t0, sp, 16) # i
%ld(t1, sp, 0) # n
%ble(t0, t1, &.body) # if i <= n goto body
%b(&.end)
:.body
%ld(t0, sp, 8) # acc
%ld(t1, sp, 16) # i
%add(t0, t0, t1)
%st(t0, sp, 8) # acc += i
%ld(t0, sp, 16) # i++
%addi(t0, t0, 1)
%st(t0, sp, 16)
%b(&.top)
:.end
.endscope
%ld(a0, sp, 8) # return acc
})
Every named bit of P1++ here came from earlier posts. %fn, %enter,
%eret, %ld, %st, %add, %addi, %ble, %b are libp1pp macros
expanding through the per-arch backend; :.top / :.end are dotted local
labels resolved by hex2++'s innermost-out scope walk; the scope itself is
opened by %fn and the inner .scope for the loop. cc.scm's codegen has
no idea what aarch64 or amd64 looks like — it generates the same P1++ on
every arch, and the backend lowers it.
With cc.scm in hand, the chain reaches its first C binary in boot3:
scheme1 cc.scm tcc.flat.c → tcc.P1pp # cc.scm reads C, emits P1++
m1pp + hex2pp + ELF prelude → tcc0 # boot3 ends here
tcc.flat.c is a single bytestream: upstream tcc-0.9.26 plus a small set
of patches, with #includes recursively resolved by a pre-flatten pass
(written in scheme1 itself). The compiler proper sees one TU.
Then comes the self-host bounce in boot4:
tcc0 tcc.flat.c → tcc1
tcc1 tcc.flat.c → tcc2
tcc2 tcc.flat.c → tcc3
assert tcc2 == tcc3 byte-for-byte
Why three bounces and not one? cc.scm and tcc's own codegen produce
different — but both correct — code for the same source, so tcc0 and
tcc1 differ in their bytes (same behavior, different codegen choices).
tcc1 and tcc2 are both produced by tcc, but tcc1 itself was a
cc.scm-shaped binary, so its codegen choices when emitting tcc2 need
one more bounce to reach the tcc-shaped fixed point. tcc2 == tcc3 is the
audit point: tcc compiling itself with no help from cc.scm reaches a stable
image, and any future build through this stage will produce the same
bytes. The loop has closed.
The whole sequence — a few hundred bytes of hex0-seed, no host C compiler
at any rung — assembles unchanged on aarch64, amd64, and riscv64. The
P1 portability move from the first post is still doing its job.
tcc3 is a real C compiler but mes-libc is a sketch — enough to host tcc
itself and not much more. boot5 trades it in for the real thing: musl
1.2.5, built by tcc3 with about 1300 source files driven through the
scheme1 driver one tcc -c at a time, archived into libc.a with
crt{1,i,n}.o alongside.
musl is written for GCC and Clang, not TCC, and it
leans on a handful of GCC extensions TCC doesn't accept: register-asm
variables for syscall stubs, __attribute__((alias)) for weak references,
_Complex for the math library, x86_64 SSE/x87 inline asm. We patch
around each of these during source prep — the patch list lives in
bootprep/musl-vendor.sh — replacing the syscall stubs with TCC-shaped
inline asm, alias attributes with explicit thunk functions, _Complex
math with separate-real-imaginary alternatives, and the x86_64 SIMD bits
with their generic C equivalents. Most are a few lines each; the patch
set is small enough to read.
From boot5 onward, every binary the chain builds can link against musl
and use the real printf / malloc / pthread_create (the last only
formally — we still don't run threads). mes-libc retires.
Lines of code, comments and blanks stripped, seed → tcc 0.9.26 window:
| Scope | boot2 | live-bootstrap | reduction |
|---|---|---|---|
| Single arch, hand-written | 21,468 | 30,813 | 30% |
| Single arch, incl. generated tables | 22,047 | 34,624 | 36% |
| 3 arches, hand-written | 22,695 | 41,053 | 45% |
| 3 arches, incl. generated tables | 24,434 | 44,864 | 46% |
| Per-arch swap set (3-arch sum) | 3,649 | 15,595 | 77% |
The single-arch number is real but modest. The per-arch number is the
load-bearing one: ~1.2KLOC per arch in boot2 vs ~5.2KLOC per arch in
live-bootstrap. The reason is structural. live-bootstrap's first C compiler
above the M0 macro stage is cc_<arch>.M1 — a hand-written, arch-specific
implementation of M2-Planet (a C-subset compiler) in M1 macro assembly,
3,800–5,800 lines per architecture. Each new arch is a from-scratch port
of that compiler.
boot2 has no equivalent because P1, the portable pseudo-ISA introduced in
the first post, sits immediately above M0. Everything above the per-arch
P1 backend (P1-<arch>.M1pp, ~500–700 lines) — M1++, hex2++, scheme1, the
prelude, cc.scm itself — is single-source, compiled per arch by swapping
the backend file. There is no per-arch C compiler because there's no C
compiler that stops at the bottom of the chain; the only one that exists is
the portable Scheme one, all the way up.
Distinct upstream projects in the audit path also collapse: live-bootstrap
walks five (stage0-posix + per-arch cc_<arch>.M1, mescc-tools, M2-Planet,
mes, nyacc); boot2 walks one (stage0-posix's vendored seed bytes only).
Both projects ship the same opaque hex0-seed bytes, the same hex/M0
ladder up to M0, the same tcc 0.9.26 source. Everything between is
ours.
The comparison is not entirely apples-to-apples because live-bootstrap uses
some of the artifacts from its intermediate Mes stages downstream, whereas
boot2's code is used exclusively for the bootstrap (though scheme1 could be
used as an early shell and associated tools).
Vendored seed bytes (identical between projects) and the tcc 0.9.26
source itself are excluded from all totals, on both sides. Full breakdown
in
docs/COMPARE-LIVEBOOTSTRAP.md.
The work that live-bootstrap did on patching tcc and providing a minimal libc for the pre-musl build was invaluable, and we use both with some cleanup.
Reaching C is the moment the bootstrap exits the museum. Below this rung we've been doing curatorial work — preserving hex/M0 because they're the seed, making assembly portable with P1 and ergonomic with M1++, and building a a Lisp to gain purchase on building complex programs. Above it, we step into the language modern software is actually written in. Linux, qemu, sqlite, git, the kernel that runs your phone, the C library every other language calls into eventually — all C. You can't bootstrap a working software stack without it, and there is no other language with the same combination of small spec, total hardware reach, and ecosystem stability. The chain has to terminate in C because the world terminates in C.
C earns its place not because it's elegant but because it's the contract everything else honors. A couple folks at Bell Labs in the early seventies sat down to write an OS in a high-level language portable enough to move Unix from a PDP-7 to a PDP-11; what came out was a language whose abstract machine is so close to the physical one that fifty years of CPUs have stayed close to it rather than the other way around. Pointers, structs, calling conventions, the linker model — all of these are now load-bearing for languages that don't even resemble C on the surface. To bring a system up from nothing without ever writing C is to bring up something that can't talk to the system that already exists. So we write C, and we keep its compiler in the chain.
Cheers to Dennis Ritchie for C, and Fabrice Bellard for TCC.
Up to here, the chain has been hosted on Linux. Every binary along the way — hex0 through tcc through musl — has been a Linux user-mode process trusting the host kernel and userland to load it, run it, and feed it files. The seed is small; the runtime envelope is not.
The next post takes that envelope away. We use tcc3 to compile a tiny
kernel — ~1300 lines of portable C plus a few hundred per arch — small
enough to satisfy the bootstrap's own minimal OS contract: roughly a dozen syscalls
to build and drive. Boot it under qemu with the chain's inputs on a
virtio-mmio disk, let the chain re-run inside its own kernel, and pull the
outputs off a second disk on shutdown. Bit-equivalent to the host-driven
build. The loop closes.