Ryan Sepassi

Recent interests

September 2024

C et al

The programming language C remains the core systems language across operating systems and embedded systems. Every programming language has some way to call C code, and the C application binary interface (ABI) is the lingua franca of inter-language interaction. There are several alternatives to C for many, if not all, of its use cases:

And the new(er) kids on the block:

I'm loving seeing these. I like C quite a bit - simple, understandable, controllable, good performance, stable, and tons of nice libraries - but it does have its annoying parts and it's fun to see different takes on what systems-level programming could look like. I'm more of a language hipster and so gravitate towards the new kids on the block. Zig in particular has caught my attention. But day to day, I'm mostly back to C. Maybe for the same reasons I mostly use plain sh and still work mostly in Vim. For the vibes. But also because I know they'll be around for a while in mostly unchanged form and so my brain can move on to other things.

One thing that I have yet to see in a C successor is a clean module/interface system. It seems that all of them see the C header file as superfluous now that developer machines are fast enough to slurp in the whole codebase at compile-time, but there's something about the clean separation of interface from implementation that I always liked. Not that C libraries tend to reuse headers, but a guy can dream. It sounds like the really good stuff was in ML; see Modules Matter Most by Bob Harper and Jimmy Koppel's commentary.
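
To make that concrete, here's roughly the shape of the thing I mean (the counter names are made up): the header is the interface, the .c file is the implementation, and callers compile against the header alone, never seeing the struct layout.

    /* counter.h - the interface: the only thing callers ever see. */
    #ifndef COUNTER_H
    #define COUNTER_H

    typedef struct Counter Counter;   /* opaque: layout stays hidden */

    Counter *counter_new(void);
    void     counter_inc(Counter *c);
    int      counter_get(const Counter *c);
    void     counter_free(Counter *c);

    #endif

    /* counter.c - the implementation: free to change without touching callers. */
    #include <stdlib.h>
    #include "counter.h"

    struct Counter { int value; };

    Counter *counter_new(void)             { return calloc(1, sizeof(Counter)); }
    void     counter_inc(Counter *c)       { c->value++; }
    int      counter_get(const Counter *c) { return c->value; }
    void     counter_free(Counter *c)      { free(c); }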

Optimizing compilers

For the uninitiated, computer code that programmers (or now AI systems) type out is not directly runnable by the machine. It has to go through a translation system, often a compiler. Programs directly translated to machine code are often still quite slow. That's where optimizing compilers come in. They take that initial translation and optimize it so that things run fast. The most widely used optimizing compiler is LLVM, developed by Chris Lattner initially at the University of Illinois and then at Apple. It's a large C++ codebase with many thousands of contributors over its nearly 25-year history. Nearly every one of the above C successors uses LLVM to generate machine code.
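
A small illustration of what that buys you (a sketch; exact behavior varies by compiler and flags, but LLVM-based compilers at -O2 will typically do both rewrites described in the comment):

    /* A naive sum. Translated literally, this loops n times. An optimizing
       compiler will typically rewrite the loop as the closed form
       n * (n + 1) / 2, and then fold sum_to(100) into the constant 5050. */
    static int sum_to(int n) {
        int s = 0;
        for (int i = 1; i <= n; i++)
            s += i;
        return s;
    }

    int main(void) {
        return sum_to(100) == 5050 ? 0 : 1;
    }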

I was under the impression that it would be nearly impossible to get anywhere close to LLVM's performance without at least hundreds of engineering-years of effort. But then I encountered projects like MIR, QBE, and Tilde, each of which is mostly the work of a single engineer over 1-2 years, achieving ~60-90% of LLVM's (or GCC's) optimized performance, but in ~1% of the code and ~1% of the compilation time.

Aesthetically, simple and lightweight make me happy. For release builds at industrial scale, I can understand wanting to squeeze out the very best performance. But I find these projects fascinating and I'm surprised that more of the new C-like languages don't use them, at least for debug/development builds. I think Hare is the only one that does (it uses QBE). Zig is moving towards doing its own code generation for non-release builds, though without a third-party dependency, and I believe Roc also skips LLVM for non-release builds. Nim sidesteps code generation entirely and just outputs C, which you can feed into your favorite C compiler. If I were ever to develop a C-like language, I think I'd target some internal IR that could in turn trivially target MIR, C, and WebAssembly.
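
For a feel of what "just output C" looks like, here's an invented example of the kind of code such a backend might emit - mangled names, explicit temporaries, nothing clever - which any C compiler can take from there (illustrative only, not actual Nim output):

    /* Hypothetical C emitted by a compiler that uses C as its backend IR,
       for a source-level function like `square(x: i64): i64 = x * x`. */
    #include <stdint.h>

    int64_t mylang_math_square__i64(int64_t x_0) {
        int64_t tmp_1 = x_0 * x_0;
        return tmp_1;
    }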

In 2015, Daniel J. Bernstein argued that optimizing compilers were grossly overrated, since what really mattered were overall data structure and algorithm design and hot spots in code - small sections that accounted for huge fractions of the runtime - and those were always best treated with manual care (often hand-written assembly). Most gains from optimizing compilers were tiny in comparison, and the "sufficiently smart" compiler that can do all the things was a pipe dream. The rest of your code basically just doesn't matter.

In my experience, you architect things up front in a way that you think should have generally good performance for your use case, and then you start measuring when you want to optimize. It has almost always been the case that the things that matter are overall data movement (i.e. "architecture") and hot spots. It's hard to say that the compiler optimizations didn't matter, but I now tend to think of them more as cost-saving measures loved mostly by the datacenter ops and finance folks - easy cross-cutting wins that don't bother the developers too much. For single programs/applications/systems, you really need to get in there and do the profiling and tuning yourself.
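
As a made-up example of the kind of "architecture" change I mean - a data-layout decision that profiling points you at, and that no backend optimizer will make on your behalf:

    /* Two layouts for the same data. If the hot loop only touches x, the
       struct-of-arrays layout streams just the x array through the cache,
       while the array-of-structs layout drags every other field along. */
    #define N 100000

    struct ParticleAoS { float x, y, z, mass; int flags; };
    struct ParticleAoS particles_aos[N];      /* array of structs */

    struct ParticlesSoA {                     /* struct of arrays */
        float x[N], y[N], z[N], mass[N];
        int   flags[N];
    };
    struct ParticlesSoA particles_soa;

    void step_aos(float dt) {
        for (int i = 0; i < N; i++)
            particles_aos[i].x += dt;         /* pulls y, z, mass, flags into cache too */
    }

    void step_soa(float dt) {
        for (int i = 0; i < N; i++)
            particles_soa.x[i] += dt;         /* touches only the x array */
    }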

All to say that I would expect overall performance to be better in a programming language that used faster codegen and focused its efforts on making profiling and debugging easier because it would allow for developer time to be better focused on the parts of their code that really mattered.

Somewhere in the mix here is that more and more compute cycles are better served by, and are moving towards, SIMD and GPUs, i.e. inherently parallel hardware. I'd like to see language features that acknowledge this shift. Both Zig and Odin include vector types whose operations target SIMD instructions, but I'm not well-versed enough to say how well that works compared to targeting SIMD more directly.
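
For what it's worth, C has had something in this spirit for a while via the GCC/Clang vector extensions (non-standard, but widely supported) - roughly the flavor of Zig's @Vector, as far as I can tell:

    /* A 4-lane float vector type; the compiler lowers arithmetic on it to
       SIMD instructions (SSE, NEON, ...) where the target supports them. */
    typedef float f32x4 __attribute__((vector_size(16)));

    f32x4 axpy(float a, f32x4 x, f32x4 y) {
        f32x4 av = { a, a, a, a };   /* splat the scalar across all 4 lanes */
        return av * x + y;           /* element-wise multiply-add */
    }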

DJB quotes Knuth's 1974 paper Structured programming with go to statements which talks about interactive compilers. I'd love to see a C successor where compilation is iterative and interactive in that way. The languages that I'm aware of where the compiler is a sort of participant in the development of the software are Idris and Lean, both based on dependent type systems.

VMs, hypervisors, and virtio

For years now, supported by the rise of cloud computing, nearly every 64-bit x86 and ARM chip (including those in mobile phones) has shipped with hardware-assisted virtualization. Where chips used to have 2 privilege levels - one for the operating system and one for applications - they now have a 3rd for the "hypervisor." This allows clouds like AWS to run their hypervisor on their machines at the highest privilege level and to host customers at the OS privilege level, basically turning the OS into an application from their perspective.
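
On Linux, that hardware support is exposed to userspace through KVM, and getting your hands on a virtual machine is surprisingly little code. A minimal sketch (error handling mostly omitted; a real VMM like QEMU or Firecracker goes on to add guest memory, vCPUs, and devices):

    /* Ask the kernel's hypervisor interface (KVM) for a new, empty VM. */
    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdio.h>
    #include <sys/ioctl.h>

    int main(void) {
        int kvm = open("/dev/kvm", O_RDWR);
        if (kvm < 0) { perror("open /dev/kvm"); return 1; }

        printf("KVM API version: %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));

        int vm = ioctl(kvm, KVM_CREATE_VM, 0);   /* fd representing the new VM */
        printf("created VM fd: %d\n", vm);
        return 0;
    }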

The hypervisor creates a virtual machine that the guest operating system runs on, and this machine has a standard set of devices that are defined by the virtio standard. What's particularly striking to me about this is that when considering the development of new operating systems, there were 2 hurdles to overcome:

  1. Looking "up" at applications, there was the issue of whether any applications would actually run on your operating system.
  2. Looking "down" at devices, there was the issue of implementing device drivers for all the hardware out there.

What's interesting about virtio is that it significantly reduces the burden of #2. If an operating system implemented device drivers for virtio-net, virtio-blk, and virtio-rng, it would have everything it needed for a datacenter/background deployment, and with virtio-gpu and virtio-input, it would be able to serve as a user-facing device.
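
For reference, those device classes are identified by small integer device types in the virtio spec (on PCI they show up under vendor ID 0x1af4), so the table at the heart of a small OS's bus scan can be about this big:

    /* The handful of virtio device types mentioned above, with the IDs the
       virtio specification assigns them. */
    struct virtio_dev_name { unsigned id; const char *name; };

    static const struct virtio_dev_name virtio_devs[] = {
        {  1, "virtio-net"   },   /* network        */
        {  2, "virtio-blk"   },   /* block storage  */
        {  4, "virtio-rng"   },   /* entropy source */
        { 16, "virtio-gpu"   },   /* display        */
        { 18, "virtio-input" },   /* keyboard/mouse */
    };

    static const char *virtio_name(unsigned id) {
        for (unsigned i = 0; i < sizeof virtio_devs / sizeof virtio_devs[0]; i++)
            if (virtio_devs[i].id == id)
                return virtio_devs[i].name;
        return "unknown virtio device";
    }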

#1 is still a huge issue, but over the past 10 years the theoretical barriers to developing a successful operating system have fallen significantly, especially if you consider that a huge fraction of code actually lives in much smaller-footprint ecosystems like Node.js and CPython, i.e. single applications that are ~cheap to port to new operating systems but that cover a huge range of use cases.

As with the simpler optimizing compilers, the question becomes: what could you do with a simpler, leaner, more malleable operating system? Clearly you lose a lot leaving the worlds of Linux, macOS, Windows, iOS, and Android, but what could you gain?

Personal computing

Computing nowadays feels rather "large-scale." Half the S&P 500 is computing. Digital (social) media is basically THE media. Digital assistants (actually useful ones) seem right around the corner. Datacenters are re-pioneering nuclear power. Our phones are always with us, connecting us to everyone and everything, including every billion and trillion dollar corporation wanting to count us as one more addition to their 999 billion hamburgers served.

I think this is part of the larger arcs of globalization and industrialization, and the role of information and communications technology at every stage. The written word, paper, the printing press, the telegraph, the telephone, radio, television, satellites, and now computers and the internet. Each enabled larger and larger scale coordination, communication, manipulation. And because it could be done, some folks somewhere decide, every time, that it should be done. So we live in a world of giants, and computing has become very much part of that world. See Marx and Freud and Chomsky and myriad others who have analyzed Civilization and its Discontents far better than I ever could.

The question I have is, when it comes to computing, does it have to be so? Giants have been running around the world for thousands of years, and while their power games have largely dictated the shape of our world and even the tenor of our days, some people - many even, if not most - have retained enough space, enough freedom, enough privacy, within our own minds and amongst friends to think, to feel, to talk, to imagine. And in these imaginings, sometimes a computer figured. In some times and in some places, the computer was imagined as a personal, even liberatory device. As personal as a notebook or diary. And of course, in many ways, our most personal computers, our phones, are even more personal than diaries -- I don't know of anyone who carries around their diary on their person at all times -- and yet, there must be some different conception of personal that computers have achieved.

Still so much is better about simple notebooks, pen, paper, index cards, spreading them out on a desk, carrying them to the coffee shop, storing them away in a drawer or closet. I read letters people used to write to each other before instant connectivity and I have no doubt that we've lost something in our connections to one another.

What would it look like for a computer to be personal? Does a personal computer really exist anymore? And what exactly does personal -- this kind of personal -- even mean? There's a computer or three on nearly every body on the planet; how much more personal can it get? Clearly I mean something different; I just don't know what yet. Maybe it's one of those things that you just really can't grok until it's right in front of you.

Does AI figure into this? I don't know. Certainly not the versions being pursued by the big places; they're all just too interested in imperial rule to make anything beautiful.

Some people and things that are interesting to me in this direction are:

I suppose if we are to discover this version of computing, we need things to be malleable, and for things to be malleable, we need to understand them, and for us to understand them, we must simplify them. If we simplify, then we can understand, and if we can understand, then we can change, and if we can change, then we can discover.

Plurality, decentralization, Mondragon

Speaking of living in a world of giants, it turns out lots of people are interested in this from different angles. Glen Weyl and Audrey Tang have written a book, Plurality, all about technology and democracy. Subtitle: The future of collaborative technology and democracy. Very cool. If the previous section on personal computing is asking "What would it mean for a computer to be humanely personal?", then Audrey and Glen are asking, "What would it mean for computing to be humanely social?" An excellent question.

Maybe it's the nature of new technology, of change, that we humans imbue it with our own hopes and dreams, with our visions for the future. Nowadays, I have a lot more respect for those who put technology in service of positive social visions instead of leading with technology with little regard to impact. Were industry and science always so amoral? Maybe so, and I just drank the American Kool-Aid when I was young.

The Mondragon Corporation is a Spanish company organized as a federation of worker-owned cooperatives. Pretty fascinating. It was founded by a Catholic priest in the then small and poor town of Mondragon; he figured that what his parish really needed was economic opportunity, and that the path to it ran through technical education and production. It worked: the town of Mondragon is still small, but no longer poor, and the Mondragon Corporation now employs nearly 100,000 people who maintain a business culture pretty centered on humanist, egalitarian values. Like actually Christian values. JC would approve, I think. Across all cooperatives there are limits on the ratio between the top-paid worker and the bottom-paid worker, and the companies still seem to do pretty well.

Its (early, but I think still ongoing) emphasis on technical education and production reminds me of Joe Studwell's analysis of economic development strategies in How Asia Works, and of Booker T. Washington's emphasis on technical education for the advancement of African-American people after the end of 200 years of slavery.

I would love to live in a more egalitarian decentralized humanist world. I'm not sure how to bring that about but I'm heartened that others are also trying to figure it out.


That's all for now. Fare thee well.