That Time I Tried Porting Zig to SerenityOS
This was a talk I gave at the Software You Can Love 2022 conference. You can find the slides right here (PDF).
Today I want to tell you a story about the time I ported the Zig compiler to SerenityOS.
I’m going to tell you a bit about what my motivation was to do this. Then I’ll explain the steps I took and the various challenges I faced (there were a lot of those!). We’re going to go into the details of some of the problems that I had to deal with. And finally, I’ll finish it off with my general thoughts… and maybe a little surprise.
So why port a new programming language compiler to a new operating system? Well, the first (and obvious) reason is that it’s just fun to do so. I am sure that many people in the audience have their own fun hobby projects. I have my own projects too, of course; but at least for me, it’s hard to do the same project for a long time without mixing things up a bit, and I sometimes take a couple week-long breaks from my personal projects and go contribute to other projects.
The second reason that I wanted to port Zig to SerenityOS is that I wanted to see ZigSelf working on it. ZigSelf is my own implementation of the Self programming language. I’ve been working on it on-and-off for the past year or so, and I wanted to see how it would fare on SerenityOS.
SerenityOS
Now, I’ve been using the word SerenityOS a lot, but how many people in the audience know what it is? (PS: most people did.)
SerenityOS is a hobby operating system developed by Andreas Kling. The project started in 2018 after Andreas got out of rehabilitation as a way to keep himself occupied, and over time grew to hundreds of contributors.
In SerenityOS, everything is from scratch - the kernel, the GUI framework, the core system utilities, a browser engine with JavaScript (small feat I know), everything. The codebase is extremely approachable, and everything is in one repo - the entire operating system is within your reach. This makes it an exciting project to hack on, since you can grep through the codebase from the shell all the way to the kernel. Andreas himself regularly posts hacking videos where he demonstrates how he adds a new feature or fixes an issue, which are both entertaining to watch and very educational at the same time.
I first got interested in SerenityOS back in April 2021, when Andreas announced the start of a Discord server. Now, say what you will about Discord, but the new communication medium was a lot more engaging and soon I found myself contributing various fixes. Although I’m not as active a contributor anymore (mostly due to real life obligations and other projects of mine), I still contribute from time to time, and it’s still a project that I really enjoy just opening and exploring the new things every couple weeks.
So back in February 2022, I wanted to see how my project ZigSelf would run on SerenityOS. Only one problem…
There is no Zig target for SerenityOS!
I was officially yak-baited.
Humble Beginnings
So, what do you need in order to run Zig binaries on an operating system? Well, the easiest way is to just write a target and then compile your binaries for that target… but that’s boring, right? So I wanted to actually be able to compile my project directly within SerenityOS and also contribute any fixes that SerenityOS needed as well, improving the POSIX compatibility of the system.
As most of you already know, Zig currently relies on LLVM for the actual binary generation in its default mode. This means that if you want to run Zig on an operating system and have it produce binaries, you are going to need LLVM working on that system too. This is certainly a barrier to entry for a new operating system. (Note that this requirement is going to go away with the self-hosted backends in stage2, but there’s still some way to go for those to become usable for daily driving - in particular, the x86_64 backend can’t build my project yet, and back in February when I tried this it was nowhere near where it is today.)
Thanks to the work of Daniel, Andrew Kaster, and many other great contributors, the hard part was already done for me - LLVM already worked on SerenityOS, and there was even a working clang port. This meant that the only part that I actually needed to handle was the Zig part.
Okay, we’ve already got LLVM, but what else does the Zig compiler need to work on a new operating system? Well, it needs a Zig compiler.
Wait what?
Well, in order to compile Zig code to an appropriate target, you need a Zig compiler that’s able to compile for that target. In my case, I had no chicken and no egg, since this is the first time (that I know of) someone had tried to run Zig on SerenityOS.
Thankfully, there’s this very useful project by one Andrew Kelley called zig-bootstrap. What zig-bootstrap does is that it builds a host LLVM, a host Zig using that LLVM, and then a target LLVM and Zig for the target that you specify. It allows you to build Zig for a target with nothing but a C++ compiler and build tools. This project became the basis of my efforts. Huge thanks to Andrew for paving the way for it.
Making zig cc Work
Earlier, I mentioned that in order to build Zig binaries for a target, you need, well, a Zig target. If you’re using the LLVM backend, you are also going to need an LLVM target that matches your Zig target. Since there was already a clang port for SerenityOS, I already had the necessary patches for a clang SerenityOS driver. But I didn’t want to write a whole bunch of target code without verifying that the Zig compiler even was able to work on SerenityOS.
One of zig’s killer features is the bundled C/C++ compiler when you build it with LLVM, courtesy of the Clang compiler. My first idea was to use this compiler to build a simple C program, which would prove that stage1 could be built (at the time, stage2 was still quite a ways away).
So my goals were as follows: I would build a host Zig compiler that’s capable of building files for Serenity, build a simple C program and test that it works, and then get to compiling Zig itself for SerenityOS once I confirmed that it was possible to do so.
Patching Non-Monorepo LLVM
The first problem I hit came before I even started compiling. Back in February, the LLVM changes that SerenityOS needs were shipped as a single, large patch, intended to be applied to the LLVM monorepo. Since zig-bootstrap does not use the monorepo layout of LLVM, that patch would not apply as-is. Thankfully, Daniel quickly split it up for me into a set of patches that I could apply piece by piece. Thanks Daniel!
Once that was done, host LLVM compilation was pretty uneventful, just a lot of waiting.
Synchronizing Zig With LLVM
Next up was compiling host Zig, where I hit a couple more issues.
First up, Zig really doesn’t like to be out of sync with LLVM’s enums, and there are various static assertions within the C++ code which verify that the ZigLLVM enums match the LLVM enums exactly. This is actually a really nice thing, because once you add a new value to those enums, the compiler will guide you by telling you every place in the compiler where that new enum value needs to be inserted.
Of course, adding things to the C++ parts means that you have to synchronize the
Zig code with the C++ too. At the time (and still today), the Zig compiler is a
hybrid between the bootstrap C++ code and the newer self-hosted implementation
in Zig, which means some manual synchronization work is required to keep
everything running smoothly. Adding serenity to Target.Os.Tag will guide you
to the places where this new value isn’t handled.
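The enum-synchronization check described above can be sketched in plain C11. This is only an illustration of the pattern, not the actual ZigLLVM code: both enums and every name in them are hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical stand-in for an enum owned by the upstream library. */
enum UpstreamOs {
    UPSTREAM_OS_UNKNOWN,
    UPSTREAM_OS_LINUX,
    UPSTREAM_OS_SERENITY, /* the value added by a local patch */
    UPSTREAM_OS_COUNT
};

/* Hypothetical mirror of that enum, kept by the wrapper layer. */
enum MirrorOs {
    MIRROR_OS_UNKNOWN,
    MIRROR_OS_LINUX,
    MIRROR_OS_SERENITY,
    MIRROR_OS_COUNT
};

/* If either side gains, loses or reorders a value, the build stops here. */
_Static_assert((int)MIRROR_OS_UNKNOWN == (int)UPSTREAM_OS_UNKNOWN, "Os enum out of sync");
_Static_assert((int)MIRROR_OS_LINUX == (int)UPSTREAM_OS_LINUX, "Os enum out of sync");
_Static_assert((int)MIRROR_OS_SERENITY == (int)UPSTREAM_OS_SERENITY, "Os enum out of sync");
_Static_assert((int)MIRROR_OS_COUNT == (int)UPSTREAM_OS_COUNT, "Os enum out of sync");

/* Because the values are asserted equal, converting is a plain cast. */
enum UpstreamOs to_upstream(enum MirrorOs os)
{
    return (enum UpstreamOs)os;
}
```

Deleting MIRROR_OS_SERENITY from the mirror makes the last two assertions fail at compile time, which is the "compiler guides you" effect.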
A Maze of CRT Objects, All Different
As part of the “fix everything that the compiler complains about” ordeal, I came across this bit of code. This code tells the Zig compiler about which C startup objects to link the executable or library against when cross-compiling to another target.
Now, there are a lot of details here that nobody except those weird people who develop compilers need to know about, but the gist of it is that this is different for every single operating system and varies across executable types. For SerenityOS, the patches we use for Clang seem to automatically figure out linking crtbegin and crtend, so all I had to really do was get rid of those and fill it out based on what we had available in /usr/lib once the system had been built. It seems to work and I plan on NOT touching it ever again.
The Ununited State of LibC
So once all the compiler errors were fixed, I had a host Zig built and ready to
build programs with. So I wrote a hello world C program, pointed zig cc at it,
and… linker errors!
Right. In order to build a C binary that links to SerenityOS’ libc, we need to know which libraries provide the symbols libc itself needs. As SerenityOS’ libc was (for some reason) statically linked, it couldn’t tell us which libraries it needed on its own, so finding them was mostly a matter of trying again and again until the linker stopped erroring out.
In the end, I had to link in pthreads, libdl, libm and libunwind for the C library to work.
Oh, and libc++ too.
Yeah… Serenity is fundamentally a C++ project and it shows. The C standard library itself uses C++’s runtime type information in order to link properly. It’s unfortunate, but it works, so it’s fine.
Oh, and I should mention this: As of just a couple weeks ago, this issue has been resolved thanks to the awesome work by Tim Schumacher, who merged in all those libraries into libc, which makes the serenity target actually need the least flags to link with libc now. Thanks Tim!
So It Built…
Okay, so the C binary built, and it looks like a valid C binary. Let’s put it on the disk image and run it and… welp.
This is an issue with SerenityOS. Basically, when you load in a program, your
program first has to load in /usr/lib/Loader.so as its interpreter, which
relocates your sections, sets up all your dynamic libraries and then executes
you. The problem is the fact that static PIE binaries also need the relocation
section, but they don’t want to use the loader (as Daniel put it, they’re “the
edgy version [of static binaries]”).
Of course, what’s the best way to make Zig output a dynamic binary? Why, link a dynamic C++ library of course. After doing that, this beautiful line…
generated this beautiful binary…
and here’s a Zig-built C binary running on SerenityOS!
Oh, and the proper fix for the dynamic binary thing is using the designated function for it in the Zig compiler, which I would learn and then promptly facepalm.
The Big Guns
Okay, so we’ve confirmed that Zig can indeed build a binary that can work on SerenityOS. But we haven’t actually built any Zig code, that was just C code that leveraged Clang. But what do we actually need in order to build Zig code?
Well, for things like printing to the screen or allocation to work, the built binary needs to know how to talk to the operating system. Zig uses two approaches for this. The first is to describe how to make each system call, parameters and all, directly within Zig itself; this is the approach Zig has taken with Linux, as it has a syscall stability guarantee. But Linux is the outlier here; for almost every other operating system, Zig will link in the C library and then use functions defined there as the stable system call interface. This is the best you can do, since many operating systems (including SerenityOS) only guarantee stability through libc. So when the time came to build Zig for the SerenityOS target, I would need to tell it the necessary things about the OS in order for the compilation to succeed.
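To make the two routes concrete, here is a minimal C sketch that asks for the process ID both ways. It is Linux-only and purely illustrative: the function names are made up, and syscall() is itself a libc helper, used here only to show a call made by raw system call number. Zig on Linux issues the system call instruction itself and does not go through this helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h> /* SYS_getpid */
#include <sys/types.h>   /* pid_t */
#include <unistd.h>      /* getpid, syscall */

/* Route 1: the libc wrapper. This is the only interface that many
 * operating systems (SerenityOS included) promise to keep stable. */
pid_t pid_via_libc(void)
{
    return getpid();
}

/* Route 2: the system call by number. This only stays correct on a
 * kernel that guarantees syscall numbers and arguments never change. */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Both routes end up in the same kernel code, so they must agree. */
void check_routes_agree(void)
{
    assert(pid_via_libc() == pid_via_raw_syscall());
}
```

On a system without a stable syscall ABI, only the first function would be safe to ship in a binary that has to survive a kernel update.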
The three components that are of concern when building Zig for a target are
zlib, LLVM and Zig itself. So it was time to start actually building big
programs with zig cc.
Building zlib And LLVM
Building zlib was mostly uneventful, save for a little hack needed at the time
which required us to force zlib to build with -fPIC (as some non-relocatable
code still sneaked in somehow). This is fixed with recent Serenity Clang
patches.
Now it was time for target LLVM… though it wasn’t as easy to get it
going. First off, it couldn’t find compress2 from zlib. Hmm, why could this
be? We had some brainstorming on the Discord, and tried building with
CMAKE_FIND_ROOT_PATH pointed to the out directory for the target, but it still
didn’t work.
Or, well, it would’ve, if I hadn’t accidentally forgotten a backslash on the line above that flag. I love shells.
Moving on… Now cmake can’t figure out that we have dlopen and dladdr. This
is the same “libc needs the rest of the world” issue that we discussed earlier,
and was resolved by making Zig also link those in.
But beyond those minor setbacks, zlib and LLVM were rather easily built. Now it was time for the fun part!
The Holy Yak
Now we’re at the final stage before we can start trying out the Zig compiler on SerenityOS. For the most part, this is just a variant of the host Zig compiler; the target Zig compiler needs code from std.os in order to be built, which just means adding the things it tells me are missing.
But how do we actually add a new OS to the Zig standard library? Earlier I
mentioned that you either implement the syscalls directly or you go through
libc. Since SerenityOS’ stable (well, “stable”) syscall interface is libc, this
means adding .serenity to this struct right here in std.c, and std will take
care of it for me. Doing this actually gives me more info about what kind of
things Zig expects from a std.c.xyz struct.
For an OS-specific struct in std.c, you’re mostly expected to just list all
the various libc constants defined by POSIX in various structs, grouped by the
constant prefix. Take PROT for example. Each of the values in the struct
corresponds to one constant defined by POSIX. One notable exception to this is
errno numbers, which are actually an enum. Beyond this, you’re also expected to
describe a couple of structs in pthreads, a few extern function declarations,
and if you build in debug mode then you’re also expected to supply something
that can walk through the list of shared objects - for Serenity this is
dl_iterate_phdr, like on Linux.
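For readers who haven’t met dl_iterate_phdr, here is a small C sketch of what “walking the list of shared objects” looks like, written against the glibc/Linux declaration in link.h. The callback and the counting helper are hypothetical names made up for this example, and I have not checked that SerenityOS exposes the function through the same header.

```c
#define _GNU_SOURCE /* glibc only declares dl_iterate_phdr with this */
#include <assert.h>
#include <link.h>   /* dl_iterate_phdr, struct dl_phdr_info */
#include <stddef.h>
#include <stdio.h>

/* Called once per loaded object; returning 0 keeps the walk going. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name
                           : "(main program)";

    (void)size; /* size of the info struct, unused in this sketch */
    printf("%s: base %#llx, %u program headers\n", name,
           (unsigned long long)info->dlpi_addr,
           (unsigned)info->dlpi_phnum);
    ++*count;
    return 0;
}

/* Walks every object the dynamic loader knows about and counts them. */
int count_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    assert(count >= 1); /* the main program itself is always listed */
    return count;
}
```

A debug-info reader does the same walk, but instead of printing it records each object’s base address so that return addresses in a stack trace can be mapped back to the right file.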
But there’s one special definition you must supply, and it’s quite special
indeed. And it’s our good old friend errno! errno is a delightful legacy
piece of POSIX. It’s basically a global value that gets written to whenever
something unexpected happens as a result of a libc function call. The problem
is, this obviously won’t work when you have more than one thread, so you need to
somehow keep them separate. The solution most operating systems have come up
with is making errno thread local, and then providing its location as a pointer
via an __errno_location function. Most OS modules in std.c do this.
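The thread-local-plus-accessor pattern can be sketched in a few lines of C11. Every name here (my_errno_storage, my_errno_location, my_errno, checked_halve) is hypothetical and chosen so that nothing clashes with the real errno; it shows the shape of the mechanism, not any particular libc’s implementation.

```c
#include <assert.h>
#include <errno.h> /* only for the EINVAL constant */

/* One copy of the error slot per thread. */
static _Thread_local int my_errno_storage;

/* The accessor a libc exports; callers never touch the variable
 * directly, so its storage can change without breaking binaries. */
int *my_errno_location(void)
{
    return &my_errno_storage;
}

/* The macro makes the accessor look like a plain global again. */
#define my_errno (*my_errno_location())

/* A toy "libc function" that reports failure through the error slot. */
int checked_halve(int value)
{
    if (value % 2 != 0) {
        my_errno = EINVAL;
        return -1;
    }
    return value / 2;
}

/* Success leaves the slot alone; failure writes to it. */
void check_error_slot(void)
{
    my_errno = 0;
    assert(checked_halve(4) == 2 && my_errno == 0);
    assert(checked_halve(3) == -1 && my_errno == EINVAL);
}
```

A language binding only needs the accessor function: it calls it, gets a pointer, and reads the integer behind it.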
While errno was a thread local in SerenityOS, it was still a public symbol
from libc and there was no __errno_location, which I worked around by creating
one. However, this didn’t stop me from getting a bizarre size 0 symbol which
crashed DynamicLoader, so I just removed that check and it worked, which is
fine I suppose. This has since been improved by making __errno_location the
default way to get errno in SerenityOS.
After all that and implementing a directory iterator, we went through Semantic Analysis, and were at link point! But you know better than to believe it’s just gonna link first try, right?!
“Implementing” POSIX Support
So, of course, we had link errors… and this time, we were actually missing functions! Alright, let’s “implement” them by just slapping some stubs in libc. What could go wrong?
Well, after this and chasing a few more bizarre loader related errors, we finally got the Zig help printing!
Though, the Zig help being printed doesn’t actually prove that Zig can build stuff, so let’s build a hello world again!
Oops, Non-Working Semaphores
Hmm, okay, so it’s taking a while… it’s taking a good while actually. It probably shouldn’t take this long to start. Let’s profile it and see what it’s stuck on…
Okay then, we’re stuck spinning on POSIX semaphores, I guess that’s broken. Let me make target Zig build in single-threaded mode per Andrew’s suggestion (Andrew was in the chat at this point), and it seems that we’re getting somewhere!
(Sidenote, building the Zig compiler single-threaded is currently broken in stage2.)
Actually Implementing POSIX Support
Or, well, we were gonna get somewhere, but here’s where my earlier decisions bite me. Turns out those functions I stubbed are actually needed by Zig… so let’s implement them in the kernel really quickly. While I was doing so, I hit this interesting error which is apparently a first.
Since SerenityOS didn’t have a way to get the clock resolution at the time, I wrote the best possible implementation.
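For context, this is roughly what a caller expects from the POSIX clock_getres function. The sketch below is plain C against the POSIX interface, with a hypothetical helper name; it is not the SerenityOS implementation from the slide.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h> /* clock_getres, CLOCK_MONOTONIC, struct timespec */

/* Returns the resolution of CLOCK_MONOTONIC in nanoseconds,
 * or -1 if the system cannot report it. */
long long monotonic_resolution_ns(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}

/* A usable clock reports a resolution greater than zero. */
void check_resolution(void)
{
    assert(monotonic_resolution_ns() > 0);
}
```

The exact number depends on the kernel and its timer configuration, so the sketch only checks that a positive value comes back.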
Anyway, most of the work here was implementing a few more of those *at functions, so not much to say here.
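If you haven’t used the *at family before: each function takes a directory file descriptor and resolves a relative path against it instead of against the current working directory. A minimal C sketch using faccessat, with a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>  /* open, O_RDONLY, O_DIRECTORY */
#include <unistd.h> /* faccessat, close, F_OK */

/* Returns 1 if `name` exists inside `dir_path`, 0 if it does not,
 * and -1 if the directory itself cannot be opened. */
int exists_in_dir(const char *dir_path, const char *name)
{
    int dir_fd = open(dir_path, O_RDONLY | O_DIRECTORY);
    int found;

    if (dir_fd < 0)
        return -1;
    /* `name` is resolved relative to dir_fd, not the working directory. */
    found = faccessat(dir_fd, name, F_OK, 0) == 0;
    close(dir_fd);
    return found;
}

/* "." always exists inside a directory that could be opened. */
void check_root_lookup(void)
{
    assert(exists_in_dir("/", ".") == 1);
}
```

Passing the special value AT_FDCWD instead of a real descriptor makes these functions behave like their classic counterparts, which is why a standard library can build everything on the *at variants alone.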
Hmm… Weird Crash
Alright, now I’m getting this weird crash in init, which is stage1 C++ code…
I wonder what’s up with that. And the fun part with this crash is that if you
change the location or the arguments of any of the prints, the crash location
changes!
At this point it was very late at night to be debugging weird crashes, so I came back the next day and continued debugging. This was mostly a matter of adding debug prints in places and seeing what happened. This went on for a while until I hit a very peculiar error.
Ignore the silly debug messages, at this point I was pretty bored. The
interesting part is the fact that randomly, I started printing some unrelated
strings! In fact, it was a very specific string called
target_specific_features, defined near the start of the init function. A few
more debug prints, and it looked like we were getting that broken string right
after stage2_version was being called.
Now, what’s so special about stage2_version? As I’ve mentioned earlier, the
Zig compiler is still a hybrid of C++ and Zig. However, this doesn’t only mean
that Zig code calls into C++; it also means that C++ code can call into Zig! To
facilitate this, there are various function declarations with their definitions
in C code, and the calling convention for those functions is set to C in order
to have the ABI match. Now, what happens if that ABI doesn’t perfectly match?
Unfortunately, you don’t get a nice compile error, you get weirdness like this.
In order to test what’s actually supposed to be generated, I wrote
stage2_version in C++ and compiled it with a C++ compiler for Serenity. Here’s
what that looks like:
00001430 <stage2_version>:
1430: 55 push %ebp
1431: 89 e5 mov %esp,%ebp
1433: 8b 45 08 mov 0x8(%ebp),%eax
1436: c7 00 00 00 00 00 movl $0x0,(%eax)
143c: c7 40 04 09 00 00 00 movl $0x9,0x4(%eax)
1443: c7 40 08 00 00 00 00 movl $0x0,0x8(%eax)
144a: 5d pop %ebp
144b: c2 04 00 ret $0x4
144e: 66 90 xchg %ax,%ax
And here’s the version that Zig builds. Notice any difference?
02e31180 <stage2_version>:
2e31180: 31 c0 xor %eax,%eax
2e31182: ba 09 00 00 00 mov $0x9,%edx
2e31187: 31 c9 xor %ecx,%ecx
2e31189: c3 ret
After posting this and with help from some C++ wizards on Discord, I learned that…
Thou Shalt Not Pass (Structs Through Registers)!
At least on i386.
The System V ABI for i386 makes this clear. If you’re going to return a struct,
the caller has to pass the address of the struct as a hidden argument, and the
callee writes the result there and hands that address back in %eax, no way
around it. Even if it fits into registers. This is apparently called an sret.
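To see what the convention means at the source level, here is a small C sketch. The struct and function names are hypothetical (they are not the real stage1 declarations); the values 0, 9, 0 are simply the ones visible in the disassembly above. The second function spells out by hand what an i386 System V compiler does behind the scenes for the first.

```c
#include <assert.h>
#include <stdint.h>

/* Three 32-bit fields: small enough to fit in registers, which is
 * exactly what the ABI-violating version tried to do. */
struct SemVer {
    uint32_t major;
    uint32_t minor;
    uint32_t patch;
};

/* What the programmer writes: return the struct by value. */
struct SemVer example_version(void)
{
    struct SemVer v = { 0, 9, 0 };
    return v;
}

/* Roughly what i386 System V turns it into: the caller supplies the
 * destination, the callee fills it in and returns the same address
 * (in %eax). The real convention also has the callee pop the hidden
 * pointer with `ret $4`, which plain C cannot express. */
struct SemVer *example_version_lowered(struct SemVer *out)
{
    out->major = 0;
    out->minor = 9;
    out->patch = 0;
    return out;
}

/* Both spellings must produce the same three values. */
void check_same_result(void)
{
    struct SemVer a = example_version();
    struct SemVer b;

    assert(example_version_lowered(&b) == &b);
    assert(a.major == b.major && a.minor == b.minor && a.patch == b.patch);
}
```

When one side of a call follows the lowered form and the other returns the fields in registers, the caller reads whatever happened to be in its own stack slot, which matches the “unrelated string” symptom described earlier.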
I didn’t know enough about the Zig compiler to fix this properly, but my “fix” (which consisted of inserting an x86 ABI check into an x86_64 function in the compiler) seemed to work, and we’re now getting some compile errors on the Zig side!
LibC Stuff
Now that the Zig compiler was working, the only things to fix were those
reported by the compiler. I had to implement various things I skipped out on
such as the faccessat syscall and debugging support. I also had to stub out
flock because we didn’t support file lock downgrading at the time. Finally, I
needed to tell Zig about where our libc was because the necessary checks were
not implemented yet.
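“Lock downgrading” here means turning an exclusive lock you already hold into a shared one without releasing it first. A minimal C sketch of that sequence with flock, using a scratch file from tmpfile(); the helper name is hypothetical, and this shows the behaviour a caller relies on rather than anything in the Zig or SerenityOS sources.

```c
#define _DEFAULT_SOURCE /* exposes flock and fileno on glibc */
#include <assert.h>
#include <stdio.h>    /* tmpfile, fileno, fclose */
#include <sys/file.h> /* flock, LOCK_EX, LOCK_SH, LOCK_UN */

/* Takes an exclusive lock on a scratch file, downgrades it to a
 * shared lock, then unlocks. Returns 0 on success, -1 on failure. */
int lock_then_downgrade(void)
{
    FILE *scratch = tmpfile();
    int fd;
    int ok;

    if (scratch == NULL)
        return -1;
    fd = fileno(scratch);
    ok = flock(fd, LOCK_EX) == 0  /* exclusive: we are the only holder */
      && flock(fd, LOCK_SH) == 0  /* downgrade: others may now share it */
      && flock(fd, LOCK_UN) == 0; /* release */
    fclose(scratch);
    return ok ? 0 : -1;
}

/* On a system that supports downgrading, the whole sequence succeeds. */
void check_downgrade(void)
{
    assert(lock_then_downgrade() == 0);
}
```

On a system where the middle call is not supported, that is the step that fails, which is why stubbing the function out was the quickest way forward.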
And with all of that, we have a Zig-compiled program running on SerenityOS.
(Though only with the compiler built in release mode, debug mode was causing an integer overflow panic but details, details.)
Just A Couple More Hacks…
Now that I was able to run a basic Zig program, I wanted to see if I could complete my original goal of running ZigSelf on SerenityOS too. I actually caught a few “used usize interchangeably with u64” errors while doing this since I was running on i386, but eventually that worked too.
Aftermath
After all of this, these changes unfortunately didn’t get upstreamed into Zig and there currently is no Zig port for SerenityOS. The main reasons are that at the time, stage2 was “very close” (yeah, right), and that the various things I had to hack or patch out meant it couldn’t be merged as it was. So I abandoned the port; however, I had always planned to complete it once stage2 was officially ready for use.
Now For Something Different…
So, in the months since, thanks to the tremendous work by the Zig team and all the contributors’ help, stage2 became the default compiler. Where does that leave us? Well, I have something to show you.
(Here I had a demo showing a build of the self-hosted Zig compiler running on SerenityOS, ported again in October, building a simple hello world, then building ZigSelf, and finally (attempting to) build the Zig compiler itself.)
So this is actually the self-hosted Zig compiler running on SerenityOS. Again,
thanks to the work of many SerenityOS and Zig contributors, this port was
actually surprisingly easy to do once again and only took a couple of days (and
most of the second day was chasing a nasty kernel bug). It’s surprisingly usable
in conjunction with the C++ syntax highlighting you saw with TextEditor too. I
believe that SerenityOS could be a great platform to work on Zig with enough
work.
Future Work
So what’s next for SerenityOS on Zig? Obviously, I want to upstream this and
make it as simple as running ./package.sh and waiting. Though there are some
steps we must take to get there.
First things first, we need to upstream our LLVM patches from the SerenityOS side. As far as I’m aware, they’re mostly ready, but there are still some issues that need to be addressed. Without this, I cannot upstream the Zig patches, because as I mentioned near the start of my porting efforts, Zig’s LLVM integration is strongly tied to the target triples, and now that some of that code is in Zig too, I don’t think I can ifdef it out like before anymore.
Once that’s done, we need to find a solution to the problem of SerenityOS’ libc
not being that stable. The names are the same, but things like errno numbers can
change often. After doing some brainstorming in the Serenity Discord, we came up
with the idea of a program which uses Zig’s @cImport superpowers to import all
the constants and then provide them with a package to std.c.serenity, and then
provide that package during the SerenityOS Zig build. So the… let’s say “less”
changing parts of LibC could be part of Zig upstream, and the things that need
to change more frequently can be provided with Serenity’s ports system.
After doing both of those, I want to explore how we could provide access to Serenity’s system libraries and IPC from Zig. As SerenityOS’ libraries are all in C++, I’m gonna have to write some bindings for this, but maybe an alternative is possible by parsing the IPC files instead.
Closing Thoughts
I want to finish up with some thoughts. Obviously, the story that I’ve told you is nowhere near the scale of something like Zig and SerenityOS, but I believe that it captures at least a bit of that spirit. At the start, the task seemed insurmountable to me - I mean, how would I even go about getting Zig running on SerenityOS? “It’s gonna take weeks”, I thought. But it turns out, the reason it wasn’t done was just because nobody had tried it. So if there’s a lesson to learn from this story, it’s that any task is achievable (be it an operating system, a new programming language, a browser from scratch) with enough yakshaving!
Thanks
I’d like to thank all of the people on the slide for their help along the journey, and especially Daniel Bertalan, Ali Mohammad and Andrew Kaster for their emotional support during the final hours, Andreas Kling for creating the amazing SerenityOS project, and Andrew Kelley for creating the wonderful Zig programming language. Thanks everyone.