That Time I Tried Porting Zig to SerenityOS

Keywords: #serenity #zig #talk

This was a talk I gave at the Software You Can Love 2022 conference. You can find the slides right here (PDF).


Today I want to tell you a story about the time I ported the Zig compiler to SerenityOS.

I’m going to tell you a bit about what my motivation was to do this. Then I’ll explain the steps I took and the various challenges I faced (there were a lot of those!). We’re going to go into the details of some of the problems that I had to deal with. And finally, I’ll finish it off with my general thoughts… and maybe a little surprise.

So why port a new programming language compiler to a new operating system? Well, the first (and obvious) reason is that it’s just fun to do so. I am sure that many people in the audience have their own fun hobby projects. I have my own projects too, of course; but at least for me, it’s hard to do the same project for a long time without mixing things up a bit, and I sometimes take a couple week-long breaks from my personal projects and go contribute to other projects.

The second reason that I wanted to port Zig to SerenityOS is that I wanted to see ZigSelf working on it. ZigSelf is my own implementation of the Self programming language. I’ve been working on it on-and-off for the past year or so, and I wanted to see how it would fare on SerenityOS.

SerenityOS

Now, I’ve been using the word SerenityOS a lot, but how many people in the audience know what it is? (PS: most people did.)

SerenityOS is a hobby operating system developed by Andreas Kling. The project started in 2018 after Andreas got out of rehabilitation as a way to keep himself occupied, and over time grew to hundreds of contributors.

In SerenityOS, everything is from scratch - the kernel, the GUI framework, the core system utilities, a browser engine with JavaScript (small feat I know), everything. The codebase is extremely approachable, and everything is in one repo - the entire operating system is within your reach. This makes it an exciting project to hack on, since you can grep through the codebase from the shell all the way to the kernel. Andreas himself regularly posts hacking videos where he demonstrates how he adds a new feature or fixes an issue, which are both entertaining to watch and very educational at the same time.

I first got interested in SerenityOS back in April 2021, when Andreas announced the start of a Discord server. Now, say what you will about Discord, but the new communication medium was a lot more engaging and soon I found myself contributing various fixes. Although I’m not as active a contributor anymore (mostly due to real life obligations and other projects of mine), I still contribute from time to time, and it’s still a project that I really enjoy just opening and exploring the new things every couple weeks.

So back in February 2022, I wanted to see how my project ZigSelf would run on SerenityOS. Only one problem…

There is no Zig target for SerenityOS!

I was officially yak-baited.

Humble Beginnings

So, what do you need in order to run Zig binaries on an operating system? Well, the easiest way is to just write a target and then compile your binaries for that target… but that’s boring, right? So I wanted to actually be able to compile my project directly within SerenityOS and also contribute any fixes that SerenityOS needed as well, improving the POSIX compatibility of the system.

As most of you already know, Zig currently relies on LLVM for the actual binary generation in its default mode. This means that if you want to run Zig on an operating system and have it produce binaries, you are going to need LLVM working on that system too. This is certainly a barrier to entry for a new operating system. (Note that this requirement is going away with the self-hosted backends in stage2, but there’s still some way to go for those to become usable for daily driving - in particular, the x86_64 backend can’t build my project yet, and back in February when I tried this it was nowhere near where it is today.)

Thanks to the work of Daniel, Andrew Kaster, and many other great contributors, the hard part was already done for me - LLVM already worked on SerenityOS, and there was even a working clang port. This meant that the only part that I actually needed to handle was the Zig part.

Okay, we’ve already got LLVM, but what else does the Zig compiler need to work on a new operating system? Well, it needs a Zig compiler.

Wait what?

Well, in order to compile Zig code to an appropriate target, you need a Zig compiler that’s able to compile for that target. In my case, I had no chicken and no egg, since this is the first time (that I know of) someone had tried to run Zig on SerenityOS.

Thankfully, there’s this very useful project by one Andrew Kelley called zig-bootstrap. What zig-bootstrap does is that it builds a host LLVM, a host Zig using that LLVM, and then a target LLVM and Zig for the target that you specify. It allows you to build Zig for a target with nothing but a C++ compiler and build tools. This project became the basis of my efforts. Huge thanks to Andrew for paving the way for it.

Making zig cc Work

Earlier, I mentioned that in order to build Zig binaries for a target, you need, well, a Zig target. If you’re using the LLVM backend, you are also going to need an LLVM target that matches your Zig target. Since there was already a clang port for SerenityOS, I already had the necessary patches for a clang SerenityOS driver. But I didn’t want to write a whole bunch of target code without verifying that the Zig compiler even was able to work on SerenityOS.

One of Zig’s killer features is the bundled C/C++ compiler you get when you build it with LLVM, courtesy of Clang. My first idea was to use this compiler to build a simple C program, which would prove that stage1 could be built (at the time, stage2 was still quite a ways away).

So my goals were as follows: I would build a host Zig compiler that’s capable of building files for Serenity, build a simple C program and test that it works, and then get to compiling Zig itself for SerenityOS once I confirmed that it was possible to do so.

Patching Non-Monorepo LLVM

The first problem I hit was before I even started compiling, actually. Back in February, the LLVM patches that SerenityOS uses were a single, large patch, intended to be applied on the LLVM monorepo. Since zig-bootstrap does not use the monorepo layout of LLVM, that patch wouldn’t apply as-is. However, Daniel quickly split it up for me and gave me a set of patches that I could apply in parts. Thanks Daniel!

Once that was done, host LLVM compilation was pretty uneventful, just a lot of waiting.

Synchronizing Zig With LLVM

Next up was compiling host Zig, where I hit a couple more issues.

First up, Zig really doesn’t like to be out of sync with LLVM’s enums, and there are various static assertions within the C++ code which verify that the ZigLLVM enums match the LLVM enums exactly. This is actually a really nice thing, because once you add a new value to those enums, the compiler will guide you by telling you the various places in the compiler where you should be inserting that new enum value.
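To give a feel for that kind of check, here is a minimal C sketch. It is not the actual ZigLLVM code (which is C++); the two enums and the to_upstream_os helper are made-up names for illustration. The idea is simply that a mirrored enum is pinned to the original with compile-time assertions, so that adding a value on one side only breaks the build instead of silently misbehaving.

```c
#include <assert.h>

/* Hypothetical stand-in for an enum owned by the upstream library. */
enum upstream_os {
    UPSTREAM_OS_LINUX,
    UPSTREAM_OS_SERENITY,
    UPSTREAM_OS_LAST = UPSTREAM_OS_SERENITY
};

/* Hypothetical mirror of that enum kept on the compiler side. */
enum mirror_os {
    MIRROR_OS_LINUX,
    MIRROR_OS_SERENITY,
    MIRROR_OS_LAST = MIRROR_OS_SERENITY
};

/* Compilation fails here if the mirror drifts from upstream. */
static_assert((int)MIRROR_OS_LINUX == (int)UPSTREAM_OS_LINUX,
              "OS enum out of sync: linux");
static_assert((int)MIRROR_OS_SERENITY == (int)UPSTREAM_OS_SERENITY,
              "OS enum out of sync: serenity");
static_assert((int)MIRROR_OS_LAST == (int)UPSTREAM_OS_LAST,
              "OS enum out of sync: a value was added on one side only");

/* With the checks above in place, converting is a plain cast. */
enum upstream_os to_upstream_os(enum mirror_os os) {
    return (enum upstream_os)os;
}
```

The third assertion is the one that does the guiding: add a value to only one of the enums and the build stops with a message pointing at the mismatch.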

Of course, adding things to the C++ parts means that you have to synchronize Zig code with C++ too. At the time (and today still), the Zig compiler is still a hybrid between the bootstrap C++ code and the newer self-hosted implementation in Zig, which means some manual synchronization work is required to keep everything running smoothly. Adding serenity to Target.Os.Tag will guide you on where this new value isn’t handled.

A Maze of CRT Objects, All Different

As part of the “fix everything that the compiler complains about” ordeal, I came across this bit of code. This code tells the Zig compiler which C startup objects to link the executable or library against when cross-compiling to another target.

Now, there’s a lot of details here that nobody except those weird people who develop compilers need to know about, but the gist of it is that this is different for every single operating system and varies across executable types. For SerenityOS, the patches we use for Clang seem to automatically figure out linking crtbegin and crtend, so all I had to really do was get rid of those and fill it out based on what we had available in /usr/lib once the system had been built. It seems to work and I plan on NOT touching it ever again.

The Ununited State of LibC

So once all the compiler errors were fixed, I had a host Zig built and ready to build programs with. So I wrote a hello world C program, pointed zig cc at it, and… linker errors!

Right. In order to build a C binary that links to SerenityOS’ libc, we need to know what symbols libc itself needs. As SerenityOS’ libc was (for some reason) statically linked, it can’t tell which libraries it needs on its own; finding them was therefore mostly a matter of trying again and again until the linker stopped erroring out.

In the end, I had to link in pthreads, libdl, libm and libunwind for the C library to work.

Oh, and libc++ too.

Yeah… Serenity is fundamentally a C++ project and it shows. The C standard library itself uses C++’s runtime type information in order to link properly. It’s unfortunate, but it works, so it’s fine.

Oh, and I should mention this: As of just a couple weeks ago, this issue has been resolved thanks to the awesome work by Tim Schumacher, who merged in all those libraries into libc, which makes the serenity target actually need the least flags to link with libc now. Thanks Tim!

So It Built…

Okay, so the C binary built, and it looks like a valid C binary. Let’s put it on the disk image and run it and… welp.

A zig cc-built C binary crashing on SerenityOS.

This is an issue with SerenityOS. Basically, when you load in a program, your program first has to load in /usr/lib/Loader.so as its interpreter, which relocates your sections, sets up all your dynamic libraries and then executes you. The problem is the fact that static PIE binaries also need the relocation section, but they don’t want to use the loader (as Daniel put it, they’re “the edgy version [of static binaries]”).

Of course, what’s the best way to make Zig output a dynamic binary? Why, link a dynamic C++ library of course. After doing that, this beautiful line…

The “wonderful” command used to build the binary.

generated this beautiful binary…

Running file on the binary.

and here’s a Zig-built C binary running on SerenityOS!

Zig-built C binary running on SerenityOS.

Oh, and the proper fix for the dynamic binary thing is using the designated function for it in the Zig compiler, which I would learn and then promptly facepalm.

The Big Guns

Okay, so we’ve confirmed that Zig can indeed build a binary that can work on SerenityOS. But we haven’t actually built any Zig code, that was just C code that leveraged Clang. But what do we actually need in order to build Zig code?

Well, for things like printing to the screen or allocation to work, the built binary needs to know how to talk to the operating system. Zig uses two approaches for this. The first is to actually write how to syscall and its parameters directly within the compiler; this is the approach Zig has taken with Linux as it has a syscall stability guarantee. But Linux is the outlier here; for almost every other operating system, Zig will link in the C library and then use functions defined there as the stable system call interface. This is the best you can do, since many operating systems (including SerenityOS) only guarantee stability through libc. So when the time came to build Zig for the SerenityOS target, I would need to tell it the necessary things about the OS in order for the compilation to succeed.
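The two approaches can be illustrated from C on Linux. This is only a sketch of the distinction, not how Zig's standard library is written; the function names are made up for the example, and the raw route relies on the Linux-specific syscall() wrapper and SYS_getpid number.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Approach 1: issue the system call by number (Linux-specific). */
pid_t pid_via_raw_syscall(void) {
    return (pid_t)syscall(SYS_getpid);
}

/* Approach 2: go through the libc wrapper, the portable route. */
pid_t pid_via_libc(void) {
    return getpid();
}

/* Both routes should report the same process id. */
void check_routes_agree(void) {
    assert(pid_via_raw_syscall() == pid_via_libc());
}
```

The first function only keeps working as long as the kernel never changes what that syscall number means, which is exactly the guarantee Linux gives and most other systems don't.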

The three components that are of concern when building Zig for a target are zlib, LLVM and Zig itself. So it was time to start actually building big programs with zig cc.

Building zlib And LLVM

Building zlib was mostly uneventful, save for a little hack needed at the time which required us to force zlib to build with -fPIC (as some non-relocatable code still sneaked in somehow). This is fixed with recent Serenity Clang patches.

Now it was time for target LLVM… though it wasn’t as easy to make it get going. First off, it couldn’t find compress2 from zlib. Hmm, why could this be? We had some brainstorming on the Discord, and tried building with CMAKE_FIND_ROOT_PATH pointed to the out directory for the target, but it still didn’t work.

Or, well, it would’ve, if I hadn’t accidentally forgotten a backslash on the line above that flag. I love shells.

Epic fail.

Moving on… Now cmake can’t figure out that we have dlopen and dladdr. This is the same “libc needs the rest of the world” issue that we discussed earlier, and was resolved by making Zig also link those in.

But beyond those minor setbacks, zlib and LLVM were rather easily built. Now it was time for the fun part!

The Holy Yak

Now we’re at the final stage before we can start trying out the Zig compiler on SerenityOS. For the most part, this is just a variant of the host Zig compiler; the target Zig compiler needs code from std.os in order to be built, which just means adding the things it tells me are missing.

But how do we actually add a new OS to the Zig standard library? Earlier I mentioned that you either implement the syscalls directly or you go through libc. Since SerenityOS’ stable (well, “stable”) syscall interface is libc, this means adding .serenity to this struct right here in std.c, and std will take care of it for me. Doing this actually gives me more info about what kind of things Zig expects from a std.c.xyz struct.

For an OS-specific struct in std.c, you’re mostly expected to just list all the various libc constants defined by POSIX in various structs, grouped by the constant prefix. Take PROT for example. Each of the values in the struct corresponds to one constant defined by POSIX. One notable exception to this is errno numbers, which are actually an enum. Beyond this, you’re also expected to describe a couple of structs in pthreads, a few extern function declarations, and if you build in debug mode then you’re also expected to supply something that can walk through the list of shared objects - for Serenity this is dl_iterate_phdr like on Linux.
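For anyone who hasn't met dl_iterate_phdr before, here is a small C sketch of what "walking the list of shared objects" looks like on Linux with glibc. The helper names are made up for this example, and I'm assuming the glibc declaration from link.h; it only counts the loaded objects, whereas debug-info code would look at each object's program headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Callback invoked once per loaded object; `data` points to our counter. */
static int count_object(struct dl_phdr_info *info, size_t size, void *data) {
    (void)info;
    (void)size;
    assert(data != NULL);
    *(size_t *)data += 1;
    return 0; /* returning non-zero would stop the walk early */
}

/* Returns how many objects (main program plus shared libraries) are loaded. */
size_t count_loaded_objects(void) {
    size_t count = 0;
    dl_iterate_phdr(count_object, &count);
    return count;
}
```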

But there’s one special definition you must supply, and it’s quite special indeed. And it’s our good old friend errno! errno is a delightful legacy piece of POSIX. It’s basically a global value that gets written to whenever something unexpected happens as a result of a libc function call. The problem is, this obviously won’t work when you have more than one thread, so you need to somehow keep them separate. The solution most operating systems have come up with is making errno thread local, and then providing its location as a pointer via an __errno_location function. Most OS modules in std.c do this.
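As a rough C sketch of that scheme: each thread gets its own error slot, and a function hands out a pointer to it. The names my_errno_location, my_errno and checked_divide are invented for the example; a real libc hides its storage behind its own function and macro.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-thread error slot, standing in for libc's real storage. */
static _Thread_local int thread_error_slot;

/* Hypothetical counterpart of __errno_location: returns this thread's slot. */
int *my_errno_location(void) {
    return &thread_error_slot;
}

/* What an errno-style macro expands to in such a scheme. */
#define my_errno (*my_errno_location())

/* Example "libc function" that reports failure through the per-thread slot. */
int checked_divide(int a, int b, int *out) {
    assert(out != NULL);
    if (b == 0) {
        my_errno = EDOM;
        return -1;
    }
    *out = a / b;
    return 0;
}
```

From the language binding's point of view, the only thing it needs to know is the name of that location function; everything else about where the value lives stays private to libc.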

While errno was a thread local in SerenityOS, it was still a public symbol from libc and there was no __errno_location, which I worked around by creating one. However, this didn’t stop me from getting a bizarre size 0 symbol which crashes DynamicLoader, so I just removed that check and it works, which is fine I suppose. This has since been improved by making __errno_location the default way to get errno in SerenityOS.

After all that and implementing a directory iterator, we went through Semantic Analysis, and were at link point! But you know better than to believe it’s just gonna link first try, right?!

“Implementing” POSIX Support

So, of course, we had link errors… and this time, we were actually missing functions! Alright, let’s “implement” them by just slapping some stubs in libc. What could go wrong?

Well, after this and chasing a few more bizarre loader related errors, we finally got the Zig help printing!

zig help running on SerenityOS.

Though, the Zig help being printed doesn’t actually prove that Zig can build stuff, so let’s build a hello world again!

Oops, Non-Working Semaphores

Hmm, okay, so it’s taking a while… it’s taking a good while actually. It probably shouldn’t take this long to start. Let’s profile it and see what it’s stuck on…

Zig getting stuck on SerenityOS’ POSIX semaphore implementation.

Okay then, we’re stuck spinning on POSIX semaphores, I guess that’s broken. Let me make target Zig build in single-threaded mode per Andrew’s suggestion (Andrew was in the chat at this point), and it seems that we’re getting somewhere!

(Sidenote, building the Zig compiler single-threaded is currently broken in stage2.)

Actually Implementing POSIX Support

Or, well, we were gonna get somewhere, but here’s where my earlier decisions bite me. Turns out those functions I stubbed are actually needed by Zig… so let’s implement them in the kernel really quickly. While I was doing so, I hit this interesting error which is apparently a first.

SerenityOS getting the crown for being the first OS to throw TimerUnsupported.

Since SerenityOS didn’t have a way to get the clock resolution at the time, I wrote the best possible implementation.

The best clock_getres implementation.
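For context, this is roughly what a caller of clock_getres looks like in C on a system that does support it. It is a sketch of the POSIX call being used, not the SerenityOS implementation from the slide; clock_resolution_ns is a name I made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Stores the resolution of `clock` in nanoseconds; returns false on failure. */
bool clock_resolution_ns(clockid_t clock, long long *out_ns) {
    assert(out_ns != NULL);
    struct timespec res;
    if (clock_getres(clock, &res) != 0)
        return false; /* a caller could treat this as "timer unsupported" */
    *out_ns = (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
    return true;
}
```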

Anyway, most of the work here was implementing a few more of those *at functions, so not much to say here.

Hmm… Weird Crash

Alright, now I’m getting this weird crash in init, which is stage1 C++ code… I wonder what’s up with that. And the fun part with this crash is that if you change the location or the arguments of any of the prints, the crash location changes!

At this point it was very late at night to be debugging weird crashes, so I came back the next day and continued debugging. This was mostly a matter of adding debug prints in places and seeing what happened. This went on for a while until I hit a very peculiar error.

Debugging with meme strings gone wrong.

Ignore the silly debug messages, at this point I was pretty bored. The interesting part is the fact that randomly, I started printing some unrelated strings! In fact, it was a very specific string called target_specific_features, defined near the start of the init function. A bit more debug prints, and it looked like we were getting that broken string right after stage2_version was being called.

Now, what’s so special about stage2_version? As I’ve mentioned earlier, the Zig compiler is still a hybrid of C++ and Zig. However, this doesn’t only mean that Zig code calls into C++; it also means that C++ code can call into Zig! To facilitate this, there are various function declarations with their definitions in C code, and the calling convention for those functions is set to C in order to have the ABI match. Now, what happens if that ABI doesn’t perfectly match? Unfortunately, you don’t get a nice compile error, you get weirdness like this.

In order to test what’s actually supposed to be generated, I wrote stage2_version in C++ and compiled it with a C++ compiler for Serenity. Here’s what that looks like:

00001430 <stage2_version>:
    1430:       55                      push   %ebp
    1431:       89 e5                   mov    %esp,%ebp
    1433:       8b 45 08                mov    0x8(%ebp),%eax
    1436:       c7 00 00 00 00 00       movl   $0x0,(%eax)
    143c:       c7 40 04 09 00 00 00    movl   $0x9,0x4(%eax)
    1443:       c7 40 08 00 00 00 00    movl   $0x0,0x8(%eax)
    144a:       5d                      pop    %ebp
    144b:       c2 04 00                ret    $0x4
    144e:       66 90                   xchg   %ax,%ax

And here’s the version that Zig builds. Notice any difference?

02e31180 <stage2_version>:
 2e31180:    31 c0                    xor    %eax,%eax
 2e31182:    ba 09 00 00 00           mov    $0x9,%edx
 2e31187:    31 c9                    xor    %ecx,%ecx
 2e31189:    c3                       ret

After posting this and with help from some C++ wizards on Discord, I learned that…

Thou Shalt Not Pass (Structs Through Registers)!

At least on i386.

The System V ABI for i386 makes this clear. If you’re going to return a struct, the caller has to pass the address of the result as a hidden argument, and the function fills it in and returns that same address in %eax, no way around it. Even if the struct fits into registers. This is apparently called an sret.
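In C terms, the difference between the two listings looks something like the sketch below. The struct layout and field names are my assumption, read off the three 32-bit stores of 0, 9 and 0 in the first listing; get_version_sret is a hand-written illustration of what the ABI does behind the scenes, not something you would normally write yourself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-field version struct, matching the 0.9.0 stores above. */
struct version {
    uint32_t major;
    uint32_t minor;
    uint32_t patch;
};

static_assert(sizeof(struct version) == 12, "three 32-bit fields");

/* How the function is written in source: it returns the struct by value. */
struct version get_version(void) {
    struct version v = {0, 9, 0};
    return v;
}

/* Roughly what the i386 System V ABI turns it into: the caller supplies
 * the result address, and the callee hands that same address back. */
struct version *get_version_sret(struct version *result) {
    result->major = 0;
    result->minor = 9;
    result->patch = 0;
    return result;
}
```

The C++ side was calling the second shape while the Zig side had generated something like the first with the fields in registers, so the caller ended up reading a struct that nobody had written.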

I didn’t know enough about the Zig compiler to fix this properly, but my “fix” (which consisted of inserting an x86 ABI check into an x86_64 function in the compiler) seemed to work, and we’re now getting some compile errors on the Zig side!

Zig compiler printing errors in SerenityOS.

LibC Stuff

Now that the Zig compiler was working, the only things to fix were those reported by the compiler. I had to implement various things I skipped out on such as the faccessat syscall and debugging support. I also had to stub out flock because we didn’t support file lock downgrading at the time. Finally, I needed to tell Zig about where our libc was because the necessary checks were not implemented yet.
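If you haven't used the *at family before, here is a small C sketch of faccessat on a system that has it. The exists_in_dir helper is a made-up name for this example; the point is that the path is resolved relative to an open directory descriptor rather than the current working directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Returns true if `name` exists relative to the directory `dir_path`. */
bool exists_in_dir(const char *dir_path, const char *name) {
    assert(dir_path != NULL && name != NULL);
    int dir_fd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (dir_fd < 0)
        return false;
    /* F_OK only checks existence; the path is resolved relative to dir_fd. */
    bool found = faccessat(dir_fd, name, F_OK, 0) == 0;
    close(dir_fd);
    return found;
}
```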

And with all of that, we have a Zig-compiled program running on SerenityOS.

Hello world in Zig running on SerenityOS!

(Though only with the compiler built in release mode, debug mode was causing an integer overflow panic but details, details.)

Just A Couple More Hacks…

Now that I was able to run a basic Zig program, I wanted to see if I could complete my original goal of running ZigSelf on SerenityOS too. I actually caught a few “used usize interchangeably with u64” errors while doing this since I was running on i386, but eventually that worked too.
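The same class of bug exists in C with size_t and uint64_t, so here is a sketch in those terms (the actual fixes were in Zig code, and length_to_size is a name invented for the example). On a 64-bit host the two types have the same width and the mix-up goes unnoticed; on i386 size_t is 32 bits and the conversion can lose data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Converts a 64-bit length to size_t, refusing values that would be truncated. */
bool length_to_size(uint64_t length, size_t *out) {
    assert(out != NULL);
    if (length > SIZE_MAX)
        return false; /* only reachable where size_t is narrower than 64 bits */
    *out = (size_t)length;
    return true;
}
```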

ZigSelf running on SerenityOS.

Aftermath

After all of this, these changes unfortunately didn’t get upstreamed into Zig and there currently is no Zig port for SerenityOS. The main reasons are that at the time, stage2 was “very close” (yeah, right), and that the various things I had to hack or patch out didn’t allow for it to be merged. So I abandoned the port; however, I had always planned to complete it once stage2 was officially ready for use.

Now For Something Different…

So, in the passing months, thanks to the tremendous work by the Zig team and all the contributors’ help, stage2 became the default compiler. Where does that leave us? Well, I have something to show you.

(Here I had a demo showing a build of the self-hosted Zig compiler running on SerenityOS, ported again in October, building a simple hello world, then building ZigSelf, and finally (attempting to) build the Zig compiler itself.)

So this is actually the self-hosted Zig compiler running on SerenityOS. Again, thanks to the work of many SerenityOS and Zig contributors, this port was actually surprisingly easy to do once again and only took a couple of days (and most of the second day was chasing a nasty kernel bug). It’s surprisingly usable in conjunction with the C++ syntax highlighting you saw with TextEditor too. I believe that SerenityOS could be a great platform to work on Zig with enough work.

Future Work

So what’s next for SerenityOS on Zig? Obviously, I want to upstream this and make it as simple as running ./package.sh and waiting. Though there are some steps we must take to get there.

First things first, we need to upstream our LLVM patches from the SerenityOS side. As far as I’m aware, they’re mostly ready, but there are still some issues that need to be addressed. Without this, I cannot upstream the Zig patches, because as I mentioned near the start of my porting efforts, Zig’s LLVM integration is strongly tied to the target triples, and now that some of that code is in Zig too, I don’t think I can ifdef it out like before anymore.

Once that’s done, we need to find a solution to the problem of SerenityOS’ libc not being that stable. The names are the same, but things like errno numbers can change often. After doing some brainstorming in the Serenity Discord, we came up with the idea of a program which uses Zig’s @cImport superpowers to import all the constants and then provide them with a package to std.c.serenity, and then provide that package during the SerenityOS Zig build. So the… let’s say “less” changing parts of LibC could be part of Zig upstream, and the things that need to change more frequently can be provided with Serenity’s ports system.

After doing both of those, I want to explore how we could provide access to Serenity’s system libraries and IPC from Zig. As SerenityOS’ libraries are all in C++, I’m gonna have to write some bindings for this, but maybe an alternative is possible by parsing the IPC files instead.

Closing Thoughts

I want to finish up with some thoughts. Obviously, the story that I’ve told you is nowhere near the scale of something like Zig and SerenityOS, but I believe that it captures at least a bit of that spirit. At the start, the task seemed insurmountable to me - I mean, how would I even go about getting Zig running on SerenityOS? “It’s gonna take weeks”, I thought. But it turns out, the reason it wasn’t done was just because nobody had tried it. So if there’s a lesson to learn from this story, it’s that any task is achievable (be it an operating system, a new programming language, a browser from scratch) with enough yakshaving!

Thanks

I’d like to thank all of the people on the slide for their help along the journey, and especially Daniel Bertalan, Ali Mohammad and Andrew Kaster for their emotional support during the final hours, Andreas Kling for creating the amazing SerenityOS project, and Andrew Kelley for creating the wonderful Zig programming language. Thanks everyone.