For the past two years Charles Zhang and I have been working on getting my game engine, Trial, running on the Nintendo Switch. The primary challenge in doing this is porting the underlying Common Lisp runtime to work on this platform. We knew going into this that it was going to be hard, but it has proven to be quite a bit more tricky than expected. I’d like to outline some of the challenges of the platform here for posterity, though please also understand that due to Nintendo’s NDA I can’t go into too much detail.
Current Status
I want to start off with where we are at the time of writing. We managed to port the runtime and compiler to the point where we can compile and execute arbitrary Lisp code directly on the Switch. We can also interface with shared libraries, and I’ve ported a variety of operating system portability libraries that Trial needs to work on the Switch as well.
The above photo shows Trial’s REPL example running on the Switch devkit. Trial is setting up the OpenGL context, managing input, allocating shaders, all that good stuff, to get the text shown on screen; the Switch does not offer a terminal of its own.
Unfortunately it also crashes shortly after as SBCL is trying to engage its garbage collector. The Switch has some unique constraints in that regard that we haven’t managed to work around quite yet. We also can’t output any audio yet, since the C callback mechanism is also broken. And of course, there’s potentially a lot of other issues yet to rear their head, especially with regards to performance.
Whatever the case, we’ve gotten pretty far! This work hasn’t been free, however. While I’m fine not paying myself a fair salary, I can’t in good conscience have Charles invest so much of his valuable time into this for nothing. So I’ve been paying him on a monthly basis for all the work he’s been doing on this port. Up until now that has cost me ~17’000 USD. As you may or may not know, I’m self-employed. All of my income stems from sales of Kandria and donations from generous supporters on Patreon, GitHub, and Ko-Fi. On a good month this totals about 1’200 USD. On a bad month this totals to about 600 USD. That would be hard to get by in a cheap country, and it’s practically impossible in Zürich, Switzerland.
I manage to get by by living with my parents and being relatively frugal with my own personal expenses. Everything I actually earn and more goes back into hiring people like Charles to do cool stuff. Now, I’m ostensibly a game developer by trade, and I am working on a currently unannounced project. Games are very expensive to produce, and I do not have enough reserves to bankroll it anymore. As such, it has become very difficult to decide what to spend my limited resources on, and especially a project like this is much more likely to be axed given that I doubt Kandria sales on the Switch would even recoup the porting costs.
To get to the point: if you think this is a cool project and you would like to help us make the last few hurdles for it to be completed, please consider supporting me on Patreon, GitHub, or Ko-Fi. On Patreon you get news for every new library I release (usually at least one a month) and an exclusive monthly roundup of the current development progress of the unannounced game. Thanks!
An Overview
First, here’s what’s publicly known about the Switch’s environment: user code runs on an ARM64 Cortex-A57 chip with four cores and 4 GB RAM, on top of a proprietary microkernel operating system that was initially developed for the Nintendo 3DS.
SBCL already has an ARM64 Linux port, so the code generation side is already solved. Kandria also easily fits into 4 GB RAM, so there’s no issues there either. The difficulties in the port reside entirely in interfacing with the surrounding proprietary operating system of the Switch. The system has some constraints that usual PC operating systems do not have, which are especially problematic for something like Lisp, as you’ll see in the next section.
Fortunately for us, and this is the reason I even considered a port in the first place, the Switch is also the only console to support the OpenGL graphics library for rendering, which Trial is based upon. Porting Trial itself to another graphics library would be a gigantic effort that I don’t intend on undertaking any time soon. The Xbox only supports DirectX, though supposedly there’s an OpenGL -> DirectX layer that Microsoft developed, so that might be possible. The PlayStation on the other hand apparently still sports a completely proprietary graphics API, so I don’t even want to think about porting to that platform.
Anyway, in order to get started developing I had to first get access. I was lucky enough that Nintendo of Europe is fairly accommodating to indies and did grant my request. I then had to buy a devkit, which costs somewhere around 400 USD. The devkit and its SDK only run on Windows, which isn’t surprising, but will also be a relevant headache later.
Before we can get on to the difficulties in building SBCL for the Switch, let’s first take a look at how SBCL is normally built on a PC.
Building SBCL
SBCL is primarily written in Lisp itself. There is a small C runtime as well, which is compiled with an ordinary C compiler, but before that can happen the build needs to know a few things about the operating system environment it is compiling for. The runtime also doesn’t have a compiler of its own, so it can’t compile any Lisp code. In order to get the whole process kicked off, SBCL requires another Lisp implementation to bootstrap with, ideally another version of itself.
The build then proceeds in roughly five phases:
build-config
This step just gathers whatever build configuration options you want for your target and spits them out into a readable format for the rest of the build process.
make-host-1
Now we build the cross-compiler with the host Lisp compiler, and at the same time emit C header files describing Lisp object layouts in memory as C structs for the next step.
make-target-1
Next we run the target C compiler to create the C runtime. As mentioned, this uses a standard C compiler, which can itself be a cross-compiler. The C runtime includes the garbage collector and other glue to the operating system environment. This step also produces some constants the target Lisp compiler and runtime needs to know about by using the C compiler to read out relevant operating system headers.
make-host-2
With the target runtime built, we build the target Lisp system (compiler and the standard library) using the Lisp cross-compiler built by the Lisp host compiler in make-host-1. This step produces a “cold core” that the runtime can jump into, and can be done purely on the host machine. This cold core is not complete, and needs to be executed on the target machine with the target runtime to finish bootstrapping, notably to initialize the object system, which requires runtime compilation. This is done in make-target-2.
make-target-2
The cold core produced in the last step is loaded into the target runtime, and finishes the bootstrapping procedure to compile and load the rest of the Lisp system. After the Lisp system is loaded into memory, the memory is dumped out into a “warm core”, which can be loaded back into memory in a new process with the target runtime. From this point on, you can load new code and dump new images at will.
Notable here is the need to run Lisp code on the target machine itself. We can’t cross-compile “purely” on the host, not least because user Lisp code, unlike batch-compiled C code, cannot be compiled without also being run, and when it is run it assumes that it is in the target environment. So we really don’t have much of a choice in the matter.
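To make the point about compiling and running being intertwined a bit more concrete, here is a minimal sketch in portable Common Lisp. The names *expansions*, build-stamp, and compile-stamp-function are hypothetical and exist only for this illustration. The sketch shows that a macro body is ordinary user code that executes inside the compiler, so whatever it observes comes from the machine doing the compiling.

```lisp
(defvar *expansions* 0
  "Hypothetical counter: how many times the macro body below has run.")

(defmacro build-stamp ()
  "Expand into a string naming the Lisp that performed the compilation."
  ;; This body is user code, and it runs at macroexpansion time,
  ;; that is, inside the compiler, not when the compiled function is called.
  (incf *expansions*)
  (lisp-implementation-type))

(defun compile-stamp-function ()
  "Compile a function whose body uses BUILD-STAMP and return it."
  (compile nil '(lambda () (build-stamp))))

;; Usage: compiling already bumps the counter, before the result is ever called.
;; (let ((fn (compile-stamp-function)))
;;   (funcall fn))
```

The string baked into the compiled function describes the compiling environment, which is why user code has to be compiled somewhere that at least looks like the target.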
In order to deploy an application, we proceed similarly to make-target-2: we compile in Lisp code incrementally, and then when we have everything we need we dump out a core with the runtime attached to it. This results in a single binary with a data blob attached.
When the SBCL runtime starts up it looks for a core blob, maps it into memory, marks pages with code in them as executable, and then jumps to the entry function the user designated. This all is a problem for the Switch.
Building for the Switch
The Switch is not a PC environment. It doesn’t have a shell, command line, or compiler suite on it to run the build as we usually do. Worse still, its operating system does not allow you to create executable pages, so even if we could run the compilation steps on there we couldn’t incrementally compile anything on it like we usually do for Lisp code.
But all is not lost. Most of the code is not platform dependent and can simply be compiled for ARM64 as usual. All we need to do is make sure that anything that touches the surrounding environment in some way knows that we’re actually trying to compile for the Switch, then we can use another ARM64 environment like Linux to create our implementation.
With that in mind, here’s what our steps look like:
build-config
We run this on some host system, using a special flag to indicate that we’re building for the Switch. We also enable the fasteval contrib. We need fasteval to step in for any place where we would usually invoke the compiler at runtime, since we absolutely cannot do that on the Switch.
make-host-1
This step doesn’t change. We just get different headers that prep for the Switch platform.
make-target-1
Now we use the C compiler the Nintendo SDK provides for us, which can cross-compile for the Switch. Unfortunately the OS is not POSIX compliant, so we had to create a custom runtime target in SBCL that stubs out and papers over the operating system environment differences that we care about, like dynamic linking, mapping pages, and so on.
Here is where things get a bit weird. We are now moving on to compiling Lisp code, and we want to do so on a Linux host system. So we have to…
build-config (2)
We now create a normal ARM64 Linux system with the same feature set as for the Switch. This involves the usual steps as before, though with a special flag to inform some parts of the Lisp process that we’re going to ultimately target the Switch.
make-host-1 (2)
make-target-1 (2)
make-host-2
make-target-2
With all of this done we now have a slightly special SBCL build for Linux ARM64. We can now move on to compiling user code.
For user code we now perform some tricks to make it think it’s running on the Switch, rather than on Linux. In particular we modify *features* to include :nx (the Switch code name) and not :linux, :unix, or :posix. Once that is set up and ASDF has been neutered, we can compile our program (like Trial) “as usual” and at the end dump out a new core.
We’ve solved the problem of actually compiling the code, but we still need to figure out how to get the code started on the Switch, since it does not allow us to do the usual core-mapping strategy. As such, attaching the new core to the runtime we made for the Switch won’t work.
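The *features* adjustment mentioned above might look roughly like the following sketch in portable Common Lisp. The helpers retarget-features and read-for-target are hypothetical names made up for this illustration, not part of the actual build scripts, and the ASDF side is not shown. Rebinding *features* rather than assigning it is also just an assumption made here to keep the change scoped.

```lisp
(defun retarget-features (features &key (add '(:nx)) (drop '(:linux :unix :posix)))
  "Return a feature list without the DROP keywords and with ADD present."
  (let ((kept (remove-if (lambda (feature) (member feature drop)) features)))
    (dolist (feature add kept)
      (pushnew feature kept))))

(defun read-for-target (string)
  "Read one form from STRING as if the reader were running on the target."
  ;; Reader conditionals such as #+nx consult *FEATURES* at read time,
  ;; so rebinding it is enough to change which branch gets read.
  (let ((*features* (retarget-features *features*)))
    (read-from-string string)))

;; Usage:
;; (read-for-target "#+nx :switch-path #-nx :desktop-path")
```

Any library that guards code with #+linux or #+unix then skips those branches during compilation, which is the effect the build is after.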
To make this work, we make use of two relatively unknown features of SBCL: immobile-code, and elfination. Usually when SBCL compiles code at runtime, it sticks it into a page somewhere, and marks that page executable. The code itself however could become unneeded at some point, at which point we’d like to garbage collect it. We can then reclaim the space it took up, and to do so compact the rest of the code around it. The immobile-code feature allows SBCL to take up a different strategy, where code is put into special reserved code pages and remains there. This means it can’t be garbage collected, but it instead can take advantage of more traditional operating system support. Typically executables have pre-marked sections that the operating system knows to contain code, so it can take care of the mapping when the program is started, rather than the program doing it on its own like SBCL usually does.
OK, so we can generate code and prevent it from being moved. But we still have a core at the end of our build that we now need to transform into the separate code and data sections needed for a typical executable. This is done with the elfination step.
The elfinator looks at a core and performs assembly rewriting to make the code position-independent (a requirement for Address Space Layout Randomisation), and then tears it out into two separate files, a pure code assembly file, and a pure data payload file.
We can now take those two files and link them together with the runtime that the C compiler produced and get a completed SBCL that runs like any other executable would. So here’s the last steps of the build process:
Run the elfinator to generate the assembly files
Link the final binary
Run the Nintendo SDK’s authoring tools to bundle metadata, shared libraries, assets, and the application binary into one final package
That’s quite an involved build setup. Not to mention that we need at least an ARM64 Linux machine to run most of the build on, as well as either an AMD64 Windows machine (or an AMD64 Linux machine with Wine) to run the Nintendo SDK compiler and authoring tools.
I usually use an AMD64 Linux machine, so there’s a total of three machines involved: the AMD64 “driver,” the ARM64 build host, and a Windows VM to talk to the devkit with.
I wrote a special build system with all sorts of messed up caching and cross-machine synchronisation logic to automate all of this, which was quite a bit of work to get going, especially since the build should also be drivable from an MSYS2/Windows setup. Lots of fun with path mangling!
So now we have a full Lisp system, including user code, compiling for and being able to run on the Switch. Wow! I’ve skipped over a lot of the nitty-gritty dealing with getting the build properly aware of which target it’s building for, making the elfinator and immobile-code work on ARM64, and porting all of the support libraries like pathname-utils, libmixed, cl-gamepad, etc. Again, most of the details we can’t openly talk about due to the NDA. However, we have upstreamed what work we could, and none of the Lisp libraries have a private fork.
It’s worth noting though that elfination wasn’t initially designed to produce position-independent executable Lisp code, which is usually full of absolute pointers. So we needed to do a lot of work in the SBCL compiler and runtime to support load-time relocation of absolute pointers and to make sure code objects (which usually contain code constants) no longer have absolute pointers, as the GC can’t modify executable sections. Not even the OS loader is allowed to modify executable sections to relocate absolute pointers. We did this by moving absolute pointers, such as code constants, out of the text space and into a read-writable space that is close enough for constant references in code to be rewritten to load from it instead. Both the loader and the moving GC can fix up pointers in this r/w space.
Instead of interfacing directly with the Nintendo SDK, I’ve opted to create my own C libraries with a custom interface that the Lisp libraries use to access the operating system functionality they need. That way I can at least publish the Lisp bits openly, and only keep the small C library private. Anyway, now that we can run stuff we’re
The Garbage Collector
Garbage collection is a big topic in computing, with many techniques to make it work efficiently. The standard garbage collector for SBCL is called “gencgc”, a generational garbage collector. The term refers to the separation of objects into different “generations”, which are scanned at varying frequencies, and the collector compacts space by moving objects from one generation to another. None of this is a problem for the Switch, except when multithreading is involved.
When multiple threads are in play, objects can’t simply be moved, because another thread could be accessing them at any time. The simplest solution is to suspend all threads before collecting garbage. The question then becomes: how can one thread get the others to suspend before it starts collecting?
On Unix systems a handy trick is used: the signal mechanism lets a thread send a signal to the other threads, which take it as a cue to suspend.
On the Switch, however, there is no signal mechanism. In fact, it is impossible to interrupt threads at all. So we need a way for each thread to work out on its own that it should suspend. The typical strategy for this is called “safepoints”.
Roughly, we modify the compiler slightly to insert extra code that checks whether the thread should suspend or not. This strategy comes with several challenges:
Adding a check has a cost, so we have to limit the number of checks.
If we don’t check often enough, we risk stalling all the other threads, since garbage collection can’t begin until every thread is suspended.
If a check takes too many instructions, it disrupts the CPU’s cache lines and pipelining optimisations.
The current safepoint system in SBCL was designed for Windows, which, like the Switch, has no inter-process signal handlers. Unlike the Switch, however, it does still have signal handling for the current thread. So the current safepoint implementation was designed as follows:
Each thread keeps a page that a safepoint writes a word to. When garbage collection is engaged, these pages are marked read-only, so that when a safepoint is reached and another thread tries to write to its page, a segmentation fault is triggered, which lets that thread suspend itself. This method is efficient, as it only takes a single instruction to write to the page.
On the Switch we can’t use this trick either, so we have to insert a more complex check, which can be tricky to get right, as is often the case with parallel algorithms.
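As a rough illustration of what an explicit, polling-style safepoint means, here is a single-threaded sketch in portable Common Lisp. It is not how SBCL implements safepoints. Every name in it (*gc-pending*, *parked-count*, park-for-gc, safepoint, dotimes-polling, sum-below) is hypothetical, and the “other thread” that requests a collection is simulated inside the loop itself. What it does show is the trade-off from the list above: the check is a flag test inserted into the loop, and the :every parameter trades check cost against how long a pending collection has to wait.

```lisp
(defvar *gc-pending* nil
  "Hypothetical flag a collecting thread would set to ask others to stop.")

(defvar *parked-count* 0
  "Demonstration only: how often a safepoint actually parked.")

(defun park-for-gc ()
  "Stand-in for blocking until the collector has finished."
  (incf *parked-count*)
  (setf *gc-pending* nil))

(defmacro safepoint ()
  "The check a compiler would emit: a flag test, and a call only if it is set."
  '(when *gc-pending* (park-for-gc)))

(defmacro dotimes-polling ((var count &key (every 1)) &body body)
  "Like DOTIMES, but polls for a pending collection every EVERY iterations."
  `(dotimes (,var ,count)
     (when (zerop (mod ,var ,every))
       (safepoint))
     ,@body))

(defun sum-below (n)
  "Sum the integers below N, polling every 64 iterations."
  (let ((sum 0))
    (dotimes-polling (i n :every 64)
      ;; Simulate another thread requesting a collection mid-loop.
      (when (= i 100) (setf *gc-pending* t))
      (incf sum i))
    sum))
```

In this sketch a request made at iteration 100 is only noticed at iteration 128, which is exactly the latency that a larger polling interval buys in exchange for fewer checks.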
Since safepoints are only needed on Windows, they haven’t been tested on other platforms, which leaves their stability uncertain. This part of the code appears to be a real headache, and ideally it would be redone from scratch, but let’s hope it doesn’t come to that.
I also want to point out the problem that CLOS poses. Generally, SBCL defers compiling the “discriminating function” needed to dispatch to methods until the generic function is first invoked. This is due to the dynamic nature of CLOS, which allows methods to be added and removed at any time, making it hard to pick a good moment at which to consider the system complete. Obviously, on the Switch we can’t invoke the compiler, so we can’t really do it this way. For now, our strategy is to rely on the fast evaluator: we replace the compile function with a lambda that runs the code through the evaluator. This works with any user code that relies on compile, though it is obviously far slower than if we could actually compile.
Which brings us to
Future Work
The fast evaluator trick is mostly a stopgap. Ideally I’d like to explore options for freezing as much of CLOS as possible right before the final image is dumped, and compiling as much as possible ahead of time. I also want to take a closer look at the block compilation mode that Charles restored a few years ago.
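For reference, the stopgap of standing in for compile with the evaluator could be sketched roughly as follows. This is an illustration under assumptions, not the code used in the port: interpreted-compile is a hypothetical name, and the sketch defines a separate function rather than actually replacing cl:compile. On SBCL it binds sb-ext:*evaluator-mode* to :interpret so that eval does not itself call the compiler; whether the port uses that switch is an assumption. On other implementations the binding is simply skipped.

```lisp
(defun interpreted-compile (name &optional definition)
  "Mimic the interface of CL:COMPILE, but evaluate DEFINITION instead of compiling it."
  (let ((function
          (cond ((functionp definition) definition)
                (definition
                 ;; Assumption: on SBCL, ask EVAL to interpret rather than compile.
                 (let (#+sbcl (sb-ext:*evaluator-mode* :interpret))
                   (eval definition)))
                ;; No definition given: reuse whatever NAME is already bound to.
                (t (fdefinition name)))))
    (when name
      (setf (fdefinition name) function))
    ;; CL:COMPILE returns the name or function, WARNINGS-P, and FAILURE-P.
    (values (or name function) nil nil)))

;; Usage:
;; (funcall (interpreted-compile nil '(lambda (x) (* x 2))) 21)
```

Code that calls compile on a lambda expression at runtime keeps working with a stand-in like this, at the price of running interpreted.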
It’s very likely that the Switch’s weak processor will force us to implement further optimisations, especially on the side of my engine and Kandria’s own code. So far I’ve been able to get away with relatively little optimisation, since even computers from ten years ago are more than enough to run what I need for the game. However, I’m not sure the Switch can keep up with that, especially given the additional performance constraints that come from the lack of operating system support.
First of all, though, we need to get the garbage collector fully working. It works well enough to boot up and get into Trial’s main loop, but as soon as it reaches multi-generational compaction, it fails.
Next, we need to get callbacks from C working again. Apparently this is a part of the SBCL code that can only be described as “a mess”, involving lots of hand-rolled assembly routines, which probably need some adjustments to work correctly with immobile-code and elfination. Fortunately callbacks are relatively rare; Trial only needs them for audio playback via libmixed.
There are also other issues that we’re keeping in mind but that don’t need our immediate attention, as well as additional portability features that I know I’ll have to work on in Trial before its test suite fully passes on the Switch.
Conclusion
I’ll make sure to add an addendum here if the state of the port changes significantly in the future. Some people have also asked me whether the work could be made public, or whether I’d be willing to share it.
The answer is that, while I’d desperately like to share all of it publicly, the non-disclosure agreement (NDA) prevents me from doing so. We continue to publish and upstream everything we can, but the parts that are directly tied to Nintendo’s SDK can’t be shared with anyone who hasn’t also signed the NDA. So, in the very unlikely event that someone other than me is crazy enough to want to publish a Common Lisp game on the Nintendo Switch, they can reach out to me and I’ll gladly give them access to our porting work once they have signed the NDA.
Naturally, I’ll also keep people more closely informed about how things are going in the monthly updates for supporters. With that said, I’d once again ask you to consider supporting me on Patreon, GitHub, or Ko-Fi. For the foreseeable future, all income from these platforms will go towards funding the SBCL port to the Switch as well as the current game project.
Thanks again for reading, and I hope I’ll be able to share some exciting news with you soon!