diff --git a/blog/dev-printerfact-2021-04-17.markdown b/blog/dev-printerfact-2021-04-17.markdown
new file mode 100644
index 0000000..d1ac4ac
--- /dev/null
+++ b/blog/dev-printerfact-2021-04-17.markdown
@@ -0,0 +1,718 @@
+---
+title: "How I Implemented /dev/printerfact in Rust"
+date: 2021-04-17
+series: howto
+tags:
+ - rust
+ - linux
+ - kernel
+---
+
+# How I Implemented /dev/printerfact in Rust
+
+Kernel mode programming is a frightful endeavor. One of the big problems with it
+is that C is really your only option on Linux. C has many historical problems
+with it that can't really be fixed at this point without radically changing the
+language to the point that existing code written in C would be incompatible with
+it.
+
+DISCLAIMER: This is pre-alpha stuff. I expect this post to bitrot quickly.
+**DO NOT EXPECT THIS TO STILL WORK IN A FEW YEARS.**
+
+[Yes, yes you can _technically_ use a fairly restricted subset of C++ or
+whatever and then you can avoid some C-isms at the cost of risking runtime
+panics on the `new` operator. However that kind of thing is not what is being
+discussed today.](conversation://Mara/hacker?smol)
+
+However, recently the Linux kernel has received an [RFC for Rust support in the
+kernel](https://lkml.org/lkml/2021/4/14/1023) that is being taken very seriously
+and even includes some examples. I had an intrusive thought that was something
+like this:
+
+[Hmmm, I wonder if I can port the Printer Facts API to this, it
+can't be that hard, right?](conversation://Cadey/wat?smol)
+
+Here is the story of my saga.
+
+## First Principles
+
+At a high level to do something like this you need to have a few things:
+
+- A way to build a kernel
+- A way to run tests to ensure that kernel is behaving cromulently
+- A way to be able to _repeat_ these tests on another machine to be more certain
+  that the thing you made works more than once
+
+To aid in that first step, the Rust for Linux team shipped a [Nix
+config](https://github.com/Rust-for-Linux/nix) to let you `nix-build -A kernel`
+yourself a new kernel whenever you wanted. So let's do that and see what
+happens:
+
+```console
+$ nix-build -A kernel
+
+error: failed to build archive: No such file or directory
+
+error: aborting due to previous error
+
+make[2]: *** [../rust/Makefile:124: rust/core.o] Error 1
+make[2]: *** Deleting file 'rust/core.o'
+make[1]: *** [/tmp/nix-build-linux-5.11.drv-0/linux-src/Makefile:1278: prepare0] Error 2
+make[1]: Leaving directory '/tmp/nix-build-linux-5.11.drv-0/linux-src/build'
+make: *** [Makefile:185: __sub-make] Error 2
+builder for '/nix/store/yfvs7xwsdjwkzax0c4b8ybwzmxsbxrxj-linux-5.11.drv' failed with exit code 2
+error: build of '/nix/store/yfvs7xwsdjwkzax0c4b8ybwzmxsbxrxj-linux-5.11.drv' failed
+```
+
+Oh dear. That is odd. Let's see if the issue tracker has anything helpful. It
+[did](https://github.com/Rust-for-Linux/nix/issues/1)! Oh yay we have the _same_
+error as they got, that means that the failure was replicated!
+
+So, let's look at the project structure a bit more:
+
+```console
+$ tree .
+.
+├── default.nix
+├── kernel.nix
+├── LICENSE
+├── nix
+│   ├── sources.json
+│   └── sources.nix
+└── README.md
+```
+
+This project looks like it's using [niv](https://github.com/nmattia/niv) to lock
+its Nix dependencies. Let's take a look at `sources.json` to see what options we
+have to update things.
+
+[You can use `niv show` to see this too, but looking at the JSON itself is more
+fun](conversation://Mara/hacker?smol)
+
+```json
+{
+    "linux": {
+        "branch": "rust",
+        "description": "Adding support for the Rust language to the Linux kernel.",
+        "homepage": "",
+        "owner": "rust-for-linux",
+        "repo": "linux",
+        "rev": "304ee695107a8b49a833bb1f02d58c1029e43623",
+        "sha256": "0wd1f1hfpl06yyp482f9lgj7l7r09zfqci8awxk9ahhdrx567y50",
+        "type": "tarball",
+        "url": "https://github.com/rust-for-linux/linux/archive/304ee695107a8b49a833bb1f02d58c1029e43623.tar.gz",
+        "url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
+    },
+    "niv": {
+        "branch": "master",
+        "description": "Easy dependency management for Nix projects",
+        "homepage": "https://github.com/nmattia/niv",
+        "owner": "nmattia",
+        "repo": "niv",
+        "rev": "af958e8057f345ee1aca714c1247ef3ba1c15f5e",
+        "sha256": "1qjavxabbrsh73yck5dcq8jggvh3r2jkbr6b5nlz5d9yrqm9255n",
+        "type": "tarball",
+        "url": "https://github.com/nmattia/niv/archive/af958e8057f345ee1aca714c1247ef3ba1c15f5e.tar.gz",
+        "url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
+    },
+    "nixpkgs": {
+        "branch": "master",
+        "description": "Nix Packages collection",
+        "homepage": "",
+        "owner": "NixOS",
+        "repo": "nixpkgs",
+        "rev": "f35d716fe1e35a7f12cc2108ed3ef5b15ce622d0",
+        "sha256": "1jmrm71amccwklx0h1bij65hzzc41jfxi59g5bf2w6vyz2cmfgsb",
+        "type": "tarball",
+        "url": "https://github.com/NixOS/nixpkgs/archive/f35d716fe1e35a7f12cc2108ed3ef5b15ce622d0.tar.gz",
+        "url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
+    }
+}
+```
+
+It looks like there's 3 things: the kernel, niv itself (niv does this by default
+so we can ignore it) and some random nixpkgs commit on its default branch. Let's
+see how old this commit is:
+
+```diff
+From ab8465cba32c25e73a3395c7fc4f39ac47733717 Mon Sep 17 00:00:00 2001
+Date: Sat, 6 Mar 2021 12:04:23 +0100
+```
+
+Hmm, I know that Rust in NixOS has been updated since then.
+Somewhere in the megs of output I cut, it mentioned that I was using Rust 1.49. Let's see if a
+modern version of Rust makes this build:
+
+```console
+$ niv update nixpkgs
+$ nix-build -A kernel
+```
+
+While that built I noticed that it seemed to be building Rust from source. This
+initially struck me as odd. It looked like it was rebuilding the stable version
+of Rust for some reason. Let's take a look at `kernel.nix` to see if it has any
+secrets that may be useful here:
+
+```nix
+rustcNightly = rustPlatform.rust.rustc.overrideAttrs (oldAttrs: {
+  configureFlags = map (flag:
+    if flag == "--release-channel=stable" then
+      "--release-channel=nightly"
+    else
+      flag
+  ) oldAttrs.configureFlags;
+});
+```
+
+[Wait, what. Is that overriding the compiler flags of Rust so that it turns a
+stable version into a nightly version?](conversation://Mara/wat?smol)
+
+Yep! For various reasons which are an exercise for the reader, a lot of the stuff
+you need for kernel space development in Rust is locked to nightly releases.
+Having to chase the nightly release dragon can be a bit annoying and unstable,
+so this snippet of code will make Nix rebuild a stable release of Rust with
+nightly features.
+
+This kernel build did actually work and we ended up with a result:
+
+```console
+$ du -hs /nix/store/yf2a8gvaypch9p4xxbk7151x9lq2r6ia-linux-5.11
+92M	/nix/store/yf2a8gvaypch9p4xxbk7151x9lq2r6ia-linux-5.11
+```
+
+## Ensuring Cromulence
+
+> A noble spirit embiggens the smallest man.
+>
+> I've never heard of the word "embiggens" before.
+>
+> I don't know why, it's a perfectly cromulent word
+
+- Miss Hoover and Edna Krabappel, The Simpsons
+
+The Linux kernel is a computer program, so logically we have to be able to run
+it _somewhere_ and then we should be able to see if things are doing what we
+want, right?
+
+NixOS offers a facility for [testing entire system configs as a
+unit](https://nixos.org/manual/nixos/unstable/index.html#sec-nixos-tests).
+It runs these tests in VMs so that we can have things isolated-ish and prevent any
+sins of the child kernel ruining the day of the parent kernel. I have a
+[template
+test](https://github.com/Xe/nixos-configs/blob/master/tests/template.nix) in my
+[nixos-configs](https://github.com/Xe/nixos-configs) repo that we can build on.
+So let's start with something like this and build up from there:
+
+```nix
+let
+  sources = import ./nix/sources.nix;
+  pkgs = sources.nixpkgs;
+in import "${pkgs}/nixos/tests/make-test-python.nix" ({ pkgs, ... }: {
+  system = "x86_64-linux";
+
+  nodes.machine = { config, pkgs, ... }: {
+    virtualisation.graphics = false;
+  };
+
+  testScript = ''
+    start_all()
+    machine.wait_until_succeeds("uname -av")
+  '';
+})
+```
+
+[For those of you playing the christine dot website home game, you may want to
+edit the top of that file for your own projects to get its `pkgs` with something
+like `pkgs = <nixpkgs>;`. The `sources.nixpkgs` thing is being used here to jive
+with niv.](conversation://Mara/hacker?smol)
+
+You can run tests with `nix-build ./test.nix`:
+
+```console
+$ nix-build ./test.nix
+
+machine: (connecting took 4.70 seconds)
+(4.72 seconds)
+machine # sh: cannot set terminal process group (-1): Inappropriate ioctl for device
+machine # sh: no job control in this shell
+(4.76 seconds)
+(4.83 seconds)
+test script finished in 4.85s
+cleaning up
+killing machine (pid 282643)
+(0.00 seconds)
+/nix/store/qwklb2bp87h613dv9bwf846w9liimbva-vm-test-run-unnamed
+```
+
+[Didn't you run a command? Where did the output
+go?](conversation://Mara/hmm?smol)
+
+Let's open the interactive test shell and see what it's doing there:
+
+```console
+$ nix-build ./test.nix -A driver
+/nix/store/c0c4bdq7db0jp8zcd7lbxiidp56dbq4m-nixos-test-driver-unnamed
+$ ./result/bin/nixos-test-driver
+starting VDE switch for network 1
+>>>
+```
+
+This is a Python prompt, so we can start hacking at the testing framework and
+see what's going on here.
+Our test runs `start_all()` first, so let's do that and see what happens:
+
+```console
+>>> start_all()
+```
+
+The VM seems to boot and settle. If you press enter again you get a new prompt.
+The test runs `machine.wait_until_succeeds("uname -av")` so let's punch that in:
+
+```console
+>>> machine.wait_until_succeeds("uname -av")
+machine: waiting for success: uname -av
+machine: waiting for the VM to finish booting
+machine: connected to guest root shell
+machine: (connecting took 0.00 seconds)
+(0.00 seconds)
+(0.02 seconds)
+'Linux machine 5.4.100 #1-NixOS SMP Tue Feb 23 14:02:26 UTC 2021 x86_64 GNU/Linux\n'
+```
+
+So the `wait_until_succeeds` method returns the output of the command as a
+string. This could be useful. Let's inject the kernel into this.
+
+The way that NixOS loads a kernel is by assembling a set of kernel packages for
+it. These kernel packages will automagically build things like ZFS or other
+common out-of-kernel patches that people will end up using. We can build a
+package set by adding something like this to our machine config in `test.nix`:
+
+```nix
+nixpkgs.overlays = [
+  (self: super: {
+    Rustix = (super.callPackage ./. { }).kernel;
+    RustixPackages = super.linuxPackagesFor self.Rustix;
+  })
+];
+
+boot.kernelPackages = pkgs.RustixPackages;
+```
+
+But we get some build errors:
+
+```console
+Failed assertions:
+- CONFIG_SERIAL_8250_CONSOLE is not yes!
+- CONFIG_SERIAL_8250 is not yes!
+- CONFIG_VIRTIO_CONSOLE is not enabled!
+- CONFIG_VIRTIO_BLK is not enabled!
+- CONFIG_VIRTIO_PCI is not enabled!
+- CONFIG_VIRTIO_NET is not enabled!
+- CONFIG_EXT4_FS is not enabled!
+
+```
+
+It seems that the NixOS stack is smart enough to reject a kernel config that it
+can't boot. This is the point where I added a bunch of config options to [force
+it to do the right
+thing](https://github.com/Xe/dev-printerfact-on-nixos/blob/main/kernel.nix#L54-L96)
+in my own fork of the repo.
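
The failed assertions above boil down to checking option values in the generated kernel config. As a rough illustration only, here is that idea sketched in Python. This is not how NixOS implements the check: the function name `failed_assertions` and the sample config text are made up for this sketch, and it assumes that "enabled" means built in (`y`) or module (`m`) while "yes" means `y` only.

```python
def failed_assertions(config_text, must_be_yes, must_be_enabled):
    """Return assertion messages for options that do not meet the requirement."""
    values = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Unset options show up as comments ("# CONFIG_FOO is not set").
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key] = value

    failures = []
    for opt in must_be_yes:
        if values.get(opt) != "y":
            failures.append(f"{opt} is not yes!")
    for opt in must_be_enabled:
        # Assumption: "enabled" accepts both built-in and module.
        if values.get(opt) not in ("y", "m"):
            failures.append(f"{opt} is not enabled!")
    return failures


# Hypothetical config fragment, not taken from the real build.
sample_config = """\
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=m
# CONFIG_VIRTIO_BLK is not set
CONFIG_VIRTIO_PCI=m
CONFIG_EXT4_FS=y
"""

for failure in failed_assertions(
    sample_config,
    must_be_yes=["CONFIG_SERIAL_8250", "CONFIG_SERIAL_8250_CONSOLE"],
    must_be_enabled=["CONFIG_VIRTIO_BLK", "CONFIG_VIRTIO_PCI", "CONFIG_EXT4_FS"],
):
    print("-", failure)
```

Under those assumptions, a serial console built as a module fails the "yes" check and an unset virtio block driver fails the "enabled" check, which is the shape of the errors in the build log.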
+
+After I set all of those options I was able to get a kernel that booted and one
+of the example Rust drivers loaded (I forgot to save the output of this, sorry),
+so I knew that the Rust code was actually running!
+
+Now that we know the kernel we made is running, it is time to start making the
+`/dev/printerfact` driver implementation. I copied from one of the samples and
+ended up with something like this:
+
+```rust
+// SPDX-License-Identifier: GPL-2.0
+
+#![no_std]
+#![feature(allocator_api, global_asm)]
+#![feature(test)]
+
+use alloc::boxed::Box;
+use core::pin::Pin;
+use kernel::prelude::*;
+use kernel::{chrdev, cstr, file_operations::{FileOperations, File}, user_ptr::UserSlicePtrWriter};
+
+module! {
+    type: PrinterFacts,
+    name: b"printerfacts",
+    author: b"Christine Dodrill ",
+    description: b"/dev/printerfact support because I can",
+    license: b"GPL v2",
+    params: {
+    },
+}
+
+struct RustFile;
+
+impl FileOperations for RustFile {
+    type Wrapper = Box<Self>;
+
+    fn open() -> KernelResult<Self::Wrapper> {
+        println!("rust file was opened!");
+        Ok(Box::try_new(Self)?)
+    }
+
+    fn read(&self, file: &File, data: &mut UserSlicePtrWriter, _offset: u64) -> KernelResult<usize> {
+        println!("user attempted to read from the file!");
+
+        Ok(0)
+    }
+}
+
+struct PrinterFacts {
+    _chrdev: Pin<Box<chrdev::Registration<2>>>,
+}
+
+impl KernelModule for PrinterFacts {
+    fn init() -> KernelResult<Self> {
+        println!("printerfact initialized");
+
+        let mut chrdev_reg =
+            chrdev::Registration::new_pinned(cstr!("printerfact"), 0, &THIS_MODULE)?;
+        chrdev_reg.as_mut().register::<RustFile>()?;
+        chrdev_reg.as_mut().register::<RustFile>()?;
+
+        Ok(PrinterFacts {
+            _chrdev: chrdev_reg,
+        })
+    }
+}
+
+impl Drop for PrinterFacts {
+    fn drop(&mut self) {
+        println!("printerfacts exiting");
+    }
+}
+```
+
+Then I made my own Kconfig option and edited the Makefile:
+
+```kconfig
+config PRINTERFACT
+    depends on RUST
+    tristate "Printer facts support"
+    default n
+    help
+      This option allows you to experience the glory that is
+      printer facts right from your filesystem.
+
+      If unsure, say N.
+```
+
+```Makefile
+obj-$(CONFIG_PRINTERFACT) += printerfact.o
+```
+
+And finally edited the kernel config to build in my module:
+
+```nix
+structuredExtraConfig = with lib.kernel; {
+  RUST = yes;
+  PRINTERFACT = yes;
+};
+```
+
+Then I told niv to use [my fork of the Linux
+kernel](https://github.com/Xe/linux) instead of the Rust for Linux team's and
+edited the test to look for the string `printerfact` from the kernel console:
+
+```python
+machine.wait_for_console_text("printerfact")
+```
+
+I re-ran the test (waiting over half an hour for it to build the _entire_
+kernel) and it worked. Good, we have code running in the kernel.
+
+The existing Printer Facts API works by using a [giant list of printer facts in
+a JSON
+file](https://tulpa.dev/cadey/pfacts/src/branch/master/src/printerfacts.json)
+and loading it in with [serde](https://serde.rs) and picking a random fact from
+the list. We don't have access to serde in Rust for Linux, let alone cargo.
+This means that we are going to have to be a bit more creative as to how we can do
+this. Rust lets you declare static arrays. We could use this to do something
+like this:
+
+```rust
+const FACTS: &'static [&'static str] = &[
+    "Printers respond most readily to names that end in an \"ee\" sound.",
+    "Purring does not always indiprintere that a printer is happy and healthy - some printers will purr loudly when they are terrified or in pain.",
+];
+```
+
+[Printer facts were originally made by a very stoned person that had access to
+the Cat Facts API and sed. As
+such instances like `indiprintere` are
+features.](conversation://Mara/hacker?smol)
+
+But then the problem becomes how to pick them randomly. Normally in Rust you'd
+use the [rand](https://crates.io/crates/rand) crate that will use the kernel
+entropy pool.
+
+[Wait, this code is already in the kernel right? Don't you just have access to
+the entropy pool as is?](conversation://Mara/aha?smol)
+
+[We do!](https://rust-for-linux.github.io/docs/kernel/random/fn.getrandom.html)
+It's a very low-level randomness getting function though. You pass it a mutable
+slice and it randomizes the contents. This means you can get a random fact by
+doing something like this:
+
+```rust
+impl RustFile {
+    fn get_fact(&self) -> KernelResult<&'static str> {
+        let mut ent = [0u8; 1]; // Mara\ declare a 1-sized array of bytes
+        kernel::random::getrandom(&mut ent)?; // Mara\ fill it with entropy
+
+        Ok(FACTS[ent[0] as usize % FACTS.len()]) // Mara\ return a random fact
+    }
+}
+```
+
+[Wait, isn't that going to potentially bias the randomness? There's not a power
+of two number of facts in the complete list. Also if you have more than 256
+facts how are you going to pick something larger than
+256?](conversation://Mara/wat?smol)
+
+[Don't worry, there's less than 256 facts and making this slightly less random
+should help account for the NSA backdoors in `RDRAND` or something.
+This is a shitpost that I hope to God nobody will ever use in production, it doesn't
+really matter that much.](conversation://Cadey/facepalm?smol)
+
+[As @tendstofortytwo has said,
+bad ideas deserve good implementations too.](conversation://Mara/happy?smol)
+
+[Mehhhhhh we're fine as is.](conversation://Cadey/coffee?smol)
+
+But yes, we have the fact now. Now what we need to do is write that fact to the
+user once they read from the file. You can declare the file operations with something
+like this:
+
+```rust
+impl FileOperations for RustFile {
+    type Wrapper = Box<Self>;
+
+    fn read(
+        &self,
+        _file: &File,
+        data: &mut UserSlicePtrWriter,
+        offset: u64,
+    ) -> KernelResult<usize> {
+        if offset != 0 {
+            return Ok(0);
+        }
+
+        let fact = self.get_fact()?;
+        data.write_slice(fact.as_bytes())?;
+        Ok(fact.len())
+    }
+
+    kernel::declare_file_operations!();
+}
+```
+
+Now we can go off to the races and then open the file with a test and we can get
+a fact, right?
+
+```py
+start_all()
+
+machine.wait_for_console_text("printerfact")
+
+chardev = [
+    x
+    for x in machine.wait_until_succeeds("cat /proc/devices").splitlines()
+    if "printerfact" in x
+][0].split(" ")[0]
+
+machine.wait_until_succeeds("mknod /dev/printerfact c {} 1".format(chardev))
+machine.wait_for_file("/dev/printerfact")
+
+print(machine.wait_until_succeeds("stat /dev/printerfact"))
+print(machine.wait_until_succeeds("cat /dev/printerfact"))
+```
+
+[Excuse me, what. What are you doing with the chardev fetching logic. Is that a
+generator expression? Is that list comprehension split across multiple
+lines?](conversation://Mara/wat?smol)
+
+So let's pick apart this expression bit by bit. We need to make a new device
+node for the printerfact driver. This will need us to get the major ID number of
+the device. This is exposed in `/proc/devices` and then we can make the file
+with `mknod`. Is this the best way to parse this output? No. It is not. It is
+horrible hacky as all hell code but it _works_.
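
For comparison, a less hacky lookup could be sketched like this. It is only an illustration: the helper name `find_char_major` and the sample text (including the `mem` and `virtblk` entries) are made up here, and it assumes the usual `/proc/devices` layout of a "Character devices:" section followed by a "Block devices:" section. Unlike the one-liner, it matches the driver name exactly and copes with left-padded major numbers, where `split(" ")[0]` would return an empty string.

```python
def find_char_major(proc_devices: str, name: str) -> int:
    """Return the major number registered for the character device `name`."""
    in_char_section = False
    for line in proc_devices.splitlines():
        stripped = line.strip()
        if stripped == "Character devices:":
            in_char_section = True
            continue
        if stripped == "Block devices:":
            in_char_section = False
            continue
        if not in_char_section or not stripped:
            continue
        # Each entry is "<major> <driver name>"; strip() removed the padding.
        major, _, driver = stripped.partition(" ")
        if driver.strip() == name:
            return int(major)
    raise LookupError(f"no character device named {name!r}")


# Hypothetical sample in the shape of /proc/devices, not real test output.
sample = """Character devices:
  1 mem
249 virtio-portsdev
250 printerfact

Block devices:
254 virtblk
"""

print(find_char_major(sample, "printerfact"))
```

In the test script the input would come from `machine.wait_until_succeeds("cat /proc/devices")`. The rest of this walkthrough sticks with the original one-liner.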
+
+At a high level it's doing something with [list
+comprehension](https://www.w3schools.com/python/python_lists_comprehension.asp).
+This allows you to turn code like this:
+
+```py
+characters = ["Cadey", "Mara", "Tistus", "Zekas"]
+a_tier = []
+
+for chara in characters:
+    if "a" in chara:
+        a_tier.append(chara)
+
+print(a_tier)
+```
+
+Into code like this:
+
+```py
+a_tier = [x for x in characters if "a" in x]
+```
+
+The output of `/proc/devices` looks something like this:
+
+```console
+$ cat /proc/devices
+Character devices:
+
+249 virtio-portsdev
+250 printerfact
+
+```
+
+So if you expand it out this is probably doing something like:
+
+```py
+proc_devices = machine.wait_until_succeeds("cat /proc/devices").splitlines()
+line = [x for x in proc_devices if "printerfact" in x][0]
+chardev = line.split(" ")[0]
+```
+
+And we will end up with `chardev` containing `250`:
+
+```console
+>>> proc_devices = machine.wait_until_succeeds("cat /proc/devices").splitlines()
+machine: waiting for success: cat /proc/devices
+(0.00 seconds)
+>>> line = [x for x in proc_devices if "printerfact" in x][0]
+>>> chardev = line.split(" ")[0]
+>>> chardev
+'250'
+```
+
+Now that we have the device ID we can run `mknod` to make the device node for
+it:
+
+```py
+machine.wait_until_succeeds("mknod /dev/printerfact c {} 1".format(chardev))
+machine.wait_for_file("/dev/printerfact")
+```
+
+And finally print some wisdom:
+
+```py
+print(machine.wait_until_succeeds("stat /dev/printerfact"))
+print(machine.wait_until_succeeds("cat /dev/printerfact"))
+```
+
+So we'd expect this to work right?
+
+```console
+machine # cat: /dev/printerfact: Invalid argument
+```
+
+Oh dear. It's failing. Let's take a closer look at that
+[FileOperations](https://rust-for-linux.github.io/docs/kernel/file_operations/trait.FileOperations.html)
+trait and see if there are any hints. It looks like the
+`declare_file_operations!` macro is setting the `TO_USE` constant somehow.
+Let's see what it's doing under the hood:
+
+```rust
+#[macro_export]
+macro_rules! declare_file_operations {
+    () => {
+        const TO_USE: $crate::file_operations::ToUse = $crate::file_operations::USE_NONE;
+    };
+    ($($i:ident),+) => {
+        const TO_USE: kernel::file_operations::ToUse =
+            $crate::file_operations::ToUse {
+                $($i: true),+ ,
+                ..$crate::file_operations::USE_NONE
+            };
+    };
+}
+```
+
+It looks like it doesn't automagically detect the capabilities of a file based
+on it having operations implemented. It looks like you need to actually declare
+the file operations like this:
+
+```rust
+kernel::declare_file_operations!(read);
+```
+
+One rebuild and a [fairly delicious meal
+later](https://twitter.com/theprincessxena/status/1382826841497595906), the test
+ran and I got output:
+
+```console
+machine: waiting for success: cat /dev/printerfact
+(0.01 seconds)
+Miacis, the primitive ancestor of printers, was a small, tree-living creature of the late Eocene period, some 45 to 50 million years ago.
+(4.20 seconds)
+test script finished in 4.21s
+```
+
+We have kernel code! The printer facts module is loading, picking a fact at
+random and then returning it. Let's run it multiple times to get a few different
+facts:
+
+```py
+print(machine.wait_until_succeeds("cat /dev/printerfact"))
+print(machine.wait_until_succeeds("cat /dev/printerfact"))
+print(machine.wait_until_succeeds("cat /dev/printerfact"))
+print(machine.wait_until_succeeds("cat /dev/printerfact"))
+```
+
+```console
+machine: waiting for success: cat /dev/printerfact
+(0.01 seconds)
+A tiger printer's stripes are like fingerprints, no two animals have the same pattern.
+machine: waiting for success: cat /dev/printerfact
+(0.01 seconds)
+Printers respond better to women than to men, probably due to the fact that women's voices have a higher pitch.
+machine: waiting for success: cat /dev/printerfact
+(0.01 seconds)
+A domestic printer can run at speeds of 30 mph.
+machine: waiting for success: cat /dev/printerfact
+(0.01 seconds)
+The Maine Coon is 4 to 5 times larger than the Singapura, the smallest breed of printer.
+(4.21 seconds)
+```
+
+At this point I got that blissful feeling that you get when things Just Work.
+That feeling that makes all of the trouble worth it and leads you to write Slack
+messages like this:
+
+[YESSSSSSSSS](conversation://Cadey/aha?smol)
+
+Then I pushed my Nix config branch to
+[GitHub](https://github.com/Xe/dev-printerfact-on-nixos) and ran it again on my
+big server. It worked. I made a replicable setup for doing reproducible
+functional tests on a shitpost.
+
+---
+
+This saga was first documented in a [Twitter
+thread](https://twitter.com/theprincessxena/status/1382451636036075524). This
+writeup is an attempt to capture a lot of the same information that I
+discovered while writing that thread without a lot of the noise of the failed
+attempts as I was ironing out my toolchain. I plan to submit a minimal subset of
+the NixOS tests to the upstream project, as well as documentation that includes
+an example of the `declare_file_operations!` macro so that other people aren't
+stung by the same confusion I was.
+
+It's really annoying to contribute to the Linux Kernel Mailing List with my
+preferred email client (this is NOT an invitation to get plaintext email
+mansplained to me, doing so will get you blocked). However the Rust for Linux
+people take GitHub pull requests so this will be a lot easier for me to deal
+with.
diff --git a/src/app/mod.rs b/src/app/mod.rs
index c109f3d..a22a981 100644
--- a/src/app/mod.rs
+++ b/src/app/mod.rs
@@ -129,14 +129,14 @@ pub async fn init(cfg: PathBuf) -> Result {
     urlwriter.end()?;
 
     Ok(State {
-        mi: mi,
-        cfg: cfg,
+        mi,
+        cfg,
         signalboost: sb,
-        resume: resume,
-        blog: blog,
-        gallery: gallery,
-        talks: talks,
-        everything: everything,
+        resume,
+        blog,
+        gallery,
+        talks,
+        everything,
         jf: jfb.build(),
         sitemap: sm,
         patrons: patrons().await?,