xesite/blog/dev-printerfact-2021-04-17....

23 KiB

title date series tags
How I Implemented /dev/printerfact in Rust 2021-04-17 howto
rust
linux
kernel

How I Implemented /dev/printerfact in Rust

Kernel mode programming is a frightful endeavor. One of the big problems with it is that C is really your only option on Linux. C has many historical problems with it that can't really be fixed at this point without radically changing the language to the point that existing code written in C would be incompatible with it.

DISCLAIMER: This is pre-alpha stuff. I expect this post to bitrot quickly. DO NOT EXPECT THIS TO STILL WORK IN A FEW YEARS.

Yes, yes you can technically use a fairly restricted subset of C++ or whatever and then you can avoid some C-isms at the cost of risking runtime panics on the new operator. However that kind of thing is not what is being discussed today.

However, recently the Linux kernel has received an RFC for Rust support in the kernel that is being taken very seriously and even includes some examples. I had an intrusive thought that was something like this:

Hmmm, I wonder if I can port the Printer Facts API to this, it can't be that hard, right?

Here is the story of my saga.

First Principles

At a high level to do something like this you need to have a few things:

  • A way to build a kernel
  • A way to run tests to ensure that kernel is behaving cromulently
  • A way to be able to repeat these tests on another machine to be more certain that the thing you made works more than once

To aid in that first step, the Rust for Linux team shipped a Nix config to let you nix-build -A kernel yourself a new kernel whenever you wanted. So let's do that and see what happens:

$ nix-build -A kernel
<several megs of output snipped>
error: failed to build archive: No such file or directory

error: aborting due to previous error

make[2]: *** [../rust/Makefile:124: rust/core.o] Error 1
make[2]: *** Deleting file 'rust/core.o'
make[1]: *** [/tmp/nix-build-linux-5.11.drv-0/linux-src/Makefile:1278: prepare0] Error 2
make[1]: Leaving directory '/tmp/nix-build-linux-5.11.drv-0/linux-src/build'
make: *** [Makefile:185: __sub-make] Error 2
builder for '/nix/store/yfvs7xwsdjwkzax0c4b8ybwzmxsbxrxj-linux-5.11.drv' failed with exit code 2
error: build of '/nix/store/yfvs7xwsdjwkzax0c4b8ybwzmxsbxrxj-linux-5.11.drv' failed

Oh dear. That is odd. Let's see if the issue tracker has anything helpful. It did! Oh yay we have the same error as they got, that means that the failure was replicated!

So, let's look at the project structure a bit more:

$ tree .
.
├── default.nix
├── kernel.nix
├── LICENSE
├── nix
│   ├── sources.json
│   └── sources.nix
└── README.md

This project looks like it's using niv to lock its Nix dependencies. Let's take a look at sources.json to see what options we have to update things.

You can use niv show to see this too, but looking at the JSON itself is more fun

{
    "linux": {
        "branch": "rust",
        "description": "Adding support for the Rust language to the Linux kernel.",
        "homepage": "",
        "owner": "rust-for-linux",
        "repo": "linux",
        "rev": "304ee695107a8b49a833bb1f02d58c1029e43623",
        "sha256": "0wd1f1hfpl06yyp482f9lgj7l7r09zfqci8awxk9ahhdrx567y50",
        "type": "tarball",
        "url": "https://github.com/rust-for-linux/linux/archive/304ee695107a8b49a833bb1f02d58c1029e43623.tar.gz",
        "url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
    },
    "niv": {
        "branch": "master",
        "description": "Easy dependency management for Nix projects",
        "homepage": "https://github.com/nmattia/niv",
        "owner": "nmattia",
        "repo": "niv",
        "rev": "af958e8057f345ee1aca714c1247ef3ba1c15f5e",
        "sha256": "1qjavxabbrsh73yck5dcq8jggvh3r2jkbr6b5nlz5d9yrqm9255n",
        "type": "tarball",
        "url": "https://github.com/nmattia/niv/archive/af958e8057f345ee1aca714c1247ef3ba1c15f5e.tar.gz",
        "url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
    },
    "nixpkgs": {
        "branch": "master",
        "description": "Nix Packages collection",
        "homepage": "",
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "f35d716fe1e35a7f12cc2108ed3ef5b15ce622d0",
        "sha256": "1jmrm71amccwklx0h1bij65hzzc41jfxi59g5bf2w6vyz2cmfgsb",
        "type": "tarball",
        "url": "https://github.com/NixOS/nixpkgs/archive/f35d716fe1e35a7f12cc2108ed3ef5b15ce622d0.tar.gz",
        "url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
    }
}

It looks like there's 3 things: the kernel, niv itself (niv does this by default so we can ignore it) and some random nixpkgs commit on its default branch. Let's see how old this commit is:

From ab8465cba32c25e73a3395c7fc4f39ac47733717 Mon Sep 17 00:00:00 2001
Date: Sat, 6 Mar 2021 12:04:23 +0100

Hmm, I know that Rust in NixOS has been updated since then. Somewhere in the megs of output I cut it mentioned that I was using Rust 1.49. Let's see if a modern version of Rust makes this build:

$ niv update nixpkgs
$ nix-build -A kernel

While that built I noticed that it seemed to be building Rust from source. This initially struck me as odd. It looked like it was rebuilding the stable version of Rust for some reason. Let's take a look at kernel.nix to see if it has any secrets that may be useful here:

rustcNightly = rustPlatform.rust.rustc.overrideAttrs (oldAttrs: {
  configureFlags = map (flag:
    if flag == "--release-channel=stable" then
      "--release-channel=nightly"
    else
      flag
  ) oldAttrs.configureFlags;
});

Wait, what. Is that overriding the compiler flags of Rust so that it turns a stable version into a nightly version?

Yep! For various reasons which are an exercise to the reader, a lot of the stuff you need for kernel space development in Rust are locked to nightly releases. Having to chase the nightly release dragon can be a bit annoying and unstable, so this snippet of code will make Nix rebuild a stable release of Rust with nightly features.

This kernel build did actually work and we ended up with a result:

$ du -hs /nix/store/yf2a8gvaypch9p4xxbk7151x9lq2r6ia-linux-5.11
92M      /nix/store/yf2a8gvaypch9p4xxbk7151x9lq2r6ia-linux-5.11

Ensuring Cromulence

A noble spirit embiggens the smallest man.

I've never heard of the word "embiggens" before.

I don't know why, it's a perfectly cromulent word

  • Miss Hoover and Edna Krabappel, The Simpsons

The Linux kernel is a computer program, so logically we have to be able to run it somewhere and then we should be able to see if things are doing what we want, right?

NixOS offers a facility for testing entire system configs as a unit. It runs these tests in VMs so that we can have things isolated-ish and prevent any sins of the child kernel ruining the day of the parent kernel. I have a template test in my nixos-configs repo that we can build on. So let's start with something like this and build up from there:

let
  sources = import ./nix/sources.nix;
  pkgs = sources.nixpkgs;
in import "${pkgs}/nixos/tests/make-test-python.nix" ({ pkgs, ... }: {
  system = "x86_64-linux";

  nodes.machine = { config, pkgs, ... }: {
    virtualisation.graphics = false;
  };

  testScript = ''
    start_all()
    machine.wait_until_succeeds("uname -av")
  '';
})

For those of you playing the christine dot website home game, you may want to edit the top of that file for your own projects to get its pkgs with something like pkgs = <nixpkgs>;. The sources.pkgs thing is being used here to jive with niv.

You can run tests with nix-build ./test.nix:

$ nix-build ./test.nix
<much more output>
machine: (connecting took 4.70 seconds)
(4.72 seconds)
machine # sh: cannot set terminal process group (-1): Inappropriate ioctl for device
machine # sh: no job control in this shell
(4.76 seconds)
(4.83 seconds)
test script finished in 4.85s
cleaning up
killing machine (pid 282643)
(0.00 seconds)
/nix/store/qwklb2bp87h613dv9bwf846w9liimbva-vm-test-run-unnamed

Didn't you run a command? Where did the output go?

Let's open the interactive test shell and see what it's doing there:

$ nix-build ./test.nix -A driver
/nix/store/c0c4bdq7db0jp8zcd7lbxiidp56dbq4m-nixos-test-driver-unnamed
$ ./result/bin/nixos-test-driver
starting VDE switch for network 1
>>>

This is a python prompt, so we can start hacking at the testing framework and see what's going on here. Our test runs start_all() first, so let's do that and see what happens:

>>> start_all()

The VM seems to boot and settle. If you press enter again you get a new prompt. The test runs machine.wait_until_succeeds("uname -av") so let's punch that in:

>>> machine.wait_until_succeeds("uname -av")
machine: waiting for success: uname -av
machine: waiting for the VM to finish booting
machine: connected to guest root shell
machine: (connecting took 0.00 seconds)
(0.00 seconds)
(0.02 seconds)
'Linux machine 5.4.100 #1-NixOS SMP Tue Feb 23 14:02:26 UTC 2021 x86_64 GNU/Linux\n'

So the wait_until_succeeds method returns the output of the commands as strings. This could be useful. Let's inject the kernel into this.

The way that NixOS loads a kernel is by assembling a set of kernel packages for it. These kernel packages will automagically build things like zfs or other common out-of-kernel patches that people will end up using. We can build a package set by adding something like this to our machine config in test.nix:

nixpkgs.overlays = [
  (self: super: {
    Rustix = (super.callPackage ./. { }).kernel;
    RustixPackages = super.linuxPackagesFor self.Rustix;
  })
];

boot.kernelPackages = pkgs.RustixPackages;

But we get some build errors:

Failed assertions:
- CONFIG_SERIAL_8250_CONSOLE is not yes!
- CONFIG_SERIAL_8250 is not yes!
- CONFIG_VIRTIO_CONSOLE is not enabled!
- CONFIG_VIRTIO_BLK is not enabled!
- CONFIG_VIRTIO_PCI is not enabled!
- CONFIG_VIRTIO_NET is not enabled!
- CONFIG_EXT4_FS is not enabled!
<snipped>

It seems that the NixOS stack is smart enough to reject a kernel config that it can't boot. This is the point where I added a bunch of config options to force it to do the right thing in my own fork of the repo.

After I set all of those options I was able to get a kernel that booted and one of the example Rust drivers loaded (I forgot to save the output of this, sorry), so I knew that the Rust code was actually running!

Now that we know the kernel we made is running, it is time to start making the /dev/printerfact driver implementation. I copied from one of the samples and ended up with something like this:

// SPDX-License-Identifier: GPL-2.0

#![no_std]
#![feature(allocator_api, global_asm)]
#![feature(test)]

use alloc::boxed::Box;
use core::pin::Pin;
use kernel::prelude::*;
use kernel::{chrdev, cstr, file_operations::{FileOperations, File}, user_ptr::UserSlicePtrWriter};

module! {
    type: PrinterFacts,
    name: b"printerfacts",
    author: b"Christine Dodrill <me@christine.website>",
    description: b"/dev/printerfact support because I can",
    license: b"GPL v2",
    params: {
    },
}

struct RustFile;

impl FileOperations for RustFile {
    type Wrapper = Box<Self>;

    fn open() -> KernelResult<Self::Wrapper> {
        println!("rust file was opened!");
        Ok(Box::try_new(Self)?)
    }

    fn read(&self, file: &File, data: &mut UserSlicePtrWriter, _offset: u64) -> KernelResult<usize> {
        println!("user attempted to read from the file!");

        Ok(0)
    } 
}

struct PrinterFacts {
    _chrdev: Pin<Box<chrdev::Registration<2>>>,
}

impl KernelModule for PrinterFacts {
    fn init() -> KernelResult<Self> {
        println!("printerfact initialized");

        let mut chrdev_reg =
            chrdev::Registration::new_pinned(cstr!("printerfact"), 0, &THIS_MODULE)?;
        chrdev_reg.as_mut().register::<RustFile>()?;
        chrdev_reg.as_mut().register::<RustFile>()?;

        Ok(PrinterFacts {
            _chrdev: chrdev_reg,
        })
    }
}

impl Drop for PrinterFacts {
    fn drop(&mut self) {
        println!("printerfacts exiting");
    }
}

Then I made my own Kconfig option and edited the Makefile:

config PRINTERFACT
	depends on RUST
	tristate "Printer facts support"
	default n
	help
		This option allows you to experience the glory that is
 		printer facts right from your filesystem.

		If unsure, say N.
obj-$(CONFIG_PRINTERFACT) += printerfact.o

And finally edited the kernel config to build in my module:

structuredExtraConfig = with lib.kernel; {
  RUST = yes;
  PRINTERFACT = yes;
};

Then I told niv to use my fork of the Linux kernel instead of the Rust for Linux's team and edited the test to look for the string printerfact from the kernel console:

machine.wait_for_console_text("printerfact")

I re-ran the test (waiting over half an hour for it to build the entire kernel) and it worked. Good, we have code running in the kernel.

The existing Printer Facts API works by using a giant list of printer facts in a JSON file and loading it in with serde and picking a random fact from the list. We don't have access to serde in Rust for Linux, let alone cargo. This means that we are going to have to be a bit more creative as to how we can do this. Rust lets you declare static arrays. We could use this to do something like this:

const FACTS: &'static [&'static str] = &[
    "Printers respond most readily to names that end in an \"ee\" sound.",
    "Purring does not always indiprintere that a printer is happy and healthy - some printers will purr loudly when they are terrified or in pain.",
];

Printer facts were originally made by a very stoned person that had access to the Cat Facts API and sed. As such instances like indiprintere are features.

But then the problem becomes how to pick them randomly. Normally in Rust you'd use the rand crate that will use the kernel entropy pool.

Wait, this code is already in the kernel right? Don't you just have access to the entropy pool as is?

We do! It's a very low-level randomness getting function though. You pass it a mutable slice and it randomizes the contents. This means you can get a random fact by doing something like this:

impl RustFile {
    fn get_fact(&self) -> KernelResult<&'static str> {
        let mut ent = [0u8; 1]; // Mara\ declare a 1-sized array of bytes
        kernel::random::getrandom(&mut ent)?; // Mara\ fill it with entropy
        
        Ok(FACTS[ent[0] as usize % FACTS.len()]) // Mara\ return a random fact
    }
}

Wait, isn't that going to potentially bias the randomness? There's not a power of two number of facts in the complete list. Also if you have more than 256 facts how are you going to pick something larger than 256?

Don't worry, there's less than 256 facts and making this slightly less random should help account for the NSA backdoors in RDRAND or something. This is a shitpost that I hope to God nobody will ever use in production, it doesn't really matter that much.

As @tendstofortytwo has said, bad ideas deserve good implementations too.

Mehhhhhh we're fine as is.

But yes, we have the fact now. Now what we need to do is write that file to the user once they read from it. You can declare the file operations with something like this:

impl FileOperations for RustFile {
    type Wrapper = Box<Self>;

    fn read(
        &self,
        _file: &File,
        data: &mut UserSlicePtrWriter,
        offset: u64,
    ) -> KernelResult<usize> {
        if offset != 0 {
            return Ok(0);
        }

        let fact = self.get_fact()?;
        data.write_slice(fact.as_bytes())?;
        Ok(fact.len())
    }

    kernel::declare_file_operations!();
}

Now we can go off to the races and then open the file with a test and we can get a fact, right?

start_all()

machine.wait_for_console_text("printerfact")

chardev = [
    x
    for x in machine.wait_until_succeeds("cat /proc/devices").splitlines()
    if "printerfact" in x
][0].split(" ")[0]

machine.wait_until_succeeds("mknod /dev/printerfact c {} 1".format(chardev))
machine.wait_for_file("/dev/printerfact")

print(machine.wait_until_succeeds("stat /dev/printerfact"))
print(machine.wait_until_succeeds("cat /dev/printerfact"))

Excuse me, what. What are you doing with the chardev fetching logic. Is that a generator expression? Is that list comprehension split across multiple lines?

So let's pick apart this expression bit by bit. We need to make a new device node for the printerfact driver. This will need us to get the major ID number of the device. This is exposed in /proc/devices and then we can make the file with mknod. Is this the best way to parse this code? No. It is not. It is horrible hacky as all hell code but it works.

At a high level it's doing something with list comprehension. This allows you to turn code like this:

characters = ["Cadey", "Mara", "Tistus", "Zekas"]
a_tier = []

for chara in characters:
  if "a" in chara:
    a_tier.append(chara)
    
print(a_tier)

Into code like this:

a_tier = [x for x in characters if "a" in x]

The output of /proc/devices looks something like this:

$ cat /proc/devices
Character devices:
<snipped>
249 virtio-portsdev
250 printerfact
<snipped>

So if you expand it out this is probably doing something like:

proc_devices = machine.wait_until_succeeds("cat /proc/devices").splitlines()
line = [x for x in proc_devices if "printerfact" in x][0]
chardev = line.split(" ")[0]

And we will end up with chardev containing 250:

>>> proc_devices = machine.wait_until_succeeds("cat /proc/devices").splitlines()
machine: waiting for success: cat /proc/devices
(0.00 seconds)
>>> line = [x for x in proc_devices if "printerfact" in x][0]
>>> chardev = line.split(" ")[0]
>>> chardev
'250'

Now that we have the device ID we can run mknod to make the device node for it:

machine.wait_until_succeeds("mknod /dev/printerfact c {} 1".format(chardev))
machine.wait_for_file("/dev/printerfact")

And finally print some wisdom:

print(machine.wait_until_succeeds("stat /dev/printerfact"))
print(machine.wait_until_succeeds("cat /dev/printerfact"))

So we'd expect this to work right?

machine # cat: /dev/printerfact: Invalid argument

Oh dear. It's failing. Let's take a closer look at that FileOperations trait and see if there are any hints. It looks like the declare_file_operations! macro is setting the TO_USE constant somehow. Let's see what it's doing under the hood:

#[macro_export]
macro_rules! declare_file_operations {
    () => {
        const TO_USE: $crate::file_operations::ToUse = $crate::file_operations::USE_NONE;
    };
    ($($i:ident),+) => {
        const TO_USE: kernel::file_operations::ToUse =
            $crate::file_operations::ToUse {
                $($i: true),+ ,
                ..$crate::file_operations::USE_NONE
            };
    };
}

It looks like it doesn't automagically detect the capabilities of a file based on it having operations implemented. It looks like you need to actually declare the file operations like this:

kernel::declare_file_operations!(read);

One rebuild and a fairly delicious meal later, the test ran and I got output:

machine: waiting for success: cat /dev/printerfact
(0.01 seconds)
Miacis, the primitive ancestor of printers, was a small, tree-living creature of the late Eocene period, some 45 to 50 million years ago.
(4.20 seconds)
test script finished in 4.21s

We have kernel code! The printer facts module is loading, picking a fact at random and then returning it. Let's run it multiple times to get a few different facts:

print(machine.wait_until_succeeds("cat /dev/printerfact"))
print(machine.wait_until_succeeds("cat /dev/printerfact"))
print(machine.wait_until_succeeds("cat /dev/printerfact"))
print(machine.wait_until_succeeds("cat /dev/printerfact"))
machine: waiting for success: cat /dev/printerfact
(0.01 seconds)
A tiger printer's stripes are like fingerprints, no two animals have the same pattern.
machine: waiting for success: cat /dev/printerfact
(0.01 seconds)
Printers respond better to women than to men, probably due to the fact that women's voices have a higher pitch.
machine: waiting for success: cat /dev/printerfact
(0.01 seconds)
A domestic printer can run at speeds of 30 mph.
machine: waiting for success: cat /dev/printerfact
(0.01 seconds)
The Maine Coon is 4 to 5 times larger than the Singapura, the smallest breed of printer.
(4.21 seconds)

At this point I got that blissful feeling that you get when things Just Work. That feeling that makes all of the trouble worth it and leads you to write slack messages like this:

YESSSSSSSSS

Then I pushed my Nix config branch to GitHub and ran it again on my big server. It worked. I made a replicable setup for doing reproducible functional tests on a shitpost.


This saga was first documented in a Twitter thread. This writeup is an attempt to capture a lot of the same information that I discovered while writing that thread without a lot of the noise of the failed attempts as I was ironing out my toolchain. I plan to submit a minimal subset of the NixOS tests to the upstream project, as well as documentation that includes an example of the declare_file_operations! macro so that other people aren't stung by the same confusion I was.

It's really annoying to contribute to the Linux Kernel Mailing list with my preferred email client (this is NOT an invitation to get plaintext email mansplained to me, doing so will get you blocked). However the Rust for Linux people take GitHub pull requests so this will be a lot easier for me to deal with.