From 3dba1d98f8f45daf5bb399365b24b493c6e98609 Mon Sep 17 00:00:00 2001
From: Christine Dodrill
Date: Wed, 20 Jan 2021 16:42:05 -0500
Subject: [PATCH] nixos encrypted secret post/essay

Signed-off-by: Christine Dodrill
---
 ...ixos-encrypted-secrets-2021-01-20.markdown | 333 ++++++++++++++++++
 1 file changed, 333 insertions(+)
 create mode 100644 blog/nixos-encrypted-secrets-2021-01-20.markdown

diff --git a/blog/nixos-encrypted-secrets-2021-01-20.markdown b/blog/nixos-encrypted-secrets-2021-01-20.markdown
new file mode 100644
index 0000000..bb1c863
--- /dev/null
+++ b/blog/nixos-encrypted-secrets-2021-01-20.markdown
@@ -0,0 +1,333 @@
+---
+title: Encrypted Secrets with NixOS
+date: 2021-01-20
+series: nixos
+tags:
+ - age
+ - ed25519
+---
+
+# Encrypted Secrets with NixOS
+
+One of the best things about NixOS is the fact that it's so easy to do
+configuration management using it. The Nix store (where all your packages live)
+has a huge flaw for secret management though: everything in the Nix store is
+globally readable. This means that anyone logged into or running code on the
+system could read any secret in the Nix store without any limits. This is
+sub-optimal if your goal is to keep secret values secret. There have been a few
+approaches to this over the years, but I want to describe how I'm doing it.
+Here are my goals and implementation for this setup and how a few other secret
+management strategies don't quite pan out.
+
+At a high level I have these goals:
+
+* It should be trivial to declare new secrets
+* Secrets should never be globally readable in any useful form
+* If I restart the machine, I should not need to take manual human action to
+  ensure all of the services come back online
+* GPG should be avoided at all costs
+
+As a side goal being able to roll back secret changes would also be nice.
+
+The two biggest tools that offer a way to help with secret management on NixOS
+that come to mind are NixOps and Morph.
+
+[NixOps](https://github.com/NixOS/nixops) is a tool that helps administrators
+operate NixOS across multiple servers at once. I use NixOps extensively in my
+own setup. It calls deployment secrets "keys" and they are documented
+[here](https://hydra.nixos.org/build/115931128/download/1/manual/manual.html#idm140737322649152).
+At a high level they are declared like this:
+
+```nix
+deployment.keys.example = {
+  text = "this is a super sekrit value :)";
+  user = "example";
+  group = "keys";
+  permissions = "0400";
+};
+```
+
+This will create a new secret in `/run/keys` that will contain our super secret
+value.
+
+[Wait, isn't `/run` an ephemeral filesystem? What happens when the system
+reboots?](conversation://Mara/hmm)
+
+Let's make an example system and find out! So let's say we have that `example`
+secret from earlier and want to use it in a job. The job definition could look
+something like this:
+
+```nix
+# create a service-specific user
+users.users.example.isSystemUser = true;
+
+# without this group the secret can't be read
+users.users.example.extraGroups = [ "keys" ];
+
+systemd.services.example = {
+  wantedBy = [ "multi-user.target" ];
+  after = [ "example-key.service" ];
+  wants = [ "example-key.service" ];
+
+  serviceConfig.User = "example";
+  serviceConfig.Type = "oneshot";
+
+  script = ''
+    stat /run/keys/example
+  '';
+};
+```
+
+This creates a user called `example` and gives it permission to read deployment
+keys. It also creates a systemd service called `example.service` that runs as
+our `example` user and calls
+[`stat(1)`](https://linux.die.net/man/1/stat) to show the ownership and
+permissions of the key file. To avoid systemd thinking our service failed,
+we're also going to mark it as a
+[oneshot](https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files#the-service-section).
+
+Altogether it could look something like
+[this](https://gist.github.com/Xe/4a71d7741e508d9002be91b62248144a). Let's see
+what `systemctl` has to report:
+
+```console
+$ nixops ssh -d blog-example pa -- systemctl status example
+● example.service
+   Loaded: loaded (/nix/store/j4a8f6mnaw3v4sz7dqlnz95psh72xglw-unit-example.service/example.service; enabled; vendor preset: enabled)
+   Active: inactive (dead) since Wed 2021-01-20 20:53:54 UTC; 37s ago
+  Process: 2230 ExecStart=/nix/store/1yg89z4dsdp1axacqk07iq5jqv58q169-unit-script-example-start/bin/example-start (code=exited, status=0/SUCCESS)
+ Main PID: 2230 (code=exited, status=0/SUCCESS)
+       IP: 0B in, 0B out
+      CPU: 3ms
+
+Jan 20 20:53:54 pa example-start[2235]:   File: /run/keys/example
+Jan 20 20:53:54 pa example-start[2235]:   Size: 31         Blocks: 8          IO Block: 4096   regular file
+Jan 20 20:53:54 pa example-start[2235]: Device: 18h/24d Inode: 37428       Links: 1
+Jan 20 20:53:54 pa example-start[2235]: Access: (0400/-r--------)  Uid: (  998/ example)   Gid: (   96/    keys)
+Jan 20 20:53:54 pa example-start[2235]: Access: 2021-01-20 20:53:54.010554201 +0000
+Jan 20 20:53:54 pa example-start[2235]: Modify: 2021-01-20 20:53:54.010554201 +0000
+Jan 20 20:53:54 pa example-start[2235]: Change: 2021-01-20 20:53:54.398103181 +0000
+Jan 20 20:53:54 pa example-start[2235]:  Birth: -
+Jan 20 20:53:54 pa systemd[1]: example.service: Succeeded.
+Jan 20 20:53:54 pa systemd[1]: Finished example.service.
+```
+
+So what happens when we reboot? I'll force a reboot in my hypervisor and we'll
+find out:
+
+```console
+$ nixops ssh -d blog-example pa -- systemctl status example
+● example.service
+   Loaded: loaded (/nix/store/j4a8f6mnaw3v4sz7dqlnz95psh72xglw-unit-example.service/example.service; enabled; vendor preset: enabled)
+   Active: inactive (dead)
+```
+
+The service is inactive. Let's see what the status of `example-key.service` is:
+
+```console
+$ nixops ssh -d blog-example pa -- systemctl status example-key
+● example-key.service
+   Loaded: loaded (/nix/store/ikqn64cjq8pspkf3ma1jmx8qzpyrckpb-unit-example-key.service/example-key.service; linked; vendor preset: enabled)
+   Active: activating (start-pre) since Wed 2021-01-20 20:56:05 UTC; 3min 1s ago
+Cntrl PID: 610 (example-key-pre)
+       IP: 0B in, 0B out
+       IO: 116.0K read, 0B written
+    Tasks: 4 (limit: 2374)
+   Memory: 1.6M
+      CPU: 3ms
+   CGroup: /system.slice/example-key.service
+           ├─610 /nix/store/kl6lr3czkbnr6m5crcy8ffwfzbj8a22i-bash-4.4-p23/bin/bash -e /nix/store/awx1zrics3cal8kd9c5d05xzp5ikazlk-unit-script-example-key-pre-start/bin/example-key-pre-start
+           ├─619 /nix/store/kl6lr3czkbnr6m5crcy8ffwfzbj8a22i-bash-4.4-p23/bin/bash -e /nix/store/awx1zrics3cal8kd9c5d05xzp5ikazlk-unit-script-example-key-pre-start/bin/example-key-pre-start
+           ├─620 /nix/store/kl6lr3czkbnr6m5crcy8ffwfzbj8a22i-bash-4.4-p23/bin/bash -e /nix/store/awx1zrics3cal8kd9c5d05xzp5ikazlk-unit-script-example-key-pre-start/bin/example-key-pre-start
+           └─621 inotifywait -qm --format %f -e create,move /run/keys
+
+Jan 20 20:56:05 pa systemd[1]: Starting example-key.service...
+```
+
+The service is blocked waiting for the keys to exist. We have to populate the
+keys with `nixops send-keys`:
+
+```console
+$ nixops send-keys -d blog-example
+pa> uploading key ‘example’...
+```
+
+Now when we check on `example.service`, we get the following:
+
+```console
+$ nixops ssh -d blog-example pa -- systemctl status example
+● example.service
+   Loaded: loaded (/nix/store/j4a8f6mnaw3v4sz7dqlnz95psh72xglw-unit-example.service/example.service; enabled; vendor preset: enabled)
+   Active: inactive (dead) since Wed 2021-01-20 21:00:24 UTC; 32s ago
+  Process: 954 ExecStart=/nix/store/1yg89z4dsdp1axacqk07iq5jqv58q169-unit-script-example-start/bin/example-start (code=exited, status=0/SUCCESS)
+ Main PID: 954 (code=exited, status=0/SUCCESS)
+       IP: 0B in, 0B out
+      CPU: 3ms
+
+Jan 20 21:00:24 pa example-start[957]:   File: /run/keys/example
+Jan 20 21:00:24 pa example-start[957]:   Size: 31         Blocks: 8          IO Block: 4096   regular file
+Jan 20 21:00:24 pa example-start[957]: Device: 18h/24d Inode: 27774       Links: 1
+Jan 20 21:00:24 pa example-start[957]: Access: (0400/-r--------)  Uid: (  998/ example)   Gid: (   96/    keys)
+Jan 20 21:00:24 pa example-start[957]: Access: 2021-01-20 21:00:24.588494730 +0000
+Jan 20 21:00:24 pa example-start[957]: Modify: 2021-01-20 21:00:24.588494730 +0000
+Jan 20 21:00:24 pa example-start[957]: Change: 2021-01-20 21:00:24.606495751 +0000
+Jan 20 21:00:24 pa example-start[957]:  Birth: -
+Jan 20 21:00:24 pa systemd[1]: example.service: Succeeded.
+Jan 20 21:00:24 pa systemd[1]: Finished example.service.
+```
+
+This means that NixOps secrets require _manual human intervention_ in order to
+repopulate them on server boot. If your server went offline overnight due to an
+unexpected issue, your services using those keys could be stuck offline until
+morning. This is undesirable for a number of reasons. This plus the requirement
+for the `keys` group (which at time of writing was undocumented) to be added to
+service user accounts means that while they do work, they are not very
+ergonomic.
+
+[You can read secrets from files using something like
+`deployment.keys.example.text = "${builtins.readFile ./secrets/example.env}"`,
+but it is kind of a pain to have to do that. It would be better to just
+reference the secrets by filesystem paths in the first
+place.](conversation://Mara/hacker)
+
+On the other hand [Morph](https://github.com/DBCDK/morph) gets this a bit
+better. It is sadly even less documented than NixOps is, but it offers a similar
+experience via [deployment
+secrets](https://github.com/DBCDK/morph/blob/master/examples/secrets.nix). The
+main differences that Morph brings to the table are taking paths to secrets and
+allowing you to run an arbitrary command on the secret being uploaded. Secrets
+can also be put anywhere on the disk, meaning that when a host reboots it
+will come back up with the most recent secrets uploaded to it.
+
+However, like NixOps, Morph secrets don't have the ability to be rolled back.
+This means that if you mess up a secret value you'd better hope you have the old
+information somewhere. This violates what you'd expect from a NixOS machine.
+
+So given these examples, I thought it would be interesting to explore what the
+middle path could look like. I chose to use
+[age](https://github.com/FiloSottile/age) for encrypting secrets in the Nix
+store as well as using SSH host keys to ensure that every secret is decryptable
+at runtime by _that machine only_. If you get your hands on the secret
+ciphertext, it should be unusable to you.
+
+One of the harder things here will be keeping a list of all of the server host
+keys. Recently I added a
+[hosts.toml](https://github.com/Xe/nixos-configs/blob/master/ops/metadata/hosts.toml)
+file to my config repo for autoconfiguring my WireGuard overlay network. It was
+easy enough to add all the SSH host keys for each machine using a command like
+this to get them:
+
+```console
+$ nixops ssh-for-each -d hexagone -- cat /etc/ssh/ssh_host_ed25519_key.pub
+firgu....> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIB8+mCR+MEsv0XYi7ohvdKLbDecBtb3uKGQOPfIhdj3C root@nixos
+chrysalis> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGDA5iXvkKyvAiMEd/5IruwKwoymC8WxH4tLcLWOSYJ1 root@chrysalis
+lufta....> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMADhGV0hKt3ZY+uBjgOXX08txBS6MmHZcSL61KAd3df root@lufta
+keanu....> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGDZUmuhfjEIROo2hog2c8J53taRuPJLNOtdaT8Nt69W root@nixos
+```
+
+[We will cover how this WireGuard overlay works in a future post.](conversation://Mara/hacker)
+
+age lets you use SSH keys for decryption, so I added these keys to my
+`hosts.toml` and ended up with something like
+[this](https://github.com/Xe/nixos-configs/commit/14726e982001e794cd72afa1ece209eed58d3f38#diff-61d1d8dddd71be624c0d718be22072c950ec31c72fded8a25094ea53d94c8185).
+
+Now we can encrypt secrets on the host machine and safely put them in the Nix
+store, because each target machine can decrypt its own secrets with a command
+like this:
+
+```shell
+age -d -i /etc/ssh/ssh_host_ed25519_key -o "$dest" "$src"
+```
+
+From here it's easy to make a function that we can use for generating new
+encrypted secrets in the Nix store. First we need to import the host metadata
+from the TOML file:
+
+```nix
+let
+  cfg = config.within.secrets;
+  metadata = lib.importTOML ../../ops/metadata/hosts.toml;
+
+  mkSecretOnDisk = name:
+    { source, ... }:
+    pkgs.stdenv.mkDerivation {
+      name = "${name}-secret";
+      phases = "installPhase";
+      buildInputs = [ pkgs.age ];
+      installPhase =
+        let key = metadata.hosts."${config.networking.hostName}".ssh_pubkey;
+        in ''
+          age -a -r "${key}" -o $out ${source}
+        '';
+    };
+```
+
+And then we can generate systemd oneshot jobs with something like this:
+
+```nix
+  mkService = name:
+    { source, dest, owner, group, permissions, ... }: {
+      description = "decrypt secret for ${name}";
+      wantedBy = [ "multi-user.target" ];
+
+      serviceConfig.Type = "oneshot";
+
+      script = with pkgs; ''
+        rm -rf ${dest}
+        ${age}/bin/age -d -i /etc/ssh/ssh_host_ed25519_key -o ${dest} ${
+          mkSecretOnDisk name { inherit source; }
+        }
+
+        chown ${owner}:${group} ${dest}
+        chmod ${permissions} ${dest}
+      '';
+    };
+```
+
+And from there we just need some [boring
+boilerplate](https://github.com/Xe/nixos-configs/blob/master/common/crypto/default.nix#L8-L38)
+to define a secret type. Then we declare the secret type and its invocation:
+
+```nix
+in {
+  options.within.secrets = mkOption {
+    type = types.attrsOf secret;
+    description = "secret configuration";
+    default = { };
+  };
+
+  config.systemd.services = let
+    units = mapAttrs' (name: info: {
+      name = "${name}-key";
+      value = (mkService name info);
+    }) cfg;
+  in units;
+}
+```
+
+And we have ourselves a NixOS module that allows us to:
+
+* Trivially declare new secrets
+* Make secrets in the Nix store useless without the key
+* Make every secret be transparently decrypted on startup
+* Avoid the use of GPG
+* Roll back secrets like any other configuration change
+
+Declaring new secrets works like this (as stolen from [the service definition
+for the website you are reading right now](https://github.com/Xe/nixos-configs/blob/master/common/services/xesite.nix#L35-L41)):
+
+```nix
+within.secrets.example = {
+  source = ./secrets/example.env;
+  dest = "/var/lib/example/.env";
+  owner = "example";
+  group = "nogroup";
+  permissions = "0400";
+};
+```
+
+Barring some kind of cryptographic attack against age, this should allow the
+secrets to be stored securely. I am working on a way to make this more generic.
+This overall approach was inspired by [agenix](https://github.com/ryantm/agenix)
+but made more specific for my needs. I hope this approach will make it easy for
+me to manage these secrets in the future.
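+
+As a usage sketch for the `within.secrets.example` declaration above, a service
+could wait for the generated `example-key.service` unit and then load the
+decrypted file. This is an illustration under assumptions, not part of the
+linked configs: the `example` service is hypothetical, `pkgs.hello` is only a
+placeholder program, and the `example` user and the `/var/lib/example`
+directory are assumed to already exist.
+
+```nix
+# Hypothetical consumer of the `example` secret declared above.
+systemd.services.example = {
+  wantedBy = [ "multi-user.target" ];
+
+  # The module names each decryption unit "${name}-key", so order this
+  # service after it, mirroring the NixOps example earlier in the post.
+  after = [ "example-key.service" ];
+  wants = [ "example-key.service" ];
+
+  serviceConfig = {
+    User = "example";
+    # systemd loads KEY=value pairs from the decrypted file at start.
+    EnvironmentFile = "/var/lib/example/.env";
+    # Placeholder program; swap in the real service binary.
+    ExecStart = "${pkgs.hello}/bin/hello";
+  };
+};
+```
+
+Using `wants` keeps the dependency soft. If the service should refuse to start
+when decryption fails, `requires` may be the better fit.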