nixos encrypted secret post/essay
Signed-off-by: Christine Dodrill <me@christine.website>
parent 90332b323d
commit 3dba1d98f8

@@ -0,0 +1,333 @@
---
title: Encrypted Secrets with NixOS
date: 2021-01-20
series: nixos
tags:
  - age
  - ed25519
---

# Encrypted Secrets with NixOS

One of the best things about NixOS is how easy it makes configuration
management. The Nix store (where all your packages live) has a huge flaw for
secret management though: everything in the Nix store is globally readable.
This means that anyone logged into or running code on the system can read any
secret in the Nix store without any limits. This is sub-optimal if your goal is
to keep secret values secret. There have been a few approaches to this over the
years, but I want to describe how I'm doing it. Here are my goals and
implementation for this setup, along with how a few other secret management
strategies don't quite pan out.

At a high level I have these goals:

* It should be trivial to declare new secrets
* Secrets should never be globally readable in any useful form
* If I restart the machine, I should not need to take manual human action to
  ensure all of the services come back online
* GPG should be avoided at all costs

As a side goal, being able to roll back secret changes would also be nice.

The two biggest tools that offer a way to help with secret management on NixOS
that come to mind are NixOps and Morph.

[NixOps](https://github.com/NixOS/nixops) is a tool that helps administrators
operate NixOS across multiple servers at once. I use NixOps extensively in my
own setup. It calls deployment secrets "keys" and they are documented
[here](https://hydra.nixos.org/build/115931128/download/1/manual/manual.html#idm140737322649152).
At a high level they are declared like this:

```nix
deployment.keys.example = {
  text = "this is a super sekrit value :)";
  user = "example";
  group = "keys";
  permissions = "0400";
};
```

This will create a new secret in `/run/keys` that will contain our super secret
value.

[Wait, isn't `/run` an ephemeral filesystem? What happens when the system
reboots?](conversation://Mara/hmm)

Let's make an example system and find out! So let's say we have that `example`
secret from earlier and want to use it in a job. The job definition could look
something like this:

```nix
# create a service-specific user
users.users.example.isSystemUser = true;

# without this group the secret can't be read
users.users.example.extraGroups = [ "keys" ];

systemd.services.example = {
  wantedBy = [ "multi-user.target" ];
  after = [ "example-key.service" ];
  wants = [ "example-key.service" ];

  serviceConfig.User = "example";
  serviceConfig.Type = "oneshot";

  script = ''
    stat /run/keys/example
  '';
};
```

This creates a user called `example` and gives it permission to read deployment
keys. It also creates a systemd service called `example.service` that runs as
our `example` user and calls [`stat(1)`](https://linux.die.net/man/1/stat) to
show the ownership and permissions of the key file. To avoid systemd thinking
our service failed, we're also going to mark it as a
[oneshot](https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files#the-service-section).

Altogether it could look something like
[this](https://gist.github.com/Xe/4a71d7741e508d9002be91b62248144a). Let's see
what `systemctl` has to report:

```console
$ nixops ssh -d blog-example pa -- systemctl status example
● example.service
     Loaded: loaded (/nix/store/j4a8f6mnaw3v4sz7dqlnz95psh72xglw-unit-example.service/example.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Wed 2021-01-20 20:53:54 UTC; 37s ago
    Process: 2230 ExecStart=/nix/store/1yg89z4dsdp1axacqk07iq5jqv58q169-unit-script-example-start/bin/example-start (code=exited, status=0/SUCCESS)
   Main PID: 2230 (code=exited, status=0/SUCCESS)
         IP: 0B in, 0B out
        CPU: 3ms

Jan 20 20:53:54 pa example-start[2235]: File: /run/keys/example
Jan 20 20:53:54 pa example-start[2235]: Size: 31 Blocks: 8 IO Block: 4096 regular file
Jan 20 20:53:54 pa example-start[2235]: Device: 18h/24d Inode: 37428 Links: 1
Jan 20 20:53:54 pa example-start[2235]: Access: (0400/-r--------) Uid: ( 998/ example) Gid: ( 96/ keys)
Jan 20 20:53:54 pa example-start[2235]: Access: 2021-01-20 20:53:54.010554201 +0000
Jan 20 20:53:54 pa example-start[2235]: Modify: 2021-01-20 20:53:54.010554201 +0000
Jan 20 20:53:54 pa example-start[2235]: Change: 2021-01-20 20:53:54.398103181 +0000
Jan 20 20:53:54 pa example-start[2235]: Birth: -
Jan 20 20:53:54 pa systemd[1]: example.service: Succeeded.
Jan 20 20:53:54 pa systemd[1]: Finished example.service.
```

So what happens when we reboot? I'll force a reboot in my hypervisor and we'll
find out:

```console
$ nixops ssh -d blog-example pa -- systemctl status example
● example.service
     Loaded: loaded (/nix/store/j4a8f6mnaw3v4sz7dqlnz95psh72xglw-unit-example.service/example.service; enabled; vendor preset: enabled)
     Active: inactive (dead)
```

The service is inactive. Let's see what the status of `example-key.service` is:

```console
$ nixops ssh -d blog-example pa -- systemctl status example-key
● example-key.service
     Loaded: loaded (/nix/store/ikqn64cjq8pspkf3ma1jmx8qzpyrckpb-unit-example-key.service/example-key.service; linked; vendor preset: enabled)
     Active: activating (start-pre) since Wed 2021-01-20 20:56:05 UTC; 3min 1s ago
  Cntrl PID: 610 (example-key-pre)
         IP: 0B in, 0B out
         IO: 116.0K read, 0B written
      Tasks: 4 (limit: 2374)
     Memory: 1.6M
        CPU: 3ms
     CGroup: /system.slice/example-key.service
             ├─610 /nix/store/kl6lr3czkbnr6m5crcy8ffwfzbj8a22i-bash-4.4-p23/bin/bash -e /nix/store/awx1zrics3cal8kd9c5d05xzp5ikazlk-unit-script-example-key-pre-start/bin/example-key-pre-start
             ├─619 /nix/store/kl6lr3czkbnr6m5crcy8ffwfzbj8a22i-bash-4.4-p23/bin/bash -e /nix/store/awx1zrics3cal8kd9c5d05xzp5ikazlk-unit-script-example-key-pre-start/bin/example-key-pre-start
             ├─620 /nix/store/kl6lr3czkbnr6m5crcy8ffwfzbj8a22i-bash-4.4-p23/bin/bash -e /nix/store/awx1zrics3cal8kd9c5d05xzp5ikazlk-unit-script-example-key-pre-start/bin/example-key-pre-start
             └─621 inotifywait -qm --format %f -e create,move /run/keys

Jan 20 20:56:05 pa systemd[1]: Starting example-key.service...
```

The service is blocked waiting for the keys to exist. We have to populate the
keys with `nixops send-keys`:

```console
$ nixops send-keys -d blog-example
pa> uploading key ‘example’...
```

Now when we check on `example.service`, we get the following:

```console
$ nixops ssh -d blog-example pa -- systemctl status example
● example.service
     Loaded: loaded (/nix/store/j4a8f6mnaw3v4sz7dqlnz95psh72xglw-unit-example.service/example.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Wed 2021-01-20 21:00:24 UTC; 32s ago
    Process: 954 ExecStart=/nix/store/1yg89z4dsdp1axacqk07iq5jqv58q169-unit-script-example-start/bin/example-start (code=exited, status=0/SUCCESS)
   Main PID: 954 (code=exited, status=0/SUCCESS)
         IP: 0B in, 0B out
        CPU: 3ms

Jan 20 21:00:24 pa example-start[957]: File: /run/keys/example
Jan 20 21:00:24 pa example-start[957]: Size: 31 Blocks: 8 IO Block: 4096 regular file
Jan 20 21:00:24 pa example-start[957]: Device: 18h/24d Inode: 27774 Links: 1
Jan 20 21:00:24 pa example-start[957]: Access: (0400/-r--------) Uid: ( 998/ example) Gid: ( 96/ keys)
Jan 20 21:00:24 pa example-start[957]: Access: 2021-01-20 21:00:24.588494730 +0000
Jan 20 21:00:24 pa example-start[957]: Modify: 2021-01-20 21:00:24.588494730 +0000
Jan 20 21:00:24 pa example-start[957]: Change: 2021-01-20 21:00:24.606495751 +0000
Jan 20 21:00:24 pa example-start[957]: Birth: -
Jan 20 21:00:24 pa systemd[1]: example.service: Succeeded.
Jan 20 21:00:24 pa systemd[1]: Finished example.service.
```

This means that NixOps secrets require _manual human intervention_ in order to
repopulate them on server boot. If your server went offline overnight due to an
unexpected issue, your services using those keys could be stuck offline until
morning. That breaks my goal of having services come back online without human
action. Add the requirement that service user accounts join the `keys` group
(which at time of writing was undocumented), and NixOps keys do work, but they
are not very ergonomic.

[You can read secrets from files using something like
`deployment.keys.example.text = "${builtins.readFile ./secrets/example.env}"`,
but it is kind of a pain to have to do that. It would be better to just
reference the secrets by filesystem paths in the first
place.](conversation://Mara/hacker)

On the other hand, [Morph](https://github.com/DBCDK/morph) does this a bit
better. It is sadly even less documented than NixOps is, but it offers a similar
experience via [deployment
secrets](https://github.com/DBCDK/morph/blob/master/examples/secrets.nix). The
main differences that Morph brings to the table are taking paths to secrets and
allowing you to run an arbitrary command on the secret being uploaded. Secrets
can also be put anywhere on the disk, meaning that when a host reboots it will
come back up with the most recent secrets uploaded to it.

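To make that concrete, here is a minimal sketch of what a Morph secret
declaration could look like, modeled on the linked `secrets.nix` example. The
host name `web01`, the paths, and the `example` user and service are
placeholders, and the option names should be checked against the Morph version
you actually run.

```nix
# Sketch of a Morph network file; every name and path here is a placeholder.
{
  network.description = "example network";

  "web01" = { config, pkgs, ... }: {
    deployment.secrets."example" = {
      # path to the plaintext secret on the deploying machine
      source = "./secrets/example.env";
      # the secret can land anywhere on the target's disk
      destination = "/var/lib/example/.env";
      owner.user = "example";
      owner.group = "nogroup";
      permissions = "0400";
      # optional command to run on the target after the upload
      action = [ "sudo" "systemctl" "restart" "example.service" ];
    };
  };
}
```
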
However, like NixOps, Morph secrets don't have the ability to be rolled back.
This means that if you mess up a secret value, you had better hope you have the
old information somewhere. This violates what you'd expect from a NixOS machine.

So given these examples, I thought it would be interesting to explore what the
middle path could look like. I chose to use
[age](https://github.com/FiloSottile/age) for encrypting secrets in the Nix
store as well as using SSH host keys to ensure that every secret is decryptable
at runtime by _that machine only_. If you get your hands on the secret
ciphertext, it should be unusable to you.

One of the harder things here will be keeping a list of all of the server host
keys. Recently I added a
[hosts.toml](https://github.com/Xe/nixos-configs/blob/master/ops/metadata/hosts.toml)
file to my config repo for autoconfiguring my WireGuard overlay network.

[We will cover how this WireGuard overlay works in a future post.](conversation://Mara/hacker)

It was easy enough to add all the SSH host keys for each machine using a command
like this to get them:

```console
$ nixops ssh-for-each -d hexagone -- cat /etc/ssh/ssh_host_ed25519_key.pub
firgu....> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIB8+mCR+MEsv0XYi7ohvdKLbDecBtb3uKGQOPfIhdj3C root@nixos
chrysalis> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGDA5iXvkKyvAiMEd/5IruwKwoymC8WxH4tLcLWOSYJ1 root@chrysalis
lufta....> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMADhGV0hKt3ZY+uBjgOXX08txBS6MmHZcSL61KAd3df root@lufta
keanu....> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGDZUmuhfjEIROo2hog2c8J53taRuPJLNOtdaT8Nt69W root@nixos
```

age lets you use SSH keys for decryption, so I added these keys to my
`hosts.toml` and ended up with something like
[this](https://github.com/Xe/nixos-configs/commit/14726e982001e794cd72afa1ece209eed58d3f38#diff-61d1d8dddd71be624c0d718be22072c950ec31c72fded8a25094ea53d94c8185).

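The module later in this post looks keys up as
`metadata.hosts.<hostname>.ssh_pubkey`, so once the TOML file is imported into
Nix the data presumably has roughly the shape below. This is a sketch of that
shape only: it shows two of the hosts from the output above and leaves out any
other per-host fields the real file carries.

```nix
# Assumed shape of the attribute set produced by importing hosts.toml.
# Only ssh_pubkey is shown; other per-host fields are omitted.
{
  hosts = {
    chrysalis = {
      ssh_pubkey =
        "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGDA5iXvkKyvAiMEd/5IruwKwoymC8WxH4tLcLWOSYJ1 root@chrysalis";
    };
    lufta = {
      ssh_pubkey =
        "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMADhGV0hKt3ZY+uBjgOXX08txBS6MmHZcSL61KAd3df root@lufta";
    };
  };
}
```
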
Now we can encrypt secrets on the host machine and safely put them in the Nix
store, because each target machine can decrypt its own secrets with a command
like this:

```shell
age -d -i /etc/ssh/ssh_host_ed25519_key -o "$dest" "$src"
```

From here it's easy to make a function that we can use for generating new
encrypted secrets in the Nix store. First we need to import the host metadata
from the TOML file:

```nix
let
  cfg = config.within.secrets;
  # host metadata, including each machine's SSH host public key
  metadata = lib.importTOML ../../ops/metadata/hosts.toml;

  # encrypt `source` to this host's SSH public key and keep only the
  # ciphertext in the Nix store
  mkSecretOnDisk = name:
    { source, ... }:
    pkgs.stdenv.mkDerivation {
      name = "${name}-secret";
      phases = "installPhase";
      buildInputs = [ pkgs.age ];
      installPhase =
        let key = metadata.hosts."${config.networking.hostName}".ssh_pubkey;
        in ''
          age -a -r "${key}" -o $out ${source}
        '';
    };
```

And then we can generate systemd oneshot jobs with something like this:

```nix
  # oneshot unit that decrypts the secret to `dest` on every boot
  mkService = name:
    { source, dest, owner, group, permissions, ... }: {
      description = "decrypt secret for ${name}";
      wantedBy = [ "multi-user.target" ];

      serviceConfig.Type = "oneshot";

      script = with pkgs; ''
        rm -rf ${dest}
        ${age}/bin/age -d -i /etc/ssh/ssh_host_ed25519_key -o ${dest} ${
          mkSecretOnDisk name { inherit source; }
        }

        chown ${owner}:${group} ${dest}
        chmod ${permissions} ${dest}
      '';
    };
```

And from there we just need some [boring
boilerplate](https://github.com/Xe/nixos-configs/blob/master/common/crypto/default.nix#L8-L38)
to define a secret type. Then we declare the secret type and its invocation:

```nix
in {
  options.within.secrets = mkOption {
    type = types.attrsOf secret;
    description = "secret configuration";
    default = { };
  };

  # one "<name>-key" unit per declared secret
  config.systemd.services = let
    units = mapAttrs' (name: info: {
      name = "${name}-key";
      value = (mkService name info);
    }) cfg;
  in units;
}
```

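The `secret` type used above lives in the linked boilerplate and is not
reproduced in this post. As a rough sketch only, a submodule carrying the five
fields that `mkService` destructures might look like the following; the
descriptions and defaults are assumptions for illustration, not the actual
definition.

```nix
# Hypothetical stand-in for the linked boilerplate; see the real file for
# the actual option definitions.
{ lib, ... }:

with lib;

let
  secret = types.submodule {
    options = {
      source = mkOption {
        type = types.path;
        description = "plaintext secret on the deploying machine";
      };
      dest = mkOption {
        type = types.str;
        description = "where the decrypted secret is written on the target";
      };
      owner = mkOption {
        type = types.str;
        default = "root";
        description = "user that owns the decrypted file";
      };
      group = mkOption {
        type = types.str;
        default = "root";
        description = "group that owns the decrypted file";
      };
      permissions = mkOption {
        type = types.str;
        default = "0400";
        description = "mode passed to chmod";
      };
    };
  };
in {
  options.within.secrets = mkOption {
    type = types.attrsOf secret;
    description = "secret configuration";
    default = { };
  };
}
```
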
And we have ourselves a NixOS module that allows us to:

* Trivially declare new secrets
* Make secrets in the Nix store useless without the key
* Make every secret be transparently decrypted on startup
* Avoid the use of GPG
* Roll back secrets like any other configuration change

Declaring new secrets works like this (as stolen from [the service definition
for the website you are reading right now](https://github.com/Xe/nixos-configs/blob/master/common/services/xesite.nix#L35-L41)):

```nix
within.secrets.example = {
  source = ./secrets/example.env;
  dest = "/var/lib/example/.env";
  owner = "example";
  group = "nogroup";
  permissions = "0400";
};
```

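A service that consumes this secret can then order itself after the generated
`example-key.service` unit, the same way the NixOps example earlier did. The
following is a hypothetical sketch: it assumes the `example` user already
exists, and `pkgs.hello` stands in for whatever program actually reads the
environment file.

```nix
# Hypothetical consumer of the `example` secret; not from the original config.
systemd.services.example = {
  wantedBy = [ "multi-user.target" ];
  # the module names each decryption unit "<name>-key"
  after = [ "example-key.service" ];
  wants = [ "example-key.service" ];

  serviceConfig = {
    User = "example";
    # decrypted by example-key.service before this unit starts
    EnvironmentFile = "/var/lib/example/.env";
    # placeholder program; replace with the real service binary
    ExecStart = "${pkgs.hello}/bin/hello";
  };
};
```
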
Barring some kind of cryptographic attack against age, this should allow the
secrets to be stored securely. I am working on a way to make this more generic.
This overall approach was inspired by [agenix](https://github.com/ryantm/agenix)
but made more specific for my needs. I hope this approach will make it easy for
me to manage these secrets in the future.