title | date | series | tags
---|---|---|---
Encrypted Secrets with NixOS | 2021-01-20 | nixos |
Encrypted Secrets with NixOS
One of the best things about NixOS is the fact that it's so easy to do configuration management using it. The Nix store (where all your packages live) has a huge flaw for secret management though: everything in the Nix store is globally readable. This means that anyone logged into or running code on the system could read any secret in the Nix store without any limits. This is sub-optimal if your goal is to keep secret values secret. There have been a few approaches to this over the years, but I want to describe how I'm doing it. Here are my goals and implementation for this setup and how a few other secret management strategies don't quite pan out.
At a high level I have these goals:
- It should be trivial to declare new secrets
- Secrets should never be globally readable in any useful form
- If I restart the machine, I should not need to take manual human action to ensure all of the services come back online
- GPG should be avoided at all costs
As a side goal, being able to roll back secret changes would also be nice.
The two biggest tools that come to mind for secret management on NixOS are NixOps and Morph.
NixOps is a tool that helps administrators operate NixOS across multiple servers at once. I use NixOps extensively in my own setup. It calls deployment secrets "keys" and they are documented here. At a high level they are declared like this:
deployment.keys.example = {
  text = "this is a super sekrit value :)";
  user = "example";
  group = "keys";
  permissions = "0400";
};
This will create a new secret in /run/keys that will contain our super secret value.
Wait, isn't /run an ephemeral filesystem? What happens when the system reboots?
Let's make an example system and find out! So let's say we have that example
secret from earlier and want to use it in a job. The job definition could look
something like this:
# create a service-specific user
users.users.example.isSystemUser = true;
# without this group the secret can't be read
users.users.example.extraGroups = [ "keys" ];

systemd.services.example = {
  wantedBy = [ "multi-user.target" ];
  after = [ "example-key.service" ];
  wants = [ "example-key.service" ];
  serviceConfig.User = "example";
  serviceConfig.Type = "oneshot";
  script = ''
    stat /run/keys/example
  '';
};
This creates a user called example and gives it permission to read deployment keys. It also creates a systemd service called example.service that runs as our example user and calls stat(1) to show the ownership and permissions of the key file. To avoid systemd thinking our service failed, we're also going to mark it as a oneshot.
Altogether it could look something like this. Let's see what systemctl has to report:
$ nixops ssh -d blog-example pa -- systemctl status example
● example.service
Loaded: loaded (/nix/store/j4a8f6mnaw3v4sz7dqlnz95psh72xglw-unit-example.service/example.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Wed 2021-01-20 20:53:54 UTC; 37s ago
Process: 2230 ExecStart=/nix/store/1yg89z4dsdp1axacqk07iq5jqv58q169-unit-script-example-start/bin/example-start (code=exited, status=0/SUCCESS)
Main PID: 2230 (code=exited, status=0/SUCCESS)
IP: 0B in, 0B out
CPU: 3ms
Jan 20 20:53:54 pa example-start[2235]: File: /run/keys/example
Jan 20 20:53:54 pa example-start[2235]: Size: 31 Blocks: 8 IO Block: 4096 regular file
Jan 20 20:53:54 pa example-start[2235]: Device: 18h/24d Inode: 37428 Links: 1
Jan 20 20:53:54 pa example-start[2235]: Access: (0400/-r--------) Uid: ( 998/ example) Gid: ( 96/ keys)
Jan 20 20:53:54 pa example-start[2235]: Access: 2021-01-20 20:53:54.010554201 +0000
Jan 20 20:53:54 pa example-start[2235]: Modify: 2021-01-20 20:53:54.010554201 +0000
Jan 20 20:53:54 pa example-start[2235]: Change: 2021-01-20 20:53:54.398103181 +0000
Jan 20 20:53:54 pa example-start[2235]: Birth: -
Jan 20 20:53:54 pa systemd[1]: example.service: Succeeded.
Jan 20 20:53:54 pa systemd[1]: Finished example.service.
So what happens when we reboot? I'll force a reboot in my hypervisor and we'll find out:
$ nixops ssh -d blog-example pa -- systemctl status example
● example.service
Loaded: loaded (/nix/store/j4a8f6mnaw3v4sz7dqlnz95psh72xglw-unit-example.service/example.service; enabled; vendor preset: enabled)
Active: inactive (dead)
The service is inactive. Let's see what the status of example-key.service is:
$ nixops ssh -d blog-example pa -- systemctl status example-key
● example-key.service
Loaded: loaded (/nix/store/ikqn64cjq8pspkf3ma1jmx8qzpyrckpb-unit-example-key.service/example-key.service; linked; vendor preset: enabled)
Active: activating (start-pre) since Wed 2021-01-20 20:56:05 UTC; 3min 1s ago
Cntrl PID: 610 (example-key-pre)
IP: 0B in, 0B out
IO: 116.0K read, 0B written
Tasks: 4 (limit: 2374)
Memory: 1.6M
CPU: 3ms
CGroup: /system.slice/example-key.service
├─610 /nix/store/kl6lr3czkbnr6m5crcy8ffwfzbj8a22i-bash-4.4-p23/bin/bash -e /nix/store/awx1zrics3cal8kd9c5d05xzp5ikazlk-unit-script-example-key-pre-start/bin/example-key-pre-start
├─619 /nix/store/kl6lr3czkbnr6m5crcy8ffwfzbj8a22i-bash-4.4-p23/bin/bash -e /nix/store/awx1zrics3cal8kd9c5d05xzp5ikazlk-unit-script-example-key-pre-start/bin/example-key-pre-start
├─620 /nix/store/kl6lr3czkbnr6m5crcy8ffwfzbj8a22i-bash-4.4-p23/bin/bash -e /nix/store/awx1zrics3cal8kd9c5d05xzp5ikazlk-unit-script-example-key-pre-start/bin/example-key-pre-start
└─621 inotifywait -qm --format %f -e create,move /run/keys
Jan 20 20:56:05 pa systemd[1]: Starting example-key.service...
The service is blocked waiting for the keys to exist. We have to populate the keys with nixops send-keys:
$ nixops send-keys -d blog-example
pa> uploading key ‘example’...
Now when we check on example.service, we get the following:
$ nixops ssh -d blog-example pa -- systemctl status example
● example.service
Loaded: loaded (/nix/store/j4a8f6mnaw3v4sz7dqlnz95psh72xglw-unit-example.service/example.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Wed 2021-01-20 21:00:24 UTC; 32s ago
Process: 954 ExecStart=/nix/store/1yg89z4dsdp1axacqk07iq5jqv58q169-unit-script-example-start/bin/example-start (code=exited, status=0/SUCCESS)
Main PID: 954 (code=exited, status=0/SUCCESS)
IP: 0B in, 0B out
CPU: 3ms
Jan 20 21:00:24 pa example-start[957]: File: /run/keys/example
Jan 20 21:00:24 pa example-start[957]: Size: 31 Blocks: 8 IO Block: 4096 regular file
Jan 20 21:00:24 pa example-start[957]: Device: 18h/24d Inode: 27774 Links: 1
Jan 20 21:00:24 pa example-start[957]: Access: (0400/-r--------) Uid: ( 998/ example) Gid: ( 96/ keys)
Jan 20 21:00:24 pa example-start[957]: Access: 2021-01-20 21:00:24.588494730 +0000
Jan 20 21:00:24 pa example-start[957]: Modify: 2021-01-20 21:00:24.588494730 +0000
Jan 20 21:00:24 pa example-start[957]: Change: 2021-01-20 21:00:24.606495751 +0000
Jan 20 21:00:24 pa example-start[957]: Birth: -
Jan 20 21:00:24 pa systemd[1]: example.service: Succeeded.
Jan 20 21:00:24 pa systemd[1]: Finished example.service.
This means that NixOps secrets require manual human intervention in order to repopulate them on server boot. If your server went offline overnight due to an unexpected issue, your services using those keys could be stuck offline until morning. This is undesirable for a number of reasons. This plus the requirement for the keys group (which at time of writing was undocumented) to be added to service user accounts means that while they do work, they are not very ergonomic.
On the other hand Morph gets this a bit better. It is sadly even less documented than NixOps is, but it offers a similar experience via deployment secrets. The main differences that Morph brings to the table are taking paths to secrets and allowing you to run an arbitrary command on the secret being uploaded. Secrets are also able to be put anywhere on the disk, meaning that when a host reboots it will come back up with the most recent secrets uploaded to it.
However, like NixOps, Morph secrets don't have the ability to be rolled back. This means that if you mess up a secret value, you had better hope you have the old information somewhere. This violates what you'd expect from a NixOS machine.
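For comparison with the NixOps snippet earlier, here is a rough sketch of what a Morph deployment secret can look like. The option names follow the example in Morph's README as best I recall, so check them against the version you run. The host name `pa` is borrowed from the NixOps example, and the paths and service name are made up for illustration.

```nix
{
  network.description = "morph secrets sketch";

  # "pa" is a hypothetical host, reused from the NixOps example
  pa = { ... }: {
    deployment.secrets.example = {
      # local path to the secret, read on the deploying machine
      source = "./secrets/example.env";
      # anywhere on disk, so it can survive a reboot
      destination = "/var/lib/example/.env";
      owner.user = "example";
      owner.group = "nogroup";
      permissions = "0400";
      # arbitrary command to run on the target after the upload
      action = [ "sudo" "systemctl" "restart" "example.service" ];
    };
  };
}
```

Because the destination is an ordinary path rather than something under /run, nothing has to be re-uploaded after a reboot, but the uploaded file is also not tied to any NixOS generation.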
So given these examples, I thought it would be interesting to explore what the middle path could look like. I chose to use age for encrypting secrets in the Nix store as well as using SSH host keys to ensure that every secret is decryptable at runtime by that machine only. If you get your hands on the secret ciphertext, it should be unusable to you.
One of the harder things here will be keeping a list of all of the server host keys. Recently I added a hosts.toml file to my config repo for autoconfiguring my WireGuard overlay network. (We will cover how this WireGuard overlay works in a future post.) It was easy enough to add all the SSH host keys for each machine using a command like this to get them:
$ nixops ssh-for-each -d hexagone -- cat /etc/ssh/ssh_host_ed25519_key.pub
firgu....> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIB8+mCR+MEsv0XYi7ohvdKLbDecBtb3uKGQOPfIhdj3C root@nixos
chrysalis> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGDA5iXvkKyvAiMEd/5IruwKwoymC8WxH4tLcLWOSYJ1 root@chrysalis
lufta....> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMADhGV0hKt3ZY+uBjgOXX08txBS6MmHZcSL61KAd3df root@lufta
keanu....> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGDZUmuhfjEIROo2hog2c8J53taRuPJLNOtdaT8Nt69W root@nixos
age lets you use SSH keys for decryption, so I added these keys to my hosts.toml and ended up with something like this.
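As a minimal sketch of the shape involved: the module code later in this post looks keys up as `metadata.hosts.<hostname>.ssh_pubkey`, so the file presumably has one table per host with an `ssh_pubkey` field. The TOML below is an assumption reconstructed from that lookup, not a copy of the real file, and it is inlined with `builtins.fromTOML` only so the example stands alone.

```nix
let
  # Assumed excerpt of hosts.toml; only the ssh_pubkey field is shown.
  hostsToml = ''
    [hosts.chrysalis]
    ssh_pubkey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGDA5iXvkKyvAiMEd/5IruwKwoymC8WxH4tLcLWOSYJ1 root@chrysalis"

    [hosts.lufta]
    ssh_pubkey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMADhGV0hKt3ZY+uBjgOXX08txBS6MmHZcSL61KAd3df root@lufta"
  '';

  # Parse the TOML text into a nested attribute set.
  metadata = builtins.fromTOML hostsToml;
in
# Evaluates to the public key string for chrysalis.
metadata.hosts.chrysalis.ssh_pubkey
```

Saving this to a file and running `nix-instantiate --eval` on it should print the chrysalis key as a string.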
Now we can encrypt secrets on the host machine and safely put the ciphertext in the Nix store. Each target machine can then decrypt its own secrets with a command like this:
age -d -i /etc/ssh/ssh_host_ed25519_key -o $dest $src
From here it's easy to make a function that we can use for generating new encrypted secrets in the Nix store. First we need to import the host metadata from the TOML file:
let
  cfg = config.within.secrets;
  # host metadata, including each machine's SSH public host key
  metadata = lib.importTOML ../../ops/metadata/hosts.toml;

  # encrypt `source` to this host's SSH public key; the derivation
  # output ($out) is the armored ciphertext
  mkSecretOnDisk = name:
    { source, ... }:
    pkgs.stdenv.mkDerivation {
      name = "${name}-secret";
      phases = "installPhase";
      buildInputs = [ pkgs.age ];
      installPhase =
        let key = metadata.hosts."${config.networking.hostName}".ssh_pubkey;
        in ''
          age -a -r "${key}" -o $out ${source}
        '';
    };
And then we can generate systemd oneshot jobs with something like this:
  mkService = name:
    { source, dest, owner, group, permissions, ... }: {
      description = "decrypt secret for ${name}";
      wantedBy = [ "multi-user.target" ];
      serviceConfig.Type = "oneshot";
      script = with pkgs; ''
        # drop any stale copy, then make sure the new file starts out
        # private so it is never world-readable before the chmod below
        rm -rf ${dest}
        umask 077
        ${age}/bin/age -d -i /etc/ssh/ssh_host_ed25519_key -o ${dest} ${
          mkSecretOnDisk name { inherit source; }
        }
        # hand the plaintext over to the service user
        chown ${owner}:${group} ${dest}
        chmod ${permissions} ${dest}
      '';
    };
And from there we just need some boring boilerplate to define a secret type. Then we declare the within.secrets option using that type and generate one decryption unit per declared secret:
in {
  options.within.secrets = mkOption {
    type = types.attrsOf secret;
    description = "secret configuration";
    default = { };
  };

  config.systemd.services = let
    units = mapAttrs' (name: info: {
      name = "${name}-key";
      value = (mkService name info);
    }) cfg;
  in units;
}
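The "boring boilerplate" for the secret type is not shown in this post. A minimal sketch of what it might look like, assuming a plain `types.submodule` whose options mirror the arguments that mkService destructures (source, dest, owner, group, permissions), is below. The option types, defaults and descriptions are my guesses for illustration, not the actual definition.

```nix
{ lib, ... }:

with lib;

let
  # Assumed definition of the `secret` type; field names come from the
  # arguments of mkService, everything else is illustrative.
  secret = types.submodule {
    options = {
      source = mkOption {
        type = types.path;
        description = "local path to the plaintext secret";
      };
      dest = mkOption {
        type = types.str;
        description = "where to write the decrypted secret on the target";
      };
      owner = mkOption {
        type = types.str;
        default = "root";
        description = "user that owns the decrypted file";
      };
      group = mkOption {
        type = types.str;
        default = "root";
        description = "group that owns the decrypted file";
      };
      permissions = mkOption {
        type = types.str;
        default = "0400";
        description = "mode passed to chmod";
      };
    };
  };
in {
  options.within.secrets = mkOption {
    type = types.attrsOf secret;
    default = { };
    description = "secret configuration";
  };
}
```

Using a submodule means a typo in a field name or a missing `source` fails at evaluation time instead of producing a broken unit script.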
And we have ourselves a NixOS module that allows us to:
- Trivially declare new secrets
- Make secrets in the Nix store useless without the key
- Make every secret be transparently decrypted on startup
- Avoid the use of GPG
- Roll back secrets like any other configuration change
Declaring new secrets works like this (as stolen from the service definition for the website you are reading right now):
within.secrets.example = {
  source = ./secrets/example.env;
  dest = "/var/lib/example/.env";
  owner = "example";
  group = "nogroup";
  permissions = "0400";
};
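A service that consumes this secret needs to start after the decryption unit has run. Since the module names each unit `${name}-key`, the secret above is handled by example-key.service. Here is a hedged sketch of a consumer: the service itself and its use of `pkgs.hello` as a stand-in binary are hypothetical, and using `requires` rather than `wants` is my choice so that the service does not start if decryption fails.

```nix
{ pkgs, ... }:

{
  # Hypothetical consumer of the `example` secret declared above.
  systemd.services.example = {
    wantedBy = [ "multi-user.target" ];
    # Order after the generated decryption unit and fail if it fails.
    after = [ "example-key.service" ];
    requires = [ "example-key.service" ];
    serviceConfig = {
      User = "example";
      # The decrypted file from within.secrets.example.dest
      EnvironmentFile = "/var/lib/example/.env";
      # Stand-in for the real program
      ExecStart = "${pkgs.hello}/bin/hello";
    };
  };
}
```

Unlike the NixOps case, nothing here blocks on an upload after a reboot: the ciphertext is already in the Nix store and the host key is already on disk.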
Barring some kind of cryptographic attack against age, this should allow the secrets to be stored securely. I am working on a way to make this more generic. This overall approach was inspired by agenix but made more specific for my needs. I hope this approach will make it easy for me to manage these secrets in the future.