</kubernetes>
Signed-off-by: Christine Dodrill <me@christine.website>
This commit is contained in:
parent
951542ccf2
commit
1ae1cc2945
|
@ -0,0 +1,229 @@
|
|||
---
|
||||
title: "</kubernetes>"
|
||||
date: 2021-01-03
|
||||
---
|
||||
|
||||
# </kubernetes>
|
||||
|
||||
Well, since I posted [that last post](/blog/k8s-pondering-2020-12-31) I have had
|
||||
an adventure. A good friend pointed out a server host that I had missed when I
|
||||
was looking for other places to use, and now I have migrated my blog to this new
|
||||
server. As of yesterday, I now run my website on a dedicated server in the
|
||||
Netherlands. Here is the story of my journey to migrate 6 years of cruft and
|
||||
technical debt to this new server.
|
||||
|
||||
Let's talk about this goliath of a server. This server is an AX41 from Hetzner.
|
||||
It has 64 GB of ram, a 512 GB nvme drive, 3 2 TB drives, and a Ryzen 3600. For
|
||||
all practical concerns, this beast is beyond overkill and rivals my workstation
|
||||
tower in everything but the GPU power. I have named it `lufta`, which is the
|
||||
word for feather in [L'ewa](https://lewa.within.website/dictionary.html).
|
||||
|
||||
## Assimilation
|
||||
|
||||
For my server setup process, the first step it to assimilate it. In this step I
|
||||
get a base NixOS install on it somehow. Since I was using Hetzner, I was able to
|
||||
boot into a NixOS install image using the process documented
|
||||
[here](https://nixos.wiki/wiki/Install_NixOS_on_Hetzner_Online). Then I decided
|
||||
that it would also be cool to have this server use
|
||||
[zfs](https://en.wikipedia.org/wiki/ZFS) as its filesystem to take advantage of
|
||||
its legendary subvolume and snapshotting features.
|
||||
|
||||
So I wrote up a bootstrap system definition like the Hetzner tutorial said and
|
||||
ended up with `hosts/lufta/bootstrap.nix`:
|
||||
|
||||
```nix
|
||||
{ pkgs, ... }:
|
||||
|
||||
{
|
||||
services.openssh.enable = true;
|
||||
users.users.root.openssh.authorizedKeys.keys = [
|
||||
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPg9gYKVglnO2HQodSJt4z4mNrUSUiyJQ7b+J798bwD9 cadey@shachi"
|
||||
];
|
||||
|
||||
networking.usePredictableInterfaceNames = false;
|
||||
systemd.network = {
|
||||
enable = true;
|
||||
networks."eth0".extraConfig = ''
|
||||
[Match]
|
||||
Name = eth0
|
||||
[Network]
|
||||
# Add your own assigned ipv6 subnet here here!
|
||||
Address = 2a01:4f9:3a:1a1c::/64
|
||||
Gateway = fe80::1
|
||||
# optionally you can do the same for ipv4 and disable DHCP (networking.dhcpcd.enable = false;)
|
||||
Address = 135.181.162.99/26
|
||||
Gateway = 135.181.162.65
|
||||
'';
|
||||
};
|
||||
|
||||
boot.supportedFilesystems = [ "zfs" ];
|
||||
|
||||
environment.systemPackages = with pkgs; [ wget vim zfs ];
|
||||
}
|
||||
```
|
||||
|
||||
Then I fired up the kexec tarball and waited for the server to boot into a NixOS
|
||||
live environment. A few minutes later I was in. I started formatting the drives
|
||||
according to the [NixOS install
|
||||
guide](https://nixos.org/manual/nixos/stable/index.html#sec-installation) with
|
||||
one major difference: I added a `/boot` ext4 partition on the SSD. This allows
|
||||
me to have the system root device on zfs. I added the disks to a `raidz1` pool
|
||||
and created a few volumes. I also added the SSD as a log device so I get SSD
|
||||
caching.
|
||||
|
||||
From there I installed NixOS as normal and rebooted the server. It booted
|
||||
normally. I had a shiny new NixOS server in the cloud! I noticed that the server
|
||||
had booted into NixOS unstable as opposed to NixOS 20.09 like my other nodes. I
|
||||
thought "ah, well, that probably isn't a problem" and continued to the
|
||||
configuration step.
|
||||
|
||||
[That's ominous...](conversation://Mara/hmm)
|
||||
|
||||
## Configuration
|
||||
|
||||
Now that the server was assimilated and I could SSH into it, the next step was
|
||||
to configure it to run my services. While I was waiting for Hetzner to provision
|
||||
my server I ported a bunch of my services over to Nixops services [a-la this
|
||||
post](/blog/nixops-services-2020-11-09) in [this
|
||||
folder](https://github.com/Xe/nixos-configs/tree/master/common/services) of my
|
||||
configs repo.
|
||||
|
||||
Now that I had them, it was time to add this server to my Nixops setup. So I
|
||||
opened the [nixops definition
|
||||
folder](https://github.com/Xe/nixos-configs/tree/master/nixops/hexagone) and
|
||||
added the metadata for `lufta`. Then I added it to my Nixops deployment with
|
||||
this command:
|
||||
|
||||
```console
|
||||
$ nixops modify -d hexagone -n hexagone *.nix
|
||||
```
|
||||
|
||||
Then I copied over the autogenerated config from `lufta`'s `/etc/nixos/` folder
|
||||
into
|
||||
[`hosts/lufta`](https://github.com/Xe/nixos-configs/tree/master/hosts/lufta) and
|
||||
ran a `nixops deploy` to add some other base configuration.
|
||||
|
||||
## Migration
|
||||
|
||||
Once that was done, I started enabling my services and pushing configs to test
|
||||
them. After I got to a point where I thought things would work I opened up the
|
||||
Kubernetes console and started deleting deployments on my kubernetes cluster as
|
||||
I felt "safe" to migrate them over. Then I saw the deployments come back. I
|
||||
deleted them again and they came back again.
|
||||
|
||||
Oh, right. I enabled that one Kubernetes service that made it intentionally hard
|
||||
to delete deployments. One clever set of scale-downs and kills later and I was
|
||||
able to kill things with wild abandon.
|
||||
|
||||
I copied over the gitea data with `rsync` running in the kubernetes deployment.
|
||||
Then I killed the gitea deployment, updated DNS and reran a whole bunch of gitea
|
||||
jobs to resanify the environment. I did a test clone on a few of my repos and
|
||||
then I deleted the gitea volume from DigitalOcean.
|
||||
|
||||
Moving over the other deployments from Kubernetes into NixOS services was
|
||||
somewhat easy, however I did need to repackage a bunch of my programs and static
|
||||
sites for NixOS. I made the
|
||||
[`pkgs`](https://github.com/Xe/nixos-configs/tree/master/pkgs) tree a bit more
|
||||
fleshed out to compensate.
|
||||
|
||||
[Okay, packaging static sites in NixOS is beyond overkill, however a lot of them
|
||||
need some annoyingly complicated build steps and throwing it all into Nix means
|
||||
that we can make them reproducible and use one build system to rule them
|
||||
all. Not to mention that when I need to upgrade the system, everything will
|
||||
rebuild with new system libraries to avoid the <a
|
||||
href="https://blog.tidelift.com/bit-rot-the-silent-killer">Docker bitrot
|
||||
problem</a>.](conversation://Mara/hacker)
|
||||
|
||||
## Reboot Test
|
||||
|
||||
After a significant portion of the services were moved over, I decided it was
|
||||
time to do the reboot test. I ran the `reboot` command and then...nothing.
|
||||
My continuous ping test was timing out. My phone was blowing up with downtime
|
||||
messages from NodePing. Yep, I messed something up.
|
||||
|
||||
I was able to boot the server back into a NixOS recovery environment using the
|
||||
kexec trick, and from there I was able to prove the following:
|
||||
|
||||
- The zfs setup is healthy
|
||||
- I can read some of the data I migrated over
|
||||
- I can unmount and remount the ZFS volumes repeatedly
|
||||
|
||||
I was confused. This shouldn't be happening. After half an hour of
|
||||
troubleshooting, I gave in and ordered an IPKVM to be installed in my server.
|
||||
|
||||
Once that was set up (and I managed to trick MacOS into letting me boot a .jnlp
|
||||
web start file), I rebooted the server so I could see what error I was getting
|
||||
on boot. I missed it the first time around, but on the second time I was able to
|
||||
capture this screenshot:
|
||||
|
||||
![The error I was looking
|
||||
for](https://cdn.christine.website/file/christine-static/blog/Screen+Shot+2021-01-03+at+1.13.05+AM.png)
|
||||
|
||||
Then it hit me. I did the install on NixOS unstable. My other servers use NixOS
|
||||
20.09. I had downgraded zfs and the older version of zfs couldn't mount the
|
||||
volume created by the newer version of zfs in read/write mode. One more trip to
|
||||
the recovery environment later to install NixOS unstable in a new generation.
|
||||
|
||||
Then I switched my tower's default NixOS channel to the unstable channel and ran
|
||||
`nixops deploy` to reactivate my services. After the NodePing uptime
|
||||
notifications came in, I ran the reboot test again while looking at the console
|
||||
output to be sure.
|
||||
|
||||
It booted. It worked. I had a stable setup. Then I reconnected to IRC and passed
|
||||
out.
|
||||
|
||||
## Services Migrated
|
||||
|
||||
Here is a list of all of the services I have migrated over from my old dedicated
|
||||
server, my kubernetes cluster and my dokku server:
|
||||
|
||||
- aerial -> discord chatbot
|
||||
- goproxy -> go modules proxy
|
||||
- lewa -> https://lewa.within.website
|
||||
- hlang -> https://h.christine.website
|
||||
- mi -> https://mi.within.website
|
||||
- printerfacts -> https://printerfacts.cetacean.club
|
||||
- xesite -> https://christine.website
|
||||
- graphviz -> https://graphviz.christine.website
|
||||
- idp -> https://idp.christine.website
|
||||
- oragono -> ircs://irc.within.website:6697/
|
||||
- tron -> discord bot
|
||||
- withinbot -> discord bot
|
||||
- withinwebsite -> https://within.website
|
||||
- gitea -> https://tulpa.dev
|
||||
- other static sites
|
||||
|
||||
Doing this migration is a bit of an archaeology project as well. I was
|
||||
continuously discovering services that I had littered over my machines with very
|
||||
poorly documented requirements and configuration. I hope that this move will let
|
||||
the next time I do this kind of migration be a lot easier by comparison.
|
||||
|
||||
I still have a few other services to move over, however the ones that are left
|
||||
are much more annoying to set up properly. I'm going to get to deprovision 5
|
||||
servers in this migration and as a result get this stupidly powerful goliath of
|
||||
a server to do whatever I want with and I also get to cut my monthly server
|
||||
costs by over half.
|
||||
|
||||
I am very close to being able to turn off the Kubernetes cluster and use NixOS
|
||||
for everything. A few services that are still on the Kubernetes cluster are
|
||||
resistant to being nixified, so I may have to use the Docker containers for
|
||||
that. I was hoping to be able to cut out Docker entirely, however we don't seem
|
||||
to be that lucky yet.
|
||||
|
||||
Sure, there is some added latency with the server being in Europe instead of
|
||||
Montreal, however if this ever becomes a practical issue I can always launch a
|
||||
cheap DigitalOcean VPS in Toronto to act as a DNS server for my WireGuard setup.
|
||||
|
||||
Either way, I am now off Kubernetes for my highest traffic services. If services
|
||||
of mine need to use the disk, they can now just use the disk. If I really care
|
||||
about the data, I can add the service folders to the list of paths to back up to
|
||||
`rsync.net` (I have a post about how this backup process works in the drafting
|
||||
stage) via [borgbackup](https://www.borgbackup.org/).
|
||||
|
||||
Let's hope it stays online!
|
||||
|
||||
---
|
||||
|
||||
Many thanks to [Graham Christensen](https://twitter.com/grhmc), [Dave
|
||||
Anderson](https://twitter.com/dave_universetf) and everyone else who has been
|
||||
helping me along this journey. I would be lost without them.
|
|
@ -9,6 +9,13 @@ fn main() -> Result<()> {
|
|||
.output()
|
||||
.unwrap();
|
||||
let git_hash = String::from_utf8(output.stdout).unwrap();
|
||||
println!("cargo:rustc-env=GITHUB_SHA={}", git_hash);
|
||||
println!(
|
||||
"cargo:rustc-env=GITHUB_SHA={}",
|
||||
if git_hash.as_str() == "" {
|
||||
env!("out").into()
|
||||
} else {
|
||||
git_hash
|
||||
}
|
||||
);
|
||||
Ok(())
|
||||
}
|
||||
|
|
|
@ -7,7 +7,7 @@
|
|||
<blockquote>Copyright 2020 Christine Dodrill. Any and all opinions listed here are my own and not representative of my employers; future, past and present.</blockquote>
|
||||
<!--<p>Like what you see? Donate on <a href="https://www.patreon.com/cadey">Patreon</a> like <a href="/patrons">these awesome people</a>!</p>-->
|
||||
<p>Looking for someone for your team? Take a look <a href="/signalboost">here</a>.</p>
|
||||
<p>Served by @APP running commit <a href="https://github.com/Xe/site/commit/@env!("GITHUB_SHA")">@env!("GITHUB_SHA")</a>, see <a href="https://github.com/Xe/site">source code here</a>.</p>
|
||||
<p>Served by @APP running @env!("out")/bin/xesite</a>, see <a href="https://github.com/Xe/site">source code here</a>.</p>
|
||||
</footer>
|
||||
|
||||
</div>
|
||||
|
|
Loading…
Reference in New Issue