diff --git a/blog/backslash-kubernetes-2021-01-03.markdown b/blog/backslash-kubernetes-2021-01-03.markdown
new file mode 100644
index 0000000..d2d0ed9
--- /dev/null
+++ b/blog/backslash-kubernetes-2021-01-03.markdown
@@ -0,0 +1,229 @@
---
title: ""
date: 2021-01-03
---

# </kubernetes>

Well, since I posted [that last post](/blog/k8s-pondering-2020-12-31) I have had
an adventure. A good friend pointed out a server host that I had missed when I
was looking for other places to use, and now I have migrated my blog to this new
server. As of yesterday, I run my website on a dedicated server in the
Netherlands. Here is the story of my journey to migrate 6 years of cruft and
technical debt to this new server.

Let's talk about this goliath of a server. This server is an AX41 from Hetzner.
It has 64 GB of RAM, a 512 GB NVMe drive, three 2 TB drives, and a Ryzen 5 3600.
For all practical concerns, this beast is beyond overkill and rivals my
workstation tower in everything but GPU power. I have named it `lufta`, which is
the word for feather in [L'ewa](https://lewa.within.website/dictionary.html).

## Assimilation

For my server setup process, the first step is to assimilate it. In this step I
get a base NixOS install onto it somehow. Since I was using Hetzner, I was able
to boot into a NixOS install image using the process documented
[here](https://nixos.wiki/wiki/Install_NixOS_on_Hetzner_Online). Then I decided
that it would also be cool to have this server use
[ZFS](https://en.wikipedia.org/wiki/ZFS) as its filesystem to take advantage of
its legendary dataset and snapshotting features.

So I wrote up a bootstrap system definition like the Hetzner tutorial said and
ended up with `hosts/lufta/bootstrap.nix`:

```nix
{ pkgs, ... }:

{
  services.openssh.enable = true;
  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPg9gYKVglnO2HQodSJt4z4mNrUSUiyJQ7b+J798bwD9 cadey@shachi"
  ];

  networking.usePredictableInterfaceNames = false;
  systemd.network = {
    enable = true;
    networks."eth0".extraConfig = ''
      [Match]
      Name = eth0
      [Network]
      # Add your own assigned ipv6 subnet here!
      Address = 2a01:4f9:3a:1a1c::/64
      Gateway = fe80::1
      # optionally you can do the same for ipv4 and disable DHCP (networking.dhcpcd.enable = false;)
      Address = 135.181.162.99/26
      Gateway = 135.181.162.65
    '';
  };

  boot.supportedFilesystems = [ "zfs" ];

  environment.systemPackages = with pkgs; [ wget vim zfs ];
}
```

Then I fired up the kexec tarball and waited for the server to boot into a NixOS
live environment. A few minutes later I was in. I started formatting the drives
according to the [NixOS install
guide](https://nixos.org/manual/nixos/stable/index.html#sec-installation) with
one major difference: I added a `/boot` ext4 partition on the SSD. This allows
me to have the system root on ZFS. I added the disks to a `raidz1` pool and
created a few volumes. I also added the SSD as a log device to speed up
synchronous writes.
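Roughly, the disk setup looked like the sketch below. The device names, pool
name, and dataset layout here are illustrative assumptions, not the exact
commands I ran:

```console
# Assumptions: /dev/nvme0n1 is the SSD, /dev/sd{a,b,c} are the 2 TB drives.
# Partition 1 of the SSD holds /boot; partition 2 becomes the ZFS log device.
$ sgdisk -n 1:0:+1G -t 1:8300 -n 2:0:0 /dev/nvme0n1
$ mkfs.ext4 -L boot /dev/nvme0n1p1

# Pool the three spinning drives into raidz1, with the rest of the SSD as a
# separate intent log for synchronous writes.
$ zpool create -o ashift=12 rpool raidz1 /dev/sda /dev/sdb /dev/sdc log /dev/nvme0n1p2

# Carve out a few datasets for the system to live on.
$ zfs create -o mountpoint=legacy rpool/root
$ zfs create -o mountpoint=legacy rpool/home
$ zfs create -o mountpoint=legacy rpool/nix
```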
From there I installed NixOS as normal and rebooted the server. It booted
normally. I had a shiny new NixOS server in the cloud! I noticed that the server
had booted into NixOS unstable as opposed to NixOS 20.09 like my other nodes. I
thought "ah, well, that probably isn't a problem" and continued to the
configuration step.

[That's ominous...](conversation://Mara/hmm)

## Configuration

Now that the server was assimilated and I could SSH into it, the next step was
to configure it to run my services. While I was waiting for Hetzner to provision
my server, I ported a bunch of my services over to NixOps services [à la this
post](/blog/nixops-services-2020-11-09) in [this
folder](https://github.com/Xe/nixos-configs/tree/master/common/services) of my
configs repo.

Now that I had them, it was time to add this server to my NixOps setup. So I
opened the [nixops definition
folder](https://github.com/Xe/nixos-configs/tree/master/nixops/hexagone) and
added the metadata for `lufta`. Then I added it to my NixOps deployment with
this command:

```console
$ nixops modify -d hexagone -n hexagone *.nix
```

Then I copied over the autogenerated config from `lufta`'s `/etc/nixos/` folder
into
[`hosts/lufta`](https://github.com/Xe/nixos-configs/tree/master/hosts/lufta) and
ran a `nixops deploy` to add some other base configuration.

## Migration

Once that was done, I started enabling my services and pushing configs to test
them. After I got to a point where I thought things would work, I opened up the
Kubernetes console and started deleting deployments on my Kubernetes cluster as
I felt "safe" to migrate them over. Then I saw the deployments come back. I
deleted them again and they came back again.

Oh, right. I had enabled that one Kubernetes service that makes it intentionally
hard to delete deployments. One clever set of scale-downs and kills later, I was
able to kill things with wild abandon.
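In case you ever need to do the same dance, it goes something like this. This is
a sketch; the deployment name and namespace are placeholders:

```console
# Scale the deployment down to zero replicas so nothing restarts the pods,
# then delete the deployment itself.
$ kubectl --namespace apps scale deployment/gitea --replicas=0
$ kubectl --namespace apps delete deployment/gitea
```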
I copied over the gitea data with `rsync` running in the Kubernetes deployment.
Then I killed the gitea deployment, updated DNS, and reran a whole bunch of
gitea jobs to resanify the environment. I did a test clone on a few of my repos
and then I deleted the gitea volume from DigitalOcean.

Moving the other deployments over from Kubernetes into NixOS services was
somewhat easy; however, I did need to repackage a bunch of my programs and
static sites for NixOS. I fleshed out the
[`pkgs`](https://github.com/Xe/nixos-configs/tree/master/pkgs) tree a bit more
to compensate.

[Okay, packaging static sites in NixOS is beyond overkill, however a lot of them
need some annoyingly complicated build steps and throwing it all into Nix means
that we can make them reproducible and use one build system to rule them
all. Not to mention that when I need to upgrade the system, everything will
rebuild with new system libraries to avoid the Docker bitrot
problem.](conversation://Mara/hacker)
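As a rough idea of what one of those static site packages could look like, here
is a sketch with a hypothetical prebuilt site; the real ones in the `pkgs` tree
each have their own build steps:

```nix
{ stdenv }:

stdenv.mkDerivation {
  pname = "examplesite";
  version = "1.0.0";
  src = ./site;

  # Real sites each have their own weird build pipeline; this one just
  # copies the files into the Nix store so a web server can point at $out.
  installPhase = ''
    mkdir -p $out
    cp -r . $out
  '';
}
```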
## Reboot Test

After a significant portion of the services were moved over, I decided it was
time to do the reboot test. I ran the `reboot` command and then...nothing.
My continuous ping test was timing out. My phone was blowing up with downtime
messages from NodePing. Yep, I had messed something up.

I was able to boot the server back into a NixOS recovery environment using the
kexec trick, and from there I was able to prove the following:

- The ZFS setup is healthy
- I can read some of the data I migrated over
- I can unmount and remount the ZFS volumes repeatedly

I was confused. This shouldn't be happening. After half an hour of
troubleshooting, I gave in and ordered an IPKVM to be installed in my server.

Once that was set up (and I managed to trick macOS into letting me boot a .jnlp
Java Web Start file), I rebooted the server so I could see what error I was
getting on boot. I missed it the first time around, but on the second try I was
able to capture this screenshot:

![The error I was looking
for](https://cdn.christine.website/file/christine-static/blog/Screen+Shot+2021-01-03+at+1.13.05+AM.png)

Then it hit me: I did the install on NixOS unstable, but my other servers use
NixOS 20.09. I had downgraded zfs, and the older version of zfs couldn't mount
the volume created by the newer version in read/write mode. One more trip to the
recovery environment later, I installed NixOS unstable in a new generation.

Then I switched my tower's default NixOS channel to the unstable channel and ran
`nixops deploy` to reactivate my services.
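Switching the channel on the tower is just a couple of commands. `nixos` here is
the default system channel name on NixOS; yours may differ:

```console
$ sudo nix-channel --add https://nixos.org/channels/nixos-unstable nixos
$ sudo nix-channel --update
```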
After the NodePing uptime notifications came in, I ran the reboot test again
while watching the console output to be sure.

It booted. It worked. I had a stable setup. Then I reconnected to IRC and passed
out.

## Services Migrated

Here is a list of all of the services I have migrated over from my old dedicated
server, my Kubernetes cluster, and my Dokku server:

- aerial -> Discord chatbot
- goproxy -> Go modules proxy
- lewa -> https://lewa.within.website
- hlang -> https://h.christine.website
- mi -> https://mi.within.website
- printerfacts -> https://printerfacts.cetacean.club
- xesite -> https://christine.website
- graphviz -> https://graphviz.christine.website
- idp -> https://idp.christine.website
- oragono -> ircs://irc.within.website:6697/
- tron -> Discord bot
- withinbot -> Discord bot
- withinwebsite -> https://within.website
- gitea -> https://tulpa.dev
- other static sites

Doing this migration was a bit of an archaeology project as well. I kept
discovering services that I had littered across my machines with very poorly
documented requirements and configuration. I hope this move will make the next
migration of this kind a lot easier.

I still have a few other services to move over; however, the ones that are left
are much more annoying to set up properly. I get to deprovision 5 servers in
this migration, keep this stupidly powerful goliath of a server to do whatever I
want with, and cut my monthly server costs by more than half.

I am very close to being able to turn off the Kubernetes cluster and use NixOS
for everything. A few services that are still on the Kubernetes cluster are
resistant to being nixified, so I may have to keep running those in Docker
containers. I was hoping to be able to cut out Docker entirely; however, we
don't seem to be that lucky yet.

Sure, there is some added latency with the server being in Europe instead of
Montreal; however, if this ever becomes a practical issue I can always launch a
cheap DigitalOcean VPS in Toronto to act as a DNS server for my WireGuard setup.

Either way, I am now off Kubernetes for my highest-traffic services. If my
services need to use the disk, they can now just use the disk. If I really care
about the data, I can add the service folders to the list of paths backed up to
`rsync.net` (I have a post about how this backup process works in the drafting
stage) via [borgbackup](https://www.borgbackup.org/).
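As a rough sketch of what one of those backup runs could look like, assuming a
placeholder repository on rsync.net and placeholder service paths:

```console
# Create a deduplicated, compressed archive of the service data.
# {now} expands to a timestamp, so every archive gets a unique name.
$ borg create --compression zstd \
    user@user.rsync.net:backups::lufta-{now} \
    /srv/within /var/lib/gitea
```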
Let's hope it stays online!

---

Many thanks to [Graham Christensen](https://twitter.com/grhmc), [Dave
Anderson](https://twitter.com/dave_universetf) and everyone else who has been
helping me along this journey. I would be lost without them.

diff --git a/src/build.rs b/src/build.rs
index 0d8d5a5..600de8a 100644
--- a/src/build.rs
+++ b/src/build.rs
@@ -9,6 +9,13 @@ fn main() -> Result<()> {
         .output()
         .unwrap();
     let git_hash = String::from_utf8(output.stdout).unwrap();
-    println!("cargo:rustc-env=GITHUB_SHA={}", git_hash);
+    println!(
+        "cargo:rustc-env=GITHUB_SHA={}",
+        if git_hash.as_str() == "" {
+            env!("out").into()
+        } else {
+            git_hash
+        }
+    );
     Ok(())
 }
diff --git a/templates/footer.rs.html b/templates/footer.rs.html
index 7f46c49..a7540e8 100644
--- a/templates/footer.rs.html
+++ b/templates/footer.rs.html
@@ -7,7 +7,7 @@
 Copyright 2020 Christine Dodrill. Any and all opinions listed here are my own and not representative of my employers; future, past and present.
 Looking for someone for your team? Take a look here.
-Served by @APP running commit @env!("GITHUB_SHA"), see source code here.
+Served by @APP running @env!("out")/bin/xesite, see source code here.