</kubernetes>

Signed-off-by: Christine Dodrill <me@christine.website>
2021-01-03 11:42:09 -05:00 · 2021-01-03 11:42:09 -05:00 · 1ae1cc2945
parent 951542ccf2
commit 1ae1cc2945
3 changed files with 238 additions and 2 deletions
--- a/blog/backslash-kubernetes-2021-01-03.markdown
+++ b/blog/backslash-kubernetes-2021-01-03.markdown
@@ -0,0 +1,229 @@
+---
+title: "</kubernetes>"
+date: 2021-01-03
+---
+
+# &lt;/kubernetes&gt;
+
+Well, since I posted [that last post](/blog/k8s-pondering-2020-12-31) I have had
+an adventure. A good friend pointed out a server host that I had missed when I
+was looking for other places to use, and now I have migrated my blog to this new
+server. As of yesterday, I now run my website on a dedicated server in the
+Netherlands. Here is the story of my journey to migrate 6 years of cruft and
+technical debt to this new server.
+
+Let's talk about this goliath of a server. This server is an AX41 from Hetzner.
+It has 64 GB of RAM, a 512 GB NVMe drive, three 2 TB drives, and a Ryzen 3600. For
+all practical concerns, this beast is beyond overkill and rivals my workstation
+tower in everything but the GPU power. I have named it `lufta`, which is the
+word for feather in [L'ewa](https://lewa.within.website/dictionary.html).
+
+## Assimilation
+
+For my server setup process, the first step is to assimilate it. In this step I
+get a base NixOS install on it somehow. Since I was using Hetzner, I was able to
+boot into a NixOS install image using the process documented
+[here](https://nixos.wiki/wiki/Install_NixOS_on_Hetzner_Online). Then I decided
+that it would also be cool to have this server use
+[zfs](https://en.wikipedia.org/wiki/ZFS) as its filesystem to take advantage of
+its legendary subvolume and snapshotting features.
+
+So I wrote up a bootstrap system definition like the Hetzner tutorial said and
+ended up with `hosts/lufta/bootstrap.nix`:
+
+```nix
+{ pkgs, ... }:
+
+{
+  services.openssh.enable = true;
+  users.users.root.openssh.authorizedKeys.keys = [
+    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPg9gYKVglnO2HQodSJt4z4mNrUSUiyJQ7b+J798bwD9 cadey@shachi"
+  ];
+
+  networking.usePredictableInterfaceNames = false;
+  systemd.network = {
+    enable = true;
+    networks."eth0".extraConfig = ''
+      [Match]
+      Name = eth0
+      [Network]
+      # Add your own assigned ipv6 subnet here!
+      Address = 2a01:4f9:3a:1a1c::/64
+      Gateway = fe80::1
+      # optionally you can do the same for ipv4 and disable DHCP (networking.dhcpcd.enable = false;)
+      Address =  135.181.162.99/26
+      Gateway = 135.181.162.65
+    '';
+  };
+
+  boot.supportedFilesystems = [ "zfs" ];
+
+  environment.systemPackages = with pkgs; [ wget vim zfs ];
+}
+```
+
+Then I fired up the kexec tarball and waited for the server to boot into a NixOS
+live environment. A few minutes later I was in. I started formatting the drives
+according to the [NixOS install
+guide](https://nixos.org/manual/nixos/stable/index.html#sec-installation) with
+one major difference: I added a `/boot` ext4 partition on the SSD. This allows
+me to have the system root device on zfs. I added the disks to a `raidz1` pool
+and created a few volumes. I also added the SSD as a log device so I get SSD
+caching.
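+
+As a rough sketch (not the exact config from this server), a layout like this
+could be described in NixOS along the following lines. The dataset name
+`rpool/root`, the `boot` partition label and the `hostId` value are all
+placeholders made up for illustration:
+
+```nix
+{ ... }:
+
+{
+  # ZFS on NixOS needs a host ID of 8 hex digits; this value is a placeholder.
+  networking.hostId = "deadbeef";
+  boot.supportedFilesystems = [ "zfs" ];
+
+  # /boot stays on plain ext4 so the bootloader never has to read ZFS.
+  fileSystems."/boot" = {
+    device = "/dev/disk/by-label/boot"; # hypothetical partition label
+    fsType = "ext4";
+  };
+
+  # The system root lives on a ZFS dataset; "rpool/root" is a made-up name.
+  fileSystems."/" = {
+    device = "rpool/root";
+    fsType = "zfs";
+  };
+}
+```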
+
+From there I installed NixOS as normal and rebooted the server. It booted
+normally. I had a shiny new NixOS server in the cloud! I noticed that the server
+had booted into NixOS unstable as opposed to NixOS 20.09 like my other nodes. I
+thought "ah, well, that probably isn't a problem" and continued to the
+configuration step.
+
+[That's ominous...](conversation://Mara/hmm)
+
+## Configuration
+
+Now that the server was assimilated and I could SSH into it, the next step was
+to configure it to run my services. While I was waiting for Hetzner to provision
+my server I ported a bunch of my services over to Nixops services [à la this
+post](/blog/nixops-services-2020-11-09) in [this
+folder](https://github.com/Xe/nixos-configs/tree/master/common/services) of my
+configs repo.
+
+Now that I had them, it was time to add this server to my Nixops setup. So I
+opened the [nixops definition
+folder](https://github.com/Xe/nixos-configs/tree/master/nixops/hexagone) and
+added the metadata for `lufta`. Then I added it to my Nixops deployment with
+this command:
+
+```console
+$ nixops modify -d hexagone -n hexagone *.nix
+```
+
+Then I copied over the autogenerated config from `lufta`'s `/etc/nixos/` folder
+into
+[`hosts/lufta`](https://github.com/Xe/nixos-configs/tree/master/hosts/lufta) and
+ran a `nixops deploy` to add some other base configuration.
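+
+For readers who have not seen one, a Nixops network file for a single machine
+can be quite small. The sketch below is an assumption about the general shape,
+not a copy of the real `hexagone` definition: the `imports` path is
+hypothetical, and the address is the IPv4 one from the bootstrap config above.
+
+```nix
+{
+  network.description = "hexagone";
+
+  # Each attribute besides `network` describes one machine in the deployment.
+  lufta = { config, pkgs, ... }: {
+    # Where Nixops should SSH to when deploying this machine.
+    deployment.targetHost = "135.181.162.99";
+
+    # Hypothetical path to the host-specific configuration.
+    imports = [ ../../hosts/lufta/configuration.nix ];
+  };
+}
+```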
+
+## Migration
+
+Once that was done, I started enabling my services and pushing configs to test
+them. After I got to a point where I thought things would work I opened up the
+Kubernetes console and started deleting deployments on my Kubernetes cluster as
+I felt "safe" to migrate them over. Then I saw the deployments come back. I
+deleted them again and they came back again.
+
+Oh, right. I enabled that one Kubernetes service that made it intentionally hard
+to delete deployments. One clever set of scale-downs and kills later and I was
+able to kill things with wild abandon.
+
+I copied over the gitea data with `rsync` running in the Kubernetes deployment.
+Then I killed the gitea deployment, updated DNS and reran a whole bunch of gitea
+jobs to resanify the environment. I did a test clone on a few of my repos and
+then I deleted the gitea volume from DigitalOcean.
+
+Moving over the other deployments from Kubernetes into NixOS services was
+somewhat easy; however, I did need to repackage a bunch of my programs and static
+sites for NixOS. I made the
+[`pkgs`](https://github.com/Xe/nixos-configs/tree/master/pkgs) tree a bit more
+fleshed out to compensate.
+
+[Okay, packaging static sites in NixOS is beyond overkill, however a lot of them
+need some annoyingly complicated build steps and throwing it all into Nix means
+that we can make them reproducible and use one build system to rule them
+all. Not to mention that when I need to upgrade the system, everything will
+rebuild with new system libraries to avoid the <a
+href="https://blog.tidelift.com/bit-rot-the-silent-killer">Docker bitrot
+problem</a>.](conversation://Mara/hacker)
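+
+To give a feel for what packaging a static site looks like, here is a minimal
+sketch. It is not one of the packages from my `pkgs` tree: the package name is
+hypothetical, and the `cp` in `buildPhase` stands in for whatever generator a
+real site would need. It also assumes the source tree has a `static` folder.
+
+```nix
+{ pkgs ? import <nixpkgs> { } }:
+
+pkgs.stdenv.mkDerivation {
+  pname = "example-static-site"; # hypothetical name
+  version = "0.1.0";
+  src = ./.;
+
+  # Placeholder build step; a real site would run its generator here.
+  buildPhase = ''
+    mkdir -p public
+    cp -r static/. public/
+  '';
+
+  # Everything copied into $out becomes the package contents.
+  installPhase = ''
+    mkdir -p $out
+    cp -r public/. $out/
+  '';
+}
+```
+
+Saved as `default.nix` next to the site source, running `nix-build` would leave
+the built site under `./result`.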
+
+## Reboot Test
+
+After a significant portion of the services were moved over, I decided it was
+time to do the reboot test. I ran the `reboot` command and then...nothing.
+My continuous ping test was timing out. My phone was blowing up with downtime
+messages from NodePing. Yep, I messed something up.
+
+I was able to boot the server back into a NixOS recovery environment using the
+kexec trick, and from there I was able to prove the following:
+
+- The zfs setup is healthy
+- I can read some of the data I migrated over
+- I can unmount and remount the ZFS volumes repeatedly
+
+I was confused. This shouldn't be happening. After half an hour of
+troubleshooting, I gave in and ordered an IPKVM to be installed in my server.
+
+Once that was set up (and I managed to trick macOS into letting me boot a .jnlp
+web start file), I rebooted the server so I could see what error I was getting
+on boot. I missed it the first time around, but on the second time I was able to
+capture this screenshot:
+
+![The error I was looking
+for](https://cdn.christine.website/file/christine-static/blog/Screen+Shot+2021-01-03+at+1.13.05+AM.png)
+
+Then it hit me. I did the install on NixOS unstable. My other servers use NixOS
+20.09. I had downgraded zfs and the older version of zfs couldn't mount the
+volume created by the newer version of zfs in read/write mode. One more trip to
+the recovery environment later, I had NixOS unstable installed in a new
+generation.
+
+Then I switched my tower's default NixOS channel to the unstable channel and ran
+`nixops deploy` to reactivate my services. After the NodePing uptime
+notifications came in, I ran the reboot test again while looking at the console
+output to be sure.
+
+It booted. It worked. I had a stable setup. Then I reconnected to IRC and passed
+out.
+
+## Services Migrated
+
+Here is a list of all of the services I have migrated over from my old dedicated
+server, my Kubernetes cluster and my dokku server:
+
+- aerial -> discord chatbot
+- goproxy -> go modules proxy
+- lewa -> https://lewa.within.website
+- hlang -> https://h.christine.website
+- mi -> https://mi.within.website
+- printerfacts -> https://printerfacts.cetacean.club
+- xesite -> https://christine.website
+- graphviz -> https://graphviz.christine.website
+- idp -> https://idp.christine.website
+- oragono -> ircs://irc.within.website:6697/
+- tron -> discord bot
+- withinbot -> discord bot
+- withinwebsite -> https://within.website
+- gitea -> https://tulpa.dev
+- other static sites
+
+Doing this migration was a bit of an archaeology project as well. I was
+continuously discovering services that I had littered over my machines with very
+poorly documented requirements and configuration. I hope that this move will
+make the next migration of this kind a lot easier by comparison.
+
+I still have a few other services to move over, however the ones that are left
+are much more annoying to set up properly. This migration lets me deprovision 5
+servers, leaves me with this stupidly powerful goliath of a server to do
+whatever I want with, and cuts my monthly server costs by over half.
+
+I am very close to being able to turn off the Kubernetes cluster and use NixOS
+for everything. A few services that are still on the Kubernetes cluster are
+resistant to being nixified, so I may have to use the Docker containers for
+those. I was hoping to be able to cut out Docker entirely, however we don't seem
+to be that lucky yet.
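+
+One way to keep such a holdout running under NixOS is the
+`virtualisation.oci-containers` module. This is only a sketch of the idea, not
+what I have deployed: the container name, image and port mapping below are all
+made up.
+
+```nix
+{ ... }:
+
+{
+  virtualisation.oci-containers = {
+    backend = "docker";
+
+    # "example" and its image are hypothetical stand-ins for a real service.
+    containers.example = {
+      image = "ghcr.io/example/service:latest";
+      # Only listen on localhost so a reverse proxy can sit in front of it.
+      ports = [ "127.0.0.1:8080:8080" ];
+    };
+  };
+}
+```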
+
+Sure, there is some added latency with the server being in Europe instead of
+Montreal, however if this ever becomes a practical issue I can always launch a
+cheap DigitalOcean VPS in Toronto to act as a DNS server for my WireGuard setup.
+
+Either way, I am now off Kubernetes for my highest traffic services. If services
+of mine need to use the disk, they can now just use the disk. If I really care
+about the data, I can add the service folders to the list of paths to back up to
+`rsync.net` (I have a post about how this backup process works in the drafting
+stage) via [borgbackup](https://www.borgbackup.org/).
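+
+Until that post exists, here is a hedged sketch of what a job could look like
+using the NixOS `services.borgbackup.jobs` module. It is not my real backup
+config: the job name, the service folder, the repository address and the
+passphrase file are all placeholders.
+
+```nix
+{ ... }:
+
+{
+  services.borgbackup.jobs.example = {
+    # Hypothetical service folder to back up.
+    paths = [ "/srv/example" ];
+    # Hypothetical repository location on the remote host.
+    repo = "user@backup.example.com:backups/lufta";
+    encryption = {
+      mode = "repokey";
+      # Hypothetical file holding the repository passphrase.
+      passCommand = "cat /root/borg-passphrase";
+    };
+    compression = "auto,lzma";
+    startAt = "daily";
+  };
+}
+```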
+
+Let's hope it stays online!
+
+---
+
+Many thanks to [Graham Christensen](https://twitter.com/grhmc), [Dave
+Anderson](https://twitter.com/dave_universetf) and everyone else who has been
+helping me along this journey. I would be lost without them.
--- a/src/build.rs
+++ b/src/build.rs
@@ -9,6 +9,13 @@ fn main() -> Result<()> {
        .output()
        .unwrap();
    let git_hash = String::from_utf8(output.stdout).unwrap();
-    println!("cargo:rustc-env=GITHUB_SHA={}", git_hash);
+    println!(
+        "cargo:rustc-env=GITHUB_SHA={}",
+        if git_hash.is_empty() {
+            env!("out").into()
+        } else {
+            git_hash
+        }
+    );
    Ok(())
 }
--- a/templates/footer.rs.html
+++ b/templates/footer.rs.html
@@ -7,7 +7,7 @@
            <blockquote>Copyright 2020 Christine Dodrill. Any and all opinions listed here are my own and not representative of my employers; future, past and present.</blockquote>
            <!--<p>Like what you see? Donate on <a href="https://www.patreon.com/cadey">Patreon</a> like <a href="/patrons">these awesome people</a>!</p>-->
            <p>Looking for someone for your team? Take a look <a href="/signalboost">here</a>.</p>
-            <p>Served by @APP running commit <a href="https://github.com/Xe/site/commit/@env!("GITHUB_SHA")">@env!("GITHUB_SHA")</a>, see <a href="https://github.com/Xe/site">source code here</a>.</p>
+            <p>Served by @APP running @env!("out")/bin/xesite, see <a href="https://github.com/Xe/site">source code here</a>.</p>
        </footer>

        </div>