From 210863197ee5a6e8b3883fdb9c223537f790b175 Mon Sep 17 00:00:00 2001
From: Christine Dodrill
Date: Tue, 8 Jun 2021 12:29:34 +0000
Subject: [PATCH] homelab article

Signed-off-by: Christine Dodrill
---
 blog/my-homelab-2021-06-08.markdown | 498 ++++++++++++++++++++++++++++
 1 file changed, 498 insertions(+)
 create mode 100644 blog/my-homelab-2021-06-08.markdown

diff --git a/blog/my-homelab-2021-06-08.markdown b/blog/my-homelab-2021-06-08.markdown
new file mode 100644
index 0000000..da67192
--- /dev/null
+++ b/blog/my-homelab-2021-06-08.markdown
@@ -0,0 +1,498 @@
+---
+title: My Homelab Build
+date: 2021-06-08
+tags:
+  - homelab
+  - no-kill-like-overkill
+---
+
+# My Homelab Build
+
+There are many things you can be cursed into enjoying. One of my curses is
+enjoying philosophy/linguistics. This leads to many fun conversations about how
+horrible English is, which can get boring after a while. One of my other, bigger
+curses is that I'm a computer person. Specifically, a computer person who enjoys
+playing with distributed systems. This is an expensive hobby, especially when
+all you really have is The Cloud™.
+
+One thing that I do a lot is run virtual machines. Some of these stick around,
+but a lot of them are very ephemeral. I also like being able to get into these
+VMs quickly if I want to mess around with a given distribution or OS. Normally
+I'd run these on [my gaming
+tower](https://christine.website/blog/nixos-desktop-flow-2020-04-25); however,
+this makes my tower very load-bearing. I also want to play games on my tower
+sometimes, and even though there have been many strides in getting games to run
+well on Linux, it's still not as good as I'd like it to be.
+
+[In fact, it's actually kinda convenient that it's _hard_ for me to play games
+on Linux, because that makes it harder for me to have entire days eaten by doing
+it.
+Factorio and other games like it are _really_ dangerous for
+me.](conversation://Cadey/coffee)
+
+For many years my home server has been a 2013 Mac Pro, the trash can one. It's a
+very capable machine and a beautiful-looking computer; however, in terms of
+performance it's really not up to snuff anymore. It works, and it's still my
+Prometheus server, but overall it's quite slow in comparison to what I've ended
+up needing.
+
+It probably also doesn't help that my coworkers have given me a serious case of
+homelab envy. A few of my coworkers have full rackmount setups. This is also
+dangerous for my wallet.
+
+My initial plan was to get 3 rackmount servers in a soundproof rack box. I
+wanted to get octo-core Xeons in them (preferably 2 of them) and something on
+the order of 64 GB of RAM in each node. For my needs, this is absurdly beyond
+overkill. Storage would be on NVMe and rotational drives with
+[ZFS](https://openzfs.org/wiki/Main_Page) as the filesystem.
+
+[I thought overkill was the motto of this blog.](conversation://Mara/happy)
+
+[Nope. It's "there's no kill like overkill". Subtle difference, but it's a
+significant one in this case.](conversation://Cadey/enby)
+
+Among other things, running a datacenter in your basement really requires you to
+have a basement. The place that my fiancé and I moved to doesn't really have a
+proper basement. One of the advantages of having a proper basement is that you
+can put servers in it without really bothering anyone. Server fan noise tends to
+range from "dull roar" to "jet engine takeoff". This can cause problems if you
+are, and/or live with, someone who is noise sensitive. Soundproof racks exist;
+however, I wasn't sure if the noise reduction would really be enough to make up
+for the cost.
+
+Then there's the power cost. Electricity in Ontario is expensive. Our home
+office also only has a 15 amp breaker.
+With our work laptops and gaming towers set up, the home printer was enough to
+push us over the line and flip the breaker. A full rackmount server setup would
+never have worked. Electricity is covered by our rent payments; however, I don't
+want to use more power than I have to.
+
+After more research and bisecting a bunch of options through
+[PCPartPicker](https://ca.pcpartpicker.com), I ended up with a set of hardware
+that I am calling the Alrest. [Here](https://ca.pcpartpicker.com/list/8jC7bh)
+are its specs on PCPartPicker. It is designed to balance these factors as much
+as possible:
+
+- Cost - I had a budget of about 4,000 CAD that I was willing to spend on the
+  whole project
+- Parts availability in Canada - Parts are annoying to get in Canada even in
+  normal times, and COVID has made it worse
+- Performance - My existing balance of cloud servers and old laptops has gotten
+  me fairly far, but it is starting to show its limits
+- Cores - More cores = more faster
+
+The Alrest is a micro-ATX tower with the following major specifications:
+
+- An [Intel Core i5
+  10600](https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-10600+%40+3.30GHz&id=3750)
+- [32 GB DDR4
+  RAM](https://ca.pcpartpicker.com/product/LBJmP6/oloy-warhawk-rgb-32-gb-2-x-16-gb-ddr4-3200-cl16-memory-nd4u1632161dcwdx)
+  (I have no idea how this happened, but the cheapest way for me to get the RAM
+  I wanted was to get RGB RAM again)
+- 1 TB NVMe drive (I had to get them from multiple vendors because Chia miners
+  have caused companies to limit drives to one or two per person)
+
+[Why do you have an i5 10600? You could get a beefier
+processor.](conversation://Mara/hmm)
+
+[All the beefier CPUs don't ship with an integrated GPU, so I'd have to get a
+discrete GPU (which is near impossible due to memecoin farmers and the car
+industry hoovering up all the semiconductor supply) that would waste power
+showing a login screen for all eternity.
+Not to mention those beefier CPUs also don't ship with a CPU fan, so I'd need
+to get a heatsink. I wish Intel made better processors with both an iGPU and a
+heatsink. I'm probably a huge exception to the normal case of system buyers
+though.](conversation://Cadey/coffee)
+
+Thanks to the meddling of a server sommelier that I banter with, I got 4 nodes.
+
+[The nodes in the cluster are named after gods/supercomputers from Xenosaga and
+Xenoblade Chronicles. KOS-MOS was one of the protagonists in Xenosaga, while
+Logos (speech, reason), Ontos (one who is, being) and Pneuma (breath, spirit)
+were the three cores of the Trinity Processor in Xenoblade Chronicles 2. The
+avatar you see in YouTube videos and VRChat resembles the in-game model for
+Pneuma. Alrest is another Xenoblade reference, but that one is left as an
+exercise for the reader.](conversation://Mara/hacker)
+
+Building them was fairly straightforward. The process of building a PC has
+gotten really streamlined over the years, and it really helped that I basically
+had 4 carbon copies of the same machine. I hadn't built an Intel tower since
+about mid-2015, when I built my old gaming tower while I lived in California.
+Something that terrified me back in the day was the tension arm that was used
+to lock the processor into the motherboard. I was afraid that I was going to
+break it. That tension arm is still present in modern motherboards. It's still
+terrifying.
+
+The motherboards I got were kinda cheapo (a natural side effect of sorting by
+cost from cheapest to most expensive, I guess), but they had one cost-saving
+measure I didn't even know was possible. Normally motherboards include an NVMe
+screw mount so you can screw the SSD into the board. This motherboard came with
+a plastic NVMe anchor instead. I popped one end into the board with a spudger
+and fastened the drive into the other.
+
+The anchors work fine, but it's still the first time I've ever seen a
+motherboard do that.
+
+If you look at the parts list, you'll notice that I didn't get a dedicated CPU
+cooler. Those are annoying to install compared to the stock cooler, and I don't
+really see myself running into a case where one would actually be useful. I
+picked the one high-end Core i5 model that came with both an integrated GPU and
+a stock cooler. One weird thing that Intel did was ship the stock cooler with
+its power cable wrapped in a chokehold around the cooler itself. I didn't
+realize this at first and was confused about why my experimental/test machine
+for the cluster was throwing "oh god why isn't the CPU fan working" beep codes
+and refused to boot past the BIOS. Always make sure the CPU fan power cable
+isn't strangling the CPU fan.
+
+After all that comes the NixOS install. I had previously made an [ISO image
+that allowed me to automatically install NixOS on virtual
+machines](https://github.com/Xe/nixos-configs/tree/master/media/autoinstall).
+This fairly dangerous ISO image allows me to provision a new virtual machine
+from a blank disk to a fully functional NixOS install in something like 3
+minutes.
+
+[In testing, most of the time was taken up by copying the ISO's Nix store to the
+new virtual machine partition. I don't know if there's a way to make that more
+efficient.](conversation://Mara/hacker)
+
+Using KOS-MOS as the experimental machine again, I installed NixOS by hand and
+took notes. Here are scans of the notes I took:
+
+- [Page 1](https://cdn.christine.website/file/christine-static/blog/KOS-MOS+notes+Page+1.jpeg)
+- [Page 2](https://cdn.christine.website/file/christine-static/blog/KOS-MOS+notes+Page+2.jpeg)
+
+I set up KOS-MOS to have three partitions: root, swap and the EFI system
+partition.
+I then set up my ZFS datasets with the following pattern:
+
+| Dataset           | Description                                                               |
+| :---------------- | :------------------------------------------------------------------------ |
+| `rpool`           | The root dataset that everything hangs off of, with zstd compression      |
+| `rpool/local`     | The parent dataset for data that can be lost without too much issue       |
+| `rpool/local/nix` | The dataset for the Nix store; it can be regenerated without much issue   |
+| `rpool/local/vms` | The parent dataset for virtual machines that won't be backed up           |
+| `rpool/safe`      | The parent dataset for data that will be automatically backed up          |
+| `rpool/safe/home` | `/home`, home directories                                                 |
+| `rpool/safe/root` | `/`, the root filesystem                                                  |
+| `rpool/safe/vms`  | The parent dataset for virtual machines that will be backed up            |
+
+With all of these paths ironed out, I turned those notes into a small install
+script. I put that install script
+[here](https://github.com/Xe/nixos-configs/blob/0bf2ebdfc6ad9e43f07646d238070074d2890ba0/media/autoinstall-alrest/iso.nix).
+I used [nixos-generators](https://github.com/nix-community/nixos-generators) to
+make an ISO with this command:
+
+```console
+$ nixos-generate -f install-iso -c iso.nix
+```
+
+This spat out a 680 megabyte ISO (maybe even small enough that it could fit on a
+CD) that I wrote to a flash drive with `dd`:
+
+```console
+$ sudo dd if=/path/to/nixos.iso of=/dev/sdc bs=4M
+```
+
+Then I stuck the USB drive into KOS-MOS and reinstalled it from that USB. After
+a fumble or two with a partitioning command, I had a USB drive that let me
+reflash a new base NixOS install with a ZFS root in 3 minutes. If you want to
+watch the install, I recorded a video of it.
+
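+For a rough idea of how the dataset table above could translate into NixOS
+configuration, here is a minimal sketch of the mount side of a ZFS root. It
+assumes the datasets were created with `mountpoint=legacy` and that the EFI
+system partition has a hypothetical `boot` filesystem label. Neither detail is
+spelled out in this post, so this is an illustration and not the actual Alrest
+hardware module.
+
+```nix
+{ ... }:
+
+{
+  # ZFS support in the kernel and initrd (also implied by the zfs mounts below).
+  boot.supportedFilesystems = [ "zfs" ];
+
+  # Assumption: these datasets use mountpoint=legacy, so NixOS mounts them.
+  fileSystems."/" = {
+    device = "rpool/safe/root";
+    fsType = "zfs";
+  };
+
+  fileSystems."/home" = {
+    device = "rpool/safe/home";
+    fsType = "zfs";
+  };
+
+  fileSystems."/nix" = {
+    device = "rpool/local/nix";
+    fsType = "zfs";
+  };
+
+  # Hypothetical filesystem label for the EFI system partition.
+  fileSystems."/boot" = {
+    device = "/dev/disk/by-label/boot";
+    fsType = "vfat";
+  };
+}
+```
+
+NixOS also refuses to build a system that uses ZFS unless `networking.hostId`
+is set, which is why the base install sets a ZFS host ID.
+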
+I bet that if I used a USB 3.0 drive it could be faster, but 3 minutes is fast
+enough. It is a magical experience though. Just plug the USB drive in, boot up
+the tower and wait until it powers off. Once I got it working reliably on
+KOS-MOS, the real test began. I built the next machine (Pneuma) and then
+installed NixOS with the magic USB drive. It worked perfectly. I had myself a
+cluster.
+
+Once NixOS was installed, each machine was running a very basic
+configuration. This configuration sets the hostname to `install`, loads my SSH
+keys from GitHub and sets the ZFS host ID, but not much else. The next step was
+adding KOS-MOS to my Morph setup. I did the initial setup in [this
+commit](https://github.com/Xe/nixos-configs/commit/6b08de8e97e1b3b5766806adb08ac3352ef5dd44).
+
+[Wait. You built 4 machines from the same template with (basically) the same
+hardware, right? Why would you need to put the hardware-specific config in the
+repo 4 times?](conversation://Mara/hmm)
+
+I don't! I created a folder for the Alrest hardware
+[here](https://github.com/Xe/nixos-configs/tree/master/common/hardware/alrest).
+This contains all of the basic hardware config as well as a few settings that I
+want to apply cluster-wide. This allows me to have my Morph manifest look
+something like this:
+
+```nix
+{
+  network = { description = "Avalon"; };
+
+  # alrest
+  "kos-mos.alrest" = { config, pkgs, lib, ...
+  }:
+    let metadata = pkgs.callPackage ../metadata/peers.nix { };
+    in {
+      deployment.targetUser = "root";
+      deployment.targetHost = metadata.raw.kos-mos.ip_addr;
+      networking.hostName = "kos-mos";
+      networking.hostId = "472479d4";
+
+      imports =
+        [ ../../common/hardware/alrest ../../hosts/kos-mos/configuration.nix ];
+    };
+
+  "logos.alrest" = { config, pkgs, lib, ... }:
+    let metadata = pkgs.callPackage ../metadata/peers.nix { };
+    in {
+      deployment.targetUser = "root";
+      deployment.targetHost = metadata.raw.logos.ip_addr;
+      networking.hostName = "logos";
+      networking.hostId = "aeace675";
+
+      imports =
+        [ ../../common/hardware/alrest ../../hosts/logos/configuration.nix ];
+    };
+
+  "ontos.alrest" = { config, pkgs, lib, ... }:
+    let metadata = pkgs.callPackage ../metadata/peers.nix { };
+    in {
+      deployment.targetUser = "root";
+      deployment.targetHost = metadata.raw.ontos.ip_addr;
+      networking.hostName = "ontos";
+      networking.hostId = "07602ecc";
+
+      imports =
+        [ ../../common/hardware/alrest ../../hosts/ontos/configuration.nix ];
+    };
+
+  "pneuma.alrest" = { config, pkgs, lib, ... }:
+    let metadata = pkgs.callPackage ../metadata/peers.nix { };
+    in {
+      deployment.targetUser = "root";
+      deployment.targetHost = metadata.raw.pneuma.ip_addr;
+      networking.hostName = "pneuma";
+      networking.hostId = "34fbd94b";
+
+      imports =
+        [ ../../common/hardware/alrest ../../hosts/pneuma/configuration.nix ];
+    };
+}
+```
+
+Now I had a bunch of hardware with NixOS installed, and the machines were fully
+assimilated into my network. I had my base shell config and everything else
+fully set up, so I could SSH into any of the servers and have everything just
+where I wanted it. I had [libvirtd](https://libvirt.org/index.html) installed
+as part of the base install, so I wanted to try using [Tailscale Subnet
+Routes](https://tailscale.com/kb/1019/subnets/) to expose the virtual machine
+subnets to my other machines. As far as I am aware, libvirtd doesn't have a mode
+where it can plunk a virtual machine on the network like other hypervisors can.
+
+By default, libvirtd puts its default virtual machine network on
+`192.168.122.0/24`. This doesn't conflict with anything on its own; however,
+when you have many hosts with that same range it can be a bit problematic,
+since the same VM address could exist on every host at once and there would be
+no way to route to a particular one from elsewhere.
+I have a `/16` that I use for my WireGuard addressing, so I carved out a range
+to reserve for each machine:
+
+| Range            | Description                 |
+| :--------------- | :-------------------------- |
+| `10.77.128.0/24` | KOS-MOS Virtual Machine /24 |
+| `10.77.129.0/24` | Logos Virtual Machine /24   |
+| `10.77.130.0/24` | Ontos Virtual Machine /24   |
+| `10.77.131.0/24` | Pneuma Virtual Machine /24  |
+
+Normally I'd share these subnets over WireGuard. However, Tailscale Subnet
+Routes let me do this a bit more directly. I ran this command to enable subnet
+routing on each machine:
+
+```bash
+# Print the VM subnet reserved for a given node (see the table above).
+function getsubnet () {
+  case $1 in
+    kos-mos)
+      printf "10.77.128.0/24"
+      ;;
+    logos)
+      printf "10.77.129.0/24"
+      ;;
+    ontos)
+      printf "10.77.130.0/24"
+      ;;
+    pneuma)
+      printf "10.77.131.0/24"
+      ;;
+  esac
+}
+
+# On every node: accept routes advertised by peers, advertise this node's VM
+# subnet to the tailnet and tag the node.
+for host in kos-mos logos ontos pneuma
+do
+  ssh "root@$host" tailscale up \
+    --accept-routes \
+    --advertise-routes="$(getsubnet "$host")" \
+    --advertise-tags=tag:alrest,tag:nixos
+done
+```
+
+This command is a slightly overengineered version of what I actually did
+(something something hindsight something something), but it worked! Then I
+configured libvirtd to actually use these subnets by going into `virt-manager`,
+connecting to one of the hosts and changing the default network configuration
+from something like this:
+
+```xml
+
+  default
+  ef4bc889-e01d-403a-9a92-a0e172b8f42a
+
+
+
+
+
+
+
+
+
+
+
+
+
+```
+
+To something like this:
+
+```xml
+
+  default
+  39bf0a49-57ff-4840-8bd6-09c6f3817afe
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+```
+
+And then I spun up a virtual machine running [Alpine
+Linux](https://alpinelinux.org/) and got it on the network. Its IP address was
+`10.77.128.90`. Then I tried pinging it from the same machine, another machine
+in the same room, another server on the same continent and then finally another
+server on the same planet.
+Here are the results:
+
+Same machine:
+
+```console
+cadey:users@kos-mos ~ ./rw
+$ ping 10.77.128.90 -c1
+PING 10.77.128.90 (10.77.128.90) 56(84) bytes of data.
+64 bytes from 10.77.128.90: icmp_seq=1 ttl=64 time=0.208 ms
+
+--- 10.77.128.90 ping statistics ---
+1 packets transmitted, 1 received, 0% packet loss, time 0ms
+rtt min/avg/max/mdev = 0.208/0.208/0.208/0.000 ms
+```
+
+Same room:
+
+```console
+cadey:users@shachi ~ ./rw
+$ ping 10.77.128.90 -c1
+PING 10.77.128.90 (10.77.128.90) 56(84) bytes of data.
+64 bytes from 10.77.128.90: icmp_seq=1 ttl=63 time=1.11 ms
+
+--- 10.77.128.90 ping statistics ---
+1 packets transmitted, 1 received, 0% packet loss, time 0ms
+rtt min/avg/max/mdev = 1.105/1.105/1.105/0.000 ms
+```
+
+Same continent:
+
+```console
+cadey:users@kahless ~ ./rw
+$ ping 10.77.128.90 -c1
+PING 10.77.128.90 (10.77.128.90) 56(84) bytes of data.
+64 bytes from 10.77.128.90: icmp_seq=1 ttl=63 time=5.66 ms
+
+--- 10.77.128.90 ping statistics ---
+1 packets transmitted, 1 received, 0% packet loss, time 0ms
+rtt min/avg/max/mdev = 5.655/5.655/5.655/0.000 ms
+```
+
+And finally a machine on the same planet:
+
+```console
+cadey:users@lufta ~ ./rw
+$ ping 10.77.128.90 -c1
+PING 10.77.128.90 (10.77.128.90) 56(84) bytes of data.
+64 bytes from 10.77.128.90: icmp_seq=1 ttl=63 time=107 ms
+
+--- 10.77.128.90 ping statistics ---
+1 packets transmitted, 1 received, 0% packet loss, time 0ms
+rtt min/avg/max/mdev = 106.719/106.719/106.719/0.000 ms
+```
+
+This also lets any virtual machine on the cluster reach out to any other virtual
+machine, as well as any of the hardware servers. If I install a SerenityOS
+virtual machine (a platform that can't run Tailscale as far as I am aware), it
+will be able to poke other virtual machines as well as my other servers over
+Tailscale without ever knowing that Tailscale is involved. It is a magical
+experience.
+
+I have a lot more compute than I really know what to do with right now. This is
+okay though.
+Lots of slack compute space leaves a lot of room for expansion,
+experimentation and other e-words in that category. These CPUs are really dang
+fast too, which helps a lot. So far I've used my homelab both for [a short
+V-tuber-esque stream where I fix a minor annoyance in NixOS and try to explain
+what was going on in my head as I did
+it](https://www.youtube.com/watch?v=W6h6TuiI-jo) and for writing this article.
+Pneuma has sort of become my main SSH box, and the other machines run lots of
+virtual machines.
+
+In the future I'd like to use this lab for the following things:
+
+- Running some non-critical services out of my basement (Discord bots, etc.)
+- Implementing a VM management substrate called
+  [waifud](https://github.com/Xe/waifud)
+- IPv6 networking for the virtual machines (libvirtd seems to only do IPv4 out
+  of the gate, and configuring IPv6 seems to be unfortunately nontrivial)
+- CI for projects on my personal Git server
+- Research for Project Elysium/NovOS
+
+---
+
+I hope this was an interesting look into the process and considerations that
+went into assembling my homelab. It's been a fun build and I can't wait to see
+what the future will bring us. Either way, it should make for some interesting
+write-ups on this blog!
+
+Here are some related Twitter threads you may find interesting to look through:
+
+- [Day 1 of the build](https://twitter.com/theprincessxena/status/1400189266450341899)
+- [Day 2 of the build](https://twitter.com/theprincessxena/status/1400549314452148227)
+- [The aftermath of the realization that I can avoid copying a large part of the
+  NixOS configuration for each
+  node](https://twitter.com/theprincessxena/status/1400423623245156356)
+- [A thread where I attempt to install Guix on one of the homelab nodes in a
+  VM](https://twitter.com/theprincessxena/status/1401614346904559617)