[The nodes in the cluster are named after gods/supercomputers from Xenosaga and
Xenoblade Chronicles. KOS-MOS (a badass robot waifu with a laser sword and also
the reincarnation of a biblical figure, Xenosaga is wild) was one of the
protagonists in Xenosaga. Logos (speech, reason), Ontos (one who is, being)
and Pneuma (breath, spirit) were the three cores of the Trinity Processor in
Xenoblade Chronicles 2. The avatar you see in YouTube videos and VRChat
resembles the in-game model for Pneuma. Alrest is another Xenoblade reference,
but that is an exercise for the reader.](conversation://Mara/hacker)
Building them was fairly straightforward. The process of building a PC has
gotten really streamlined over the years, and it helped a lot that I basically
had 4 carbon copies of the same machine. I hadn't built an Intel tower since
about mid-2015, when I built my old gaming tower while I lived in California.
Something that terrified me back in the day was that tension arm that was used
to lock the processor into the motherboard. I was afraid that I was going to
break it. That tension arm is still present in modern motherboards. It's still
terrifying.
The motherboards I got were kinda cheapo (a natural side effect of sorting by
cost from cheapest to most expensive, I guess), but they did this one
cost-saving measure I didn't even know was possible. Normally motherboards
include an NVMe screw mount so you can screw the SSD into the board. This motherboard
came with a plastic NVMe anchor. I popped one end into the board with a spudger
and fastened the drive into the other.
The anchors work fine, but it's still the first time I've ever seen a
motherboard do that.
If you look at the parts list, you'll notice that I didn't get a dedicated CPU
cooler. Those are annoying to install compared to the stock cooler, and I don't
really see myself running into a case where it'd actually be useful. I picked
the one high-end Core i5 model that came with both an integrated GPU and a stock
cooler. One weird thing that Intel did was wrap the power cable for the stock
cooler in a chokehold around the cooler itself. I didn't realize this at first
and was confused about why my experimental/test machine for the cluster was
throwing "oh god why isn't the CPU fan working" beep codes and refusing to
boot past the BIOS. Always make sure the CPU fan power cable isn't strangling
the CPU fan.
After all that comes the NixOS install. I had previously made an [ISO image
that allowed me to automatically install NixOS on virtual
machines](https://github.com/Xe/nixos-configs/tree/master/media/autoinstall).
This fairly dangerous ISO image allows me to provision a new virtual machine
from a blank disk to a fully functional NixOS install in something like 3
minutes.
[In testing, most of the time was taken up by copying the ISO's nix store to the
new virtual machine partition. I don't know if there's a way to make that more
efficient.](conversation://Mara/hacker)
Using KOS-MOS as the experimental machine again, I installed NixOS by hand and
took notes. Here are scans of the notes I took:
- [Page 1](https://cdn.christine.website/file/christine-static/blog/KOS-MOS+notes+Page+1.jpeg)
- [Page 2](https://cdn.christine.website/file/christine-static/blog/KOS-MOS+notes+Page+2.jpeg)
I set up KOS-MOS to have three partitions: root, swap and the EFI system
partition. I then set up my ZFS datasets with the following pattern:
| Dataset | Description |
| :---------------- | :------------------------------------------------------------------------ |
| `rpool` | The root dataset that everything hangs off of, zstd compression |
| `rpool/local` | The parent dataset for data that can be lost without too much issue |
| `rpool/local/nix` | The dataset for the Nix store, this can be regenerated without much issue |
| `rpool/local/vms` | The parent dataset for virtual machines that won't be backed up |
| `rpool/safe` | The parent dataset for data that will be automatically backed up |
| `rpool/safe/home` | `/home`, home directories |
| `rpool/safe/root` | `/`, the root filesystem |
| `rpool/safe/vms` | The parent dataset for virtual machines that will be backed up |
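
The layout in the table can be sketched as a handful of `zfs` commands. This is a dry-run sketch rather than the real install script: it only prints what it would run. The `compression=zstd` property comes from the table; the `mountpoint=none` and `mountpoint=legacy` choices and the `print_dataset_commands` helper name are assumptions made for illustration.

```bash
set -euo pipefail

# Dry run: print the zfs commands for the dataset layout instead of running
# them, so this is safe to execute on a machine without ZFS.
print_dataset_commands() {
  local pool="${1:-rpool}"

  # zstd compression is set on the root dataset and inherited by its children.
  echo "zfs set compression=zstd ${pool}"

  # Parent datasets only group their children (mountpoint=none is an assumption).
  local parent
  for parent in local safe local/vms safe/vms; do
    echo "zfs create -o mountpoint=none ${pool}/${parent}"
  done

  # Leaf datasets hold actual files (mountpoint=legacy is an assumption).
  local leaf
  for leaf in local/nix safe/home safe/root; do
    echo "zfs create -o mountpoint=legacy ${pool}/${leaf}"
  done
}

print_dataset_commands rpool
```

Splitting `local` from `safe` at the top level means a backup job only ever has to look at one subtree.
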
With all of these paths ironed out, I turned those notes into a small install
script. I put that install script
[here](https://github.com/Xe/nixos-configs/blob/0bf2ebdfc6ad9e43f07646d238070074d2890ba0/media/autoinstall-alrest/iso.nix).
I used [nixos-generators](https://github.com/nix-community/nixos-generators) to
make an ISO with this command:
```console
$ nixos-generate -f install-iso -c iso.nix
```
This spat out a 680 megabyte ISO (maybe even small enough to fit on a CD)
that I wrote to a flash drive with `dd`:
```console
$ sudo dd if=/path/to/nixos.iso of=/dev/sdc bs=4M
```
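
A slightly more defensive take on that step, as a sketch: flush the write before `dd` exits, then compare what landed on the target against the image. It assumes GNU `dd` and `cmp`; `write_image` is a made-up helper name, and `/dev/sdX` in the usage comment is a placeholder.

```bash
set -euo pipefail

# Write an image to a target and verify the bytes that landed there.
# Usage: write_image /path/to/nixos.iso /dev/sdX   (needs root for real devices)
write_image() {
  local image="$1" target="$2"
  local size
  size="$(wc -c < "$image" | tr -d ' ')"

  # conv=fsync makes dd flush to the target before it exits, so the drive is
  # safe to pull as soon as the command returns.
  dd if="$image" of="$target" bs=4M conv=fsync status=none

  # The target is usually bigger than the image, so only compare the first
  # $size bytes.
  cmp -n "$size" "$image" "$target"
}
```

If `cmp` exits non-zero, the drive doesn't match the image and isn't worth booting from.
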
Then I stuck the USB drive into KOS-MOS and reinstalled it from that USB. After
a fumble or two with a partitioning command, I had a USB drive that let me
reflash a new base NixOS install with a ZFS root in 3 minutes. I also recorded a
video of the install.
I bet that if I used a USB 3.0 drive it could be faster, but 3 minutes is fast
enough. It is a magical experience though. Just plug the USB drive in, boot up
the tower and wait until it powers off. Once I got it working reliably on
KOS-MOS, the real test began. I built the next machine (Pneuma) and then
installed NixOS with the magic USB drive. It worked perfectly. I had myself a
cluster.
Once NixOS was installed on the machines, they were running a very basic
configuration. This configuration sets the hostname to `install`, loads my SSH
keys from GitHub and sets the ZFS host ID, but not much else. The next step was
adding KOS-MOS to my Morph setup. I did the initial setup in [this
commit](https://github.com/Xe/nixos-configs/commit/6b08de8e97e1b3b5766806adb08ac3352ef5dd44).
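
The ZFS host ID that NixOS wants in `networking.hostId` is 8 hexadecimal characters and should be different on every machine. A minimal sketch of generating one, assuming random bytes are good enough (this is not necessarily how the install script does it, and `gen_host_id` is a hypothetical helper name):

```bash
set -euo pipefail

# Generate a random 32-bit host ID as 8 lowercase hex digits, the format
# NixOS expects for networking.hostId.
gen_host_id() {
  # od prints the 4 bytes as space-separated hex pairs; tr glues them together.
  head -c4 /dev/urandom | od -A n -t x1 | tr -d ' \n'
}

gen_host_id; echo
```
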
[Wait. You built 4 machines from the same template with (basically) the same
hardware, right? Why would you need to put the host-specific config in the repo
4 times?](conversation://Mara/hmm)
I don't! I created a folder for the Alrest hardware
[here](https://github.com/Xe/nixos-configs/tree/master/common/hardware/alrest).
This contains all of the basic hardware config as well as a few settings that I
want to apply cluster-wide. This allows me to have my Morph manifest look
something like this:
```nix
{
  network = { description = "Avalon"; };

  # alrest
  "kos-mos.alrest" = { config, pkgs, lib, ... }:
    let metadata = pkgs.callPackage ../metadata/peers.nix { };
    in {
      deployment.targetUser = "root";
      deployment.targetHost = metadata.raw.kos-mos.ip_addr;
      networking.hostName = "kos-mos";
      networking.hostId = "472479d4";

      imports =
        [ ../../common/hardware/alrest ../../hosts/kos-mos/configuration.nix ];
    };

  "logos.alrest" = { config, pkgs, lib, ... }:
    let metadata = pkgs.callPackage ../metadata/peers.nix { };
    in {
      deployment.targetUser = "root";
      deployment.targetHost = metadata.raw.logos.ip_addr;
      networking.hostName = "logos";
      networking.hostId = "aeace675";

      imports =
        [ ../../common/hardware/alrest ../../hosts/logos/configuration.nix ];
    };

  "ontos.alrest" = { config, pkgs, lib, ... }:
    let metadata = pkgs.callPackage ../metadata/peers.nix { };
    in {
      deployment.targetUser = "root";
      deployment.targetHost = metadata.raw.ontos.ip_addr;
      networking.hostName = "ontos";
      networking.hostId = "07602ecc";

      imports =
        [ ../../common/hardware/alrest ../../hosts/ontos/configuration.nix ];
    };

  "pneuma.alrest" = { config, pkgs, lib, ... }:
    let metadata = pkgs.callPackage ../metadata/peers.nix { };
    in {
      deployment.targetUser = "root";
      deployment.targetHost = metadata.raw.pneuma.ip_addr;
      networking.hostName = "pneuma";
      networking.hostId = "34fbd94b";

      imports =
        [ ../../common/hardware/alrest ../../hosts/pneuma/configuration.nix ];
    };
}
```
Now I had a bunch of hardware with NixOS installed and the machines were fully
assimilated into my network. I had my base shell config and everything else
fully set up so I could SSH into any of the servers and have everything just
where I wanted it. I had [libvirtd](https://libvirt.org/index.html) installed
with the basic install set, so I wanted to try using [Tailscale Subnet
Routes](https://tailscale.com/kb/1019/subnets/) to expose the virtual machine
subnets to my other machines. As far as I am aware, libvirtd doesn't have a mode
where it can plunk a virtual machine on the network like other hypervisors can.
By default, libvirtd puts the default virtual machine network on
`192.168.122.0/24`. This doesn't conflict with anything on its own, but when
many hosts all use that same range, routing between them gets ambiguous. I
have a `/16` that I use for my WireGuard addressing, so I carved
out a few ranges that I could reserve for each machine:
| Range | Description |
| :--------------- | :-------------------------- |
| `10.77.128.0/24` | KOS-MOS Virtual Machine /24 |
| `10.77.129.0/24` | Logos Virtual Machine /24 |
| `10.77.130.0/24` | Ontos Virtual Machine /24 |
| `10.77.131.0/24` | Pneuma Virtual Machine /24 |
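
One nice property of this layout is that the third octet tells you which machine a VM lives on. A small sketch of that reverse lookup, assuming the hosts keep the order in the table (`ALREST_HOSTS` and `host_for_ip` are names made up for this example):

```bash
set -euo pipefail

# Hosts in table order; the Nth host (counting from 0) owns 10.77.(128+N).0/24.
ALREST_HOSTS=(kos-mos logos ontos pneuma)

# Print the host whose /24 contains the given IPv4 address, or fail if the
# address is outside the reserved ranges.
host_for_ip() {
  local ip="$1" a b c d
  IFS=. read -r a b c d <<<"$ip"
  local index=$(( c - 128 ))
  if [[ "$a.$b" != "10.77" ]] || (( index < 0 || index >= ${#ALREST_HOSTS[@]} )); then
    echo "no host owns $ip" >&2
    return 1
  fi
  echo "${ALREST_HOSTS[index]}"
}

host_for_ip 10.77.128.90   # prints "kos-mos"
```
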
Normally I'd share these subnets over WireGuard. However, Tailscale Subnet
Routes let me do this a bit more directly. I ran this command to enable subnet
routing on each machine:
```bash
function getsubnet () {
  case "$1" in
    kos-mos)
      printf "10.77.128.0/24"
      ;;
    logos)
      printf "10.77.129.0/24"
      ;;
    ontos)
      printf "10.77.130.0/24"
      ;;
    pneuma)
      printf "10.77.131.0/24"
      ;;
  esac
}

for host in kos-mos logos ontos pneuma
do
  ssh "root@$host" tailscale up \
    --accept-routes \
    --advertise-routes="$(getsubnet "$host")" \
    --advertise-tags=tag:alrest,tag:nixos
done
```
This command is a slightly overengineered version of what I actually did
(something something hindsight something something), but it worked! Then I
configured libvirtd to actually use these subnets by going into `virt-manager`,
connecting to one of the hosts and changing the default network configuration
from something like this:
```xml
<network>
  <name>default</name>
  <uuid>ef4bc889-e01d-403a-9a92-a0e172b8f42a</uuid>
  <!-- forward, bridge and ip elements omitted -->
</network>
```
To something like this:
```xml
<network>
  <name>default</name>
  <uuid>39bf0a49-57ff-4840-8bd6-09c6f3817afe</uuid>
  <!-- forward, bridge and ip elements omitted -->
</network>
```
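
For a rough idea of what the relevant part of such a definition looks like, here is a sketch that prints a libvirt NAT network for one of the /24s. The gateway at `.1`, the DHCP range, the `virbr0` bridge name and the `nat` forward mode are illustrative assumptions, not values copied from my hosts, and `render_network_xml` is a made-up helper.

```bash
set -euo pipefail

# Print a libvirt network definition for a /24, given its first three octets
# (for example "10.77.128"). Everything except the prefix is an assumption.
render_network_xml() {
  local prefix="$1"
  cat <<EOF
<network>
  <name>default</name>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <ip address='${prefix}.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='${prefix}.2' end='${prefix}.254'/>
    </dhcp>
  </ip>
</network>
EOF
}

render_network_xml 10.77.128
```

The `ip` and `range` elements are the part that changes from host to host; they can be pasted in with `virsh net-edit default` instead of clicking through `virt-manager`.
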
And then I spun up a virtual machine running [Alpine
Linux](https://alpinelinux.org/) and got it on the network. Its IP address was
`10.77.128.90`. Then I tried pinging it from the same machine, another machine
in the same room, another server on the same continent and then finally another
server on the same planet. Here are the results:
Same Machine:
```console
cadey:users@kos-mos ~ ./rw
$ ping 10.77.128.90 -c1
PING 10.77.128.90 (10.77.128.90) 56(84) bytes of data.
64 bytes from 10.77.128.90: icmp_seq=1 ttl=64 time=0.208 ms
--- 10.77.128.90 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.208/0.208/0.208/0.000 ms
```
Same Room:
```console
cadey:users@shachi ~ ./rw
$ ping 10.77.128.90 -c1
PING 10.77.128.90 (10.77.128.90) 56(84) bytes of data.
64 bytes from 10.77.128.90: icmp_seq=1 ttl=63 time=1.11 ms
--- 10.77.128.90 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.105/1.105/1.105/0.000 ms
```
Same Continent:
```console
cadey:users@kahless ~ ./rw
$ ping 10.77.128.90 -c1
PING 10.77.128.90 (10.77.128.90) 56(84) bytes of data.
64 bytes from 10.77.128.90: icmp_seq=1 ttl=63 time=5.66 ms
--- 10.77.128.90 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.655/5.655/5.655/0.000 ms
```
And finally a machine on the same planet:
```console
cadey:users@lufta ~ ./rw
$ ping 10.77.128.90 -c1
PING 10.77.128.90 (10.77.128.90) 56(84) bytes of data.
64 bytes from 10.77.128.90: icmp_seq=1 ttl=63 time=107 ms
--- 10.77.128.90 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 106.719/106.719/106.719/0.000 ms
```
This also lets any virtual machine on the cluster reach out to any other virtual
machine, as well as any of the hardware servers. If I install a SerenityOS
virtual machine (a platform that can't run Tailscale as far as I am aware), it
will be able to poke other virtual machines as well as my other servers over
Tailscale as if nothing special was going on. It is a magical experience.
I have a lot more compute than I really know what to do with right now. This is
okay though. Lots of slack compute space leaves a lot of room for expansion,
experimentation and other e-words in that category. These CPUs are really dang
fast too, which helps a lot. So far I've used my homelab both while doing [a
short V-tuber-esque stream where I fix a minor annoyance in NixOS and try to
explain what was going on in my head as I did
it](https://www.youtube.com/watch?v=W6h6TuiI-jo) and for writing this article.
Pneuma has sort of become my main SSH box and the other machines run lots of
virtual machines.
In the future I'd like to use this lab for the following things:
- Running some non-critical services out of my basement (Discord bots, etc)
- Implementing a VM management substrate called
[waifud](https://github.com/Xe/waifud)
- IPv6 networking for the virtual machines (libvirtd seems to only do IPv4 out
of the gate, and configuring IPv6 seems to be unfortunately nontrivial)
- CI for projects on my personal Git server
- Research for Project Elysium/NovOS
---
I hope this was an interesting look into the process I went through and the
considerations I made when assembling my homelab. It's been a fun build and I
can't wait to see
what the future will bring us. Either way it should make for some interesting
write-ups on this blog!
Here are some related Twitter threads you may find interesting to look through:
- [Day 1 of the build](https://twitter.com/theprincessxena/status/1400189266450341899)
- [Day 2 of the build](https://twitter.com/theprincessxena/status/1400549314452148227)
- [The aftermath of the realization that I can avoid copying a large part of the
NixOS configuration for each
node](https://twitter.com/theprincessxena/status/1400423623245156356)
- [A thread where I attempt to install Guix on one of the homelab nodes in a
VM](https://twitter.com/theprincessxena/status/1401614346904559617)