---
title: My Magical Adventure With cloud-init
date: 2021-06-04
---
# My Magical Adventure With cloud-init
> "If I had a world of my own, everything would be nonsense. Nothing would be
> what it is, because everything would be what it isn't. And contrary wise, what
> is, it wouldn't be. And what it wouldn't be, it would. You see?"
- The Mad Hatter, Alice's Adventures in Wonderland
The modern cloud is a magical experience. You take a template, give it some SSH
keys and maybe some user-data and then you have a server running somewhere. This
is all powered by a tool called [cloud-init](https://cloud-init.io/). cloud-init
is most useful in actual datacenters with proper metadata services, but what
if you aren't in a datacenter with a metadata service?
Recently I wanted to test a
[script](https://github.com/tailscale/tailscale/blob/main/scripts/installer.sh)
a coworker wrote that allows users to automatically install Tailscale on every
distro and version Tailscale supports. I wanted to try and avoid having to
install each version of every distribution manually, so I started looking for
options.
[This may seem like overkill (and at some level it probably is), however as a
side effect of going through this song and dance you can spin up a bunch of VMs
pretty easily. <br /> <center> <blockquote class="twitter-tweet"><p lang="und"
dir="ltr"><a
href="https://t.co/yays27Wmes">pic.twitter.com/yays27Wmes</a></p>&mdash; Xe from
Within (@theprincessxena) <a
href="https://twitter.com/theprincessxena/status/1394265890494062593?ref_src=twsrc%5Etfw">May
17, 2021</a></blockquote> <script async
src="https://platform.twitter.com/widgets.js"
charset="utf-8"></script> </center>](conversation://Mara/hacker)
cloud-init has a feature called the
[NoCloud](https://cloudinit.readthedocs.io/en/latest/topics/datasources/nocloud.html)
data source. To use it, you need to write two yaml files, put them into a
specially named ISO file and then mount it to the virtual machine. cloud-init
will then pick up your configuration data and apply it.
[Wait...really? What.](conversation://Mara/hmm)
[Yes, really.](conversation://Cadey/coffee)
Let's make an [Amazon Linux
2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/amazon-linux-2-virtual-machine.html)
virtual machine as an example. Amazon offers their Linux distribution for
download so you can run it on-premises (I don't really know why you'd want to do
this outside of testing stuff on Amazon Linux). In this blog we use KVM, so keep
that in mind when you set things up yourself.
First you need to make a `meta-data` file. This will contain the VM's hostname
and the "instance ID" (this makes sense in cloud contexts, but here you can use
whatever you want):
```yaml
local-hostname: mayhem
instance-id: 31337
```
[You can configure networking settings here, but our VM is going to get an
address over DHCP so you don't really need to care about that in this case.](conversation://Mara/hacker)
Next you need to make a `user-data` file. This will actually configure your VM:
```yaml
#cloud-config
#vim:syntax=yaml
cloud_config_modules:
  - runcmd
cloud_final_modules:
  - [users-groups, always]
  - [scripts-user, once-per-instance]
users:
  - name: xe
    groups: [ wheel ]
    sudo: [ "ALL=(ALL) NOPASSWD:ALL" ]
    shell: /bin/bash
    ssh-authorized-keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPYr9hiLtDHgd6lZDgQMkJzvYeAXmePOrgFaWHAjJvNU cadey@ontos
write_files:
  - path: /etc/cloud/cloud.cfg.d/80_disable_network_after_firstboot.cfg
    content: |
      # Disable network configuration after first boot
      network:
        config: disabled
```
Please make sure to change the username and swap out the SSH key as needed,
unless you want to get locked out of your VM. For more information about what
you can do from cloud-init, see the list of modules
[here](http://cloudinit.readthedocs.io/en/latest/topics/modules.html).
Now that you have the two yaml files you can make the seed image with this
command (Linux):
```console
$ genisoimage -output seed.iso \
-volid cidata \
-joliet \
-rock \
user-data meta-data
```
[In NixOS you may need to run it inside nix-shell: `nix-shell -p
cdrkit`.](conversation://Mara/hacker)
Or this command (macOS):
```console
$ hdiutil makehybrid \
-o seed.iso \
-hfs \
-joliet \
-iso \
-default-volume-name cidata \
user-data meta-data
```
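If you end up building seed images for a lot of VMs, it can help to wrap those two commands in a script. Here is a minimal Python sketch of that idea. The function names `seed_iso_command` and `build_seed_iso` are made up for this example; the flags are copied from the commands above, and it assumes `user-data` and `meta-data` are in the current directory.

```python
import shutil
import subprocess
import sys


def seed_iso_command(platform: str, output: str = "seed.iso") -> list[str]:
    """Return the argv for building a NoCloud seed image on this platform."""
    files = ["user-data", "meta-data"]
    if platform == "darwin":
        # Same flags as the hdiutil invocation above.
        return ["hdiutil", "makehybrid", "-o", output, "-hfs", "-joliet",
                "-iso", "-default-volume-name", "cidata", *files]
    # Same flags as the genisoimage invocation above.
    return ["genisoimage", "-output", output, "-volid", "cidata",
            "-joliet", "-rock", *files]


def build_seed_iso(output: str = "seed.iso") -> None:
    """Build the seed image, failing early if the tool is not installed."""
    cmd = seed_iso_command(sys.platform, output)
    if shutil.which(cmd[0]) is None:
        raise SystemExit(f"{cmd[0]} not found in PATH")
    subprocess.run(cmd, check=True)
```

Keeping the argv construction separate from the part that runs it means you can print the command and eyeball it before anything touches your disk.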
Now you can download the KVM image from that [Amazon Linux User Guide page from
earlier](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/amazon-linux-2-virtual-machine.html)
and then put it somewhere safe. This image will be written into a [ZFS
zvol](https://pthree.org/2012/12/21/zfs-administration-part-xiv-zvols/). To find
out how big the zvol needs to be, you can use `qemu-img info`:
```console
$ qemu-img info amzn2-kvm-2.0.20210427.0-x86_64.xfs.gpt.qcow2
image: amzn2-kvm-2.0.20210427.0-x86_64.xfs.gpt.qcow2
file format: qcow2
virtual size: 25 GiB (26843545600 bytes)
disk size: 410 MiB
cluster_size: 65536
Format specific information:
compat: 1.1
compression type: zlib
lazy refcounts: false
refcount bits: 16
corrupt: false
extended l2: false
```
The virtual disk image is 25 gigabytes, so you can create it with a command like
this:
```console
$ sudo zfs create -V 25G rpool/safe/vms/mayhem
```
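If you would rather not read the size off by eye, `qemu-img info --output=json` prints the same information as JSON, with the size in bytes under the `virtual-size` key. The sketch below turns that into a `zfs create` command; the helper names are hypothetical and the sample JSON is trimmed down to the fields this example needs.

```python
import json


def zvol_size_gib(qemu_img_info_json: str) -> int:
    """Return the virtual disk size in whole GiB, rounded up."""
    info = json.loads(qemu_img_info_json)
    # Integer ceiling division, so an image slightly over 25 GiB gets 26.
    return -(-info["virtual-size"] // 2**30)


def zfs_create_command(dataset: str, size_gib: int) -> list[str]:
    """Return the argv for creating a zvol of the given size."""
    return ["zfs", "create", "-V", f"{size_gib}G", dataset]


# Trimmed-down sample of what `qemu-img info --output=json` reports.
sample = '{"virtual-size": 26843545600, "format": "qcow2"}'
print(zfs_create_command("rpool/safe/vms/mayhem", zvol_size_gib(sample)))
```

Rounding up is the safe direction here: a zvol that is a little too big is fine, while one that is too small will make `qemu-img convert` fail partway through.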
Then you use `qemu-img convert` to copy the image into the zvol:
```console
$ sudo qemu-img convert \
-O raw \
amzn2-kvm-2.0.20210427.0-x86_64.xfs.gpt.qcow2 \
/dev/zvol/rpool/safe/vms/mayhem
```
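If you don't use ZFS you can make a layered disk using `qemu-img create`. Newer
QEMU releases refuse to create an overlay unless you state the backing format
explicitly, so pass `backing_fmt` as well:
```console
$ qemu-img create \
-f qcow2 \
-o backing_file=amzn2-kvm-2.0.20210427.0-x86_64.xfs.gpt.qcow2,backing_fmt=qcow2 \
mayhem.qcow2
```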
Open up virt-manager and then create a new virtual machine. Make sure you select
"Manual install".
<center>
![The first step of the "create a new virtual machine" wizard in virt-manager
with "manual install"
selected](https://cdn.christine.website/file/christine-static/blog/20210604_06h43m27s_grim.png)
</center>
virt-manager will then ask you what OS the virtual machine is running so it can
load some known working defaults. It doesn't have an option for Amazon Linux,
but it's kinda sorta like CentOS 7, so enter CentOS 7 here.
<center>
![The second step of the "create a new virtual machine" wizard in virt-manager
with "CentOS 7" selected as the OS the virtual machine will be
running](https://cdn.christine.website/file/christine-static/blog/20210604_06h45m35s_grim.png)
</center>
The default amounts of RAM and CPU are fine, but you can choose other values if
you have more restrictive hardware requirements.
<center>
![The third step of the "create a new virtual machine" wizard in virt-manager
with 1024 MB of ram and 2 virtual CPU cores
selected](https://cdn.christine.website/file/christine-static/blog/20210604_06h50m09s_grim.png)
</center>
Now you need to select the storage path for the VM. virt-manager will helpfully
offer to create a new virtual disk for you. You already made the disk with the
above steps, so enter in `/dev/zvol/rpool/safe/vms/mayhem` (or the path to your
custom layered qcow2 from the above `qemu-img create` command) as the disk
location.
<center>
![The fourth step of the "create a new virtual machine" wizard in virt-manager
with `/dev/zvol/rpool/safe/vms/mayhem` selected as the path to the
disk](https://cdn.christine.website/file/christine-static/blog/20210604_06h53m58s_grim.png)
</center>
Finally, name the VM and then choose "Customize configuration before install" so
you can mount the seed data.
<center>
![The last step of the "create a new virtual machine" wizard in virt-manager,
setting the virtual machine name to "mayhem" and indicating that you want to
customize configuration before
installation](https://cdn.christine.website/file/christine-static/blog/20210604_06h56m54s_grim.png)
</center>
Click on the "Add Hardware" button in the lower left corner of the configuration
window.
<center>
![](https://cdn.christine.website/file/christine-static/blog/20210604_06h58m53s_grim.png)
</center>
Make a new CDROM storage device that points to your seed image:
<center>
![](https://cdn.christine.website/file/christine-static/blog/20210604_07h01m24s_grim.png)
</center>
And then click "Begin Installation". The virtual machine will be created and its
graphical console will open. Click on the info tab and then the NIC device. The
VM's IP address will be listed:
<center>
![](https://cdn.christine.website/file/christine-static/blog/20210604_07h05m28s_grim.png)
</center>
Now SSH into the VM:
```console
$ ssh xe@192.168.122.122
The authenticity of host '192.168.122.122 (192.168.122.122)' can't be established.
ED25519 key fingerprint is SHA256:TP7dWLkHOixx5tr78qn0yvDQKttH0yWz6IBvbadEqcs.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.122.122' (ED25519) to the list of known hosts.
__| __|_ )
_| ( / Amazon Linux 2 AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-2/
8 package(s) needed for security, out of 17 available
Run "sudo yum update" to apply all updates.
[xe@mayhem ~]$
```
And voila! A new virtual machine that you can do whatever you want with, just
like you would any other server.
[Do you really need to make an ISO file for this? Can't I just use HTTP like <a
href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html">the
AWS metadata service</a>?](conversation://Mara/hmm)
Yes and no. You can have the configuration loaded over HTTP/S, but making
`http://169.254.169.254` work like the AWS metadata service takes special
network configuration and a fair bit of effort. Either way, you are going to
have to edit the virtual machine's XML.
[XML? Why is XML involved?](conversation://Mara/wat)
virt-manager is a frontend to [libvirt](https://libvirt.org/index.html). libvirt
uses XML to describe virtual machines.
[Here](https://gist.github.com/Xe/f870ebb2d9dce0929a35a4ba347cbda3) is the XML
used to describe the VM you made earlier. This looks like a lot (because frankly
it is a lot, computers are complicated), but it is a lot more manageable
than the equivalent qemu flags.
[What do the qemu flags look like?](conversation://Mara/hmm)
[Like
this](https://gist.githubusercontent.com/Xe/2eba35ec6cbd54becf9fca02f6d69f0b/raw/89d68424c0ae26333d798bd9bd6a224dfec844d7/qemu%2520flags.txt).
It is kind of a mess that I would rather have something made by people smarter
than me take care of.
To enable cloud-init to load over HTTP, you are going to have to add the qemu XML
namespace to mayhem's configuration. At the top you should see a line that looks
like this:
```xml
<domain type="kvm">
```
Replace it with one that looks like this:
```xml
<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">
```
This will allow you to set the cloud-init seed location information using a
[SMBIOS value](https://en.wikipedia.org/wiki/System_Management_BIOS). To enable
this, add the following to the _bottom_ of your XML file, just before the
closing `</domain>`:
```xml
<qemu:commandline>
  <qemu:arg value="-smbios"/>
  <qemu:arg value="type=1,serial=ds=nocloud-net;h=mayhem;s=http://10.77.2.22:8000/mayhem/"/>
</qemu:commandline>
```
```
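If you have more than a couple of VMs to edit, doing this by hand gets old. Below is a rough Python sketch that makes both XML changes with the standard library's `xml.etree.ElementTree`. The function names are invented for this example, and the one-line `<domain>` passed in at the end is a stand-in for the real XML you would get from libvirt.

```python
import xml.etree.ElementTree as ET

QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"


def smbios_serial(hostname: str, seed_url: str) -> str:
    """Build the SMBIOS value that points cloud-init at a seed URL."""
    return f"type=1,serial=ds=nocloud-net;h={hostname};s={seed_url}"


def add_seed_url(domain_xml: str, hostname: str, seed_url: str) -> str:
    """Return domain_xml with a <qemu:commandline> block appended."""
    # Registering the prefix makes the serializer declare xmlns:qemu on <domain>.
    ET.register_namespace("qemu", QEMU_NS)
    root = ET.fromstring(domain_xml)
    cmdline = ET.SubElement(root, f"{{{QEMU_NS}}}commandline")
    for value in ("-smbios", smbios_serial(hostname, seed_url)):
        ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", {"value": value})
    return ET.tostring(root, encoding="unicode")


# Stand-in domain XML; a real one is much longer.
print(add_seed_url('<domain type="kvm"><name>mayhem</name></domain>',
                   "mayhem", "http://10.77.2.22:8000/mayhem/"))
```

One caveat: ElementTree only keeps the namespace prefixes it knows about, so if your domain XML uses other namespaces, their prefixes may come out renamed. Diff the result against the original before feeding it back to libvirt.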
Make sure the data is actually being served on that address. Here's a nix-shell
python one-liner HTTP server:
```console
$ nix-shell -p python3 --run 'python -m http.server 8000'
```
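Before rebooting anything, it is worth checking that the seed URL really serves what cloud-init is going to ask for. Here is a small Python sketch of such a check. `check_seed` and `fetch` are names made up for this example, and the checks are only the obvious ones: both files are reachable, `user-data` starts with `#cloud-config`, and `meta-data` mentions an `instance-id`. The trailing slash check assumes cloud-init appends the file names directly to the seed URL.

```python
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Download url and return its body as text."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def check_seed(base_url: str, fetch=fetch) -> list[str]:
    """Return a list of problems found with the seed files under base_url."""
    problems = []
    if not base_url.endswith("/"):
        problems.append("seed URL should end with a slash")
        base_url += "/"
    try:
        user_data = fetch(base_url + "user-data")
        if not user_data.startswith("#cloud-config"):
            problems.append("user-data does not start with #cloud-config")
    except OSError as err:  # urllib's errors are OSError subclasses
        problems.append(f"could not fetch user-data: {err}")
    try:
        meta_data = fetch(base_url + "meta-data")
        if "instance-id" not in meta_data:
            problems.append("meta-data has no instance-id")
    except OSError as err:
        problems.append(f"could not fetch meta-data: {err}")
    return problems
```

Calling `check_seed("http://10.77.2.22:8000/mayhem/")` should give you an empty list when everything is in place. The `fetch` parameter is only there so you can swap in a fake downloader when testing the check itself.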
Then you will need to either load the base image back into the zvol or recreate
the qcow2 file to reset the VM back to its default state.
Reboot the VM and wait for it to connect to your "metadata server":
```console
192.168.122.122 - - [04/Jun/2021 11:41:10] "GET /mayhem/meta-data HTTP/1.1" 200 -
192.168.122.122 - - [04/Jun/2021 11:41:10] "GET /mayhem/user-data HTTP/1.1" 200 -
```
Then you can SSH into it like normal:
```console
$ ssh xe@192.168.122.122
The authenticity of host '192.168.122.122 (192.168.122.122)' can't be established.
ED25519 key fingerprint is SHA256:eJRjDsvnVrXfntVtNVN6N+JdakaA+dvGKWWQP5OFkeA.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.122.122' (ED25519) to the list of known hosts.
__| __|_ )
_| ( / Amazon Linux 2 AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-2/
8 package(s) needed for security, out of 17 available
Run "sudo yum update" to apply all updates.
[xe@mayhem ~]$
```
[Can I choose other distros for this?](conversation://Mara/hmm)
Yep! Most distributions offer cloud-init enabled images. They may be hard to
find, but they do exist. Here are some links that will help you with common
distros:
- [Arch Linux](https://mirror.pkgbuild.com/images/) (use the `cloudimg` ones)
- [CentOS 7](https://cloud.centos.org/centos/7/images/) (use the `GenericCloud`
one)
- [CentOS 8](https://cloud.centos.org/centos/8-stream/x86_64/images/) (use the
`GenericCloud` one)
- [Debian 9](http://cloud.debian.org/images/cloud/OpenStack/9.13.22-20210531/)
(use the `openstack` one)
- [Debian 10](http://cloud.debian.org/images/cloud/buster/20210329-591/) (use
the `generic` one)
- [Debian 11](http://cloud.debian.org/images/cloud/bullseye/daily/) (use the
`generic` one)
- [Fedora 34](https://alt.fedoraproject.org/cloud/) (use the Openstack image)
- [OpenSUSE Leap
15.2](https://download.opensuse.org/repositories/Cloud:/Images:/Leap_15.2/images/)
(use the `OpenStack` image)
- [OpenSUSE Leap 15.3](https://get.opensuse.org/leap/) (use the JeOS one labeled
`OpenStack-Cloud`)
- [OpenSUSE Tumbleweed](https://download.opensuse.org/tumbleweed/appliances/)
(use the JeOS one labeled `Openstack-Cloud`)
- [Ubuntu](https://cloud-images.ubuntu.com/) (use the `server-cloudimg` image
for your version of choice)
In general, look for images that are compatible with OpenStack. OpenStack uses
cloud-init to configure virtual machines and the NoCloud data source you're using
ships by default. It usually works out, except for cases like OpenSUSE Leap
15.1. With Leap 15.1 you have to [pretend to be OpenStack a bit
more](https://github.com/tailscale/tailscale/blob/aa6abc98f30df67a0d86698b77932d4d9cc45ac0/tstest/integration/vms/opensuse_leap_15_1_test.go)
for some reason.
[What if I need to template the userdata file?](conversation://Mara/hmm)
[You really should avoid doing this if possible. Templating yaml is a delicate
process fraught with danger. The error conditions in things like Kubernetes are
that it does the wrong thing and you need to replace the service. The error
condition with this is that you lose access to your
server.](conversation://Cadey/facepalm)
[Let's say that Facts and Circumstances™ made me have to template
it.](conversation://Mara/happy)
<center>
<picture>
<source srcset="https://cdn.christine.website/file/christine-static/stickers/cadey/percussive-maintenance.avif" type="image/avif">
<source srcset="https://cdn.christine.website/file/christine-static/stickers/cadey/percussive-maintenance.webp" type="image/webp">
<img src="https://cdn.christine.website/file/christine-static/stickers/cadey/percussive-maintenance.png" alt="Cadey is percussive-maintenance">
</picture>
</center>
When you are templating yaml, you have to be really careful. It is very easy to
incur [the wrath of Norway and
Ontario](https://hitchdev.com/strictyaml/why/implicit-typing-removed/) by
accident with yaml. Here are some rules of thumb (unfortunately gained from
experience) to keep in mind:
- yaml has implicit typing, so quote everything to be safe
- ensure that every value you pass in is yaml-safe
- ensure that the indentation matches for every value
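To make the "quote everything" rule concrete, here is a minimal Python sketch. It leans on the fact that a JSON string is also a valid double-quoted yaml scalar for ordinary text, so `json.dumps` can do the quoting and escaping. `yaml_quote` and `render_meta_data` are hypothetical helpers for this example, not part of cloud-init or any yaml library.

```python
import json


def yaml_quote(value: str) -> str:
    """Render a string as a double-quoted yaml scalar."""
    # json.dumps adds the quotes and escapes quotes, backslashes and newlines.
    return json.dumps(value)


def render_meta_data(hostname: str, instance_id: str) -> str:
    """Render a meta-data file with every value explicitly quoted."""
    return (f"local-hostname: {yaml_quote(hostname)}\n"
            f"instance-id: {yaml_quote(instance_id)}\n")


# Unquoted, a hostname of `no` would be read as a boolean by many parsers.
print(render_meta_data("no", "31337"), end="")
# local-hostname: "no"
# instance-id: "31337"
```

This only covers flat string values. Anything nested still needs the indentation handled correctly, which is a good reason to generate the whole document with a real yaml library if you can.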
Something very important is to test the templating on a virtual machine image
that you have a back door into. Otherwise you will be locked out. You can
generally hack around it by adding `init=/bin/sh` in your kernel command line
and changing your password from there.
When you mess it up you will need to get into the VM somehow and do one of a few
things:
1. Run `cloud-init collect-logs` to generate a log tarball that you can export
to your host machine and dig into from there
2. Look through the system journal for any errors
3. Look in `/var/log` for files that begin with `cloud-init` and page through
them
If all else fails, start googling. If you are running commands against a VM with
the `runcmd` feature of cloud-init, I'd suggest going through the steps on a
manually installed virtual machine image at least once so you can be sure the
steps work. I have lost 4 hours of time to this. Also keep in mind that in the
context that `runcmd` runs from, there is no standard input hooked up. You will
need to pass `-y` everywhere.
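One way to catch missing `-y` flags before they cost you a boot cycle is to rehearse commands with standard input disconnected. The Python sketch below is an approximation of that environment, not a reproduction of how cloud-init actually runs `runcmd`; `run_without_stdin` is a name made up for this example.

```python
import subprocess
import sys


def run_without_stdin(argv: list[str]) -> subprocess.CompletedProcess:
    """Run argv with stdin attached to the null device, so prompts get EOF."""
    return subprocess.run(argv, stdin=subprocess.DEVNULL,
                          capture_output=True, text=True)


# A stand-in for an interactive command: it prints whatever it reads from stdin.
result = run_without_stdin(
    [sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"])
print(result.stdout, end="")  # prints '' because nothing can be typed in
```

A package manager run this way without `-y` will typically bail out or treat the empty answer as "no", which is a lot easier to notice here than in a VM you can't log into.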
If you want a simple Alpine Linux image to test with, look
[here](https://github.com/Xe/alpine-image) for the Alpine Linux images I test
with. You can download this image from
[here](https://xena.greedo.xeserv.us/pkg/alpine/img/alpine-edge-2021-05-18-cloud-init-within.qcow2)
in case you trust that I wouldn't put malware in that image and don't want to
make your own.
---
In the future I plan to use cloud-init _extensively_ within my [new homelab
cluster](https://twitter.com/theprincessxena/status/1400592778309115905). I have
plans to make a custom VM management service I'm calling
[waifud](https://github.com/Xe/waifud). I will write more on that once I have
written the software. I currently have a minimum viable prototype of this tool
called `mkvm` that I'm using today without any issues. I also will be writing up
how I built the cluster and installed NixOS on all the systems in a future
article.
cloud-init is an incredible achievement. It has its warts, but because it is
used in so many places, it makes configuring virtual machines so much easier.
It [even works on Windows](https://cloudbase.it/cloudbase-init/)! As much as I
complain about it in this post, life would be so much worse without it. It
allows me to use the magic of the cloud in my local virtual machines so I can
get better use out of my hardware.