forked from cadey/xesite
talks: add systemd: the good parts (#368)
Signed-off-by: Christine Dodrill <me@christine.website>
parent 95bfc64097
commit 062aac0903
@@ -0,0 +1,258 @@
---
title: "systemd: The Good Parts"
date: 2021-05-16
slides_link: https://docs.google.com/presentation/d/1a0XaGu87xUcpQQVLkrnXKoKrdpN1ObiPrG9aGYVMw7k/edit?usp=sharing
---

# systemd: The Good Parts

[Video](https://youtu.be/TJdKXq197Qk)

<center><iframe width="560" height="315" src="https://www.youtube.com/embed/TJdKXq197Qk" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></center>

The slides link will be at the end of the post.

Hello, I'm Xe and today I'm going to give a talk about systemd; more
specifically, the good parts of systemd. This talk is going to go fast because
there's a lot of material to cover and the notes are going to be on my website.
I have been an Alpine user for almost a decade and it's one of my favorite Linux
distributions.

The best things in life come with disclaimers and here are the disclaimers for this talk:

- This talk may contain opinions. These opinions are my own and not necessarily
  the opinions of my employer.
- This talk is not evangelism. This talk is intended to show how green the grass
  is on the other side and how Alpine can benefit from these basic ideas.
- This talk also contains images of cartoon marine animals.

[What is systemd?](conversation://Mara/hmm)

When giving a talk about a thing I find it helps to start with a good definition
of what that thing is. Given this talk is about systemd, let's start with what
systemd is.

<center>

![A map of systemd components](https://www.linux.com/images/stories/41373/Systemd-components.png)

</center>

systemd is a set of building blocks that you can use to make a Linux system.
This diagram covers most of the parts of systemd. There is everything from
service management to log management to boot time analysis, network
configuration, and user logins; but we're only going to cover a tiny fraction of
this diagram. At a high level systemd provides a common set of tools that you
can build a Linux system with; kind of like Lego bricks. It does manage
services, but it does more than just service management.

Something else that's useful to ask is "why does systemd exist?" Well, looking
back at that diagram, computers are actually fairly complicated. There's a lot
going on over here. There's log management, there's disk management, there's
service sequencing, network configuration, containers, user sessions, and most
importantly all of these things need to happen in order or bad things can
happen. I mentioned that systemd is more than just a service manager because it
has optional components that manage things like DNS resolution, network devices,
user sessions, and user-level services among other things.

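To make the "in order" part concrete, here is a minimal sketch of working out a safe startup order from a list of who needs what. It is a toy built on Python's standard `graphlib` module, not how systemd does it internally, and the component names and their dependencies are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical components, each mapped to what must be up before it.
needs = {
    "disks": set(),
    "logging": {"disks"},
    "network": {"logging"},
    "dns": {"network"},
    "website": {"network", "dns", "logging"},
}


def start_order(needs):
    """Return one order in which every dependency starts before its user."""
    return list(TopologicalSorter(needs).static_order())


order = start_order(needs)
print(order)
```

If two components ever needed each other, `static_order()` would raise `graphlib.CycleError` instead of quietly picking an order, which is the kind of failure you want to hear about before boot.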

One of the big differences between systemd and other things like OpenRC is that
systemd is a very declarative environment. In declarative environments you
specify what you want and the system will figure out what it needs to do to get
there. In an imperative environment you specify all of the steps you need to do
to get there. It's the difference between writing a SQL statement and a for
loop.

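Here is that SQL-versus-for-loop comparison as a small sketch, using Python's built-in `sqlite3` module. The table, the service names, and the memory numbers are all invented for the example.

```python
import sqlite3

# Hypothetical data: (service name, memory use in megabytes).
rows = [("web", 200), ("db", 900), ("cache", 50)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (name TEXT, mem_mb INTEGER)")
conn.executemany("INSERT INTO services VALUES (?, ?)", rows)

# Declarative: describe the result you want; the engine plans the steps.
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM services WHERE mem_mb > 100 ORDER BY name"
    )
]

# Imperative: spell out every step yourself.
imperative = []
for name, mem_mb in rows:
    if mem_mb > 100:
        imperative.append(name)
imperative.sort()

print(declarative == imperative)
```

Both halves arrive at the same answer. The difference is that the query only states the outcome, so there are fewer steps for a tired human to get wrong.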

So, pretend that this somewhat realistic scenario is happening to you: it's 4:00
AM and you just got a panicked call from someone at your company that the
website is down. You log into a server and you want to see if the website is
actually down or if it's just DNS. You probably want to know the answers to
these basic questions:

- Does the service manager think your service is running?
- How much RAM is it using?
- Does it have any child processes?
- Has it reported it is healthy?
- How much traffic has it used?
- What are the last few log lines?
- If you need to reboot the server right now for some reason, will that service
  come back up on reboot?

![](https://cdn.christine.website/file/christine-static/blog/Screen+Shot+2021-05-11+at+23.02.15.png)

systemd includes a tool called systemctl that allows you to query the status of
services as well as start and stop them; but for right now we're going to look
at the `systemctl status` subcommand. Here is the output of `systemctl status`
for the service powering christine.website. So let's go down the list:

- Is the service running? If you look at the red box right there you can see
  that it says the service has been running for nine hours.
- How much RAM is it using? If you look at the red box there it says it's using
  about 200 megs of RAM.
- How many child processes are there? If you look at the red box it'll show you
  all of the processes in the service's cgroup. In this case we'll see that
  there's just one process.
- How much network traffic has it been using? If we look here in the red box you
  can see it's had about a megabyte of traffic in and somewhat less than a
  megabyte of traffic out. My website serves everything over a Unix socket, so
  that traffic isn't reflected here; the real numbers are much higher.
- At the bottom we can see the last few log lines. These are just random
  requests that people make to my blog.

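If you want the same answers from a script, `systemctl show` prints a unit's properties as `Key=Value` lines, which are easier to parse than the human-oriented `status` output. What follows is a rough sketch under a few assumptions: `example.service` is a placeholder name, the sample text is made up so the parser can be tried without systemd, and the memory and traffic properties are only populated when the matching accounting is enabled for the unit.

```python
import subprocess

# Property names that line up with the questions above. Check them against
# your systemd version before relying on them.
PROPERTIES = [
    "ActiveState",     # does the service manager think it is running?
    "MemoryCurrent",   # memory use in bytes
    "IPIngressBytes",  # traffic in (needs IP accounting enabled)
    "IPEgressBytes",   # traffic out (needs IP accounting enabled)
    "UnitFileState",   # "enabled" means it is set to start at boot
]


def show_command(unit):
    """Build the argv for querying one unit."""
    return ["systemctl", "show", unit, "--property=" + ",".join(PROPERTIES)]


def parse_show(text):
    """Turn Key=Value lines into a dict, skipping anything else."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key] = value
    return result


def query(unit):
    """Run systemctl and parse its output (only works on a systemd host)."""
    done = subprocess.run(
        show_command(unit), capture_output=True, text=True, check=True
    )
    return parse_show(done.stdout)


# Made-up sample output, not taken from a real machine.
SAMPLE = "ActiveState=active\nMemoryCurrent=209715200\nUnitFileState=enabled\n"
print(parse_show(SAMPLE))
```

On a real host you would call `query("example.service")` with your own unit name in place of the placeholder.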

[Where did it get those logs from?](conversation://Mara/hmm)

If you haven't seen all of this in action before you might be wondering
something like "Wait, where did it get those logs from?"

I mentioned systemd does more than just start services. systemd has a common log
sink called the journal. Logs from the kernel, network devices, services, and
even some other system sources that you may not think are important
automatically get put into the journal. It's similar to Windows event logs or
the Console app in macOS except it's implicit instead of explicit (Windows and
macOS make you use some weird logging calls to make sure that log lines actually
get in there, but systemd will capture the standard output, standard error, and
syslog for every service managed by systemd). Something neat about the journal
is that it lets you tail the logs for the entire system with one command:
`journalctl -f`. Here's that command running on a server of mine:

![journalctl output](https://cdn.christine.website/file/christine-static/blog/Screen+Shot+2021-05-15+at+11.04.17.png)

There's a lot more to the journal involving structured logging, automatically
streaming the logs to places, and advanced filtering based off of different
units, services, or other arbitrary fields; however that is out of scope for
this talk. The important part is that it has support for that in case you
actually need it.

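Just to give a taste of the structured side: `journalctl -o json` prints one JSON object per line, with fields such as `MESSAGE`, `PRIORITY`, and `_SYSTEMD_UNIT`. The sketch below keeps only the more severe entries. The unit name and the two sample entries are invented, and real entries carry many more fields than this.

```python
import json


def journal_command(unit, lines=20):
    """Build the argv for dumping a unit's recent log lines as JSON."""
    return ["journalctl", "-u", unit, "-n", str(lines), "-o", "json", "--no-pager"]


def errors_only(json_lines):
    """Keep messages at syslog priority 3 ("err") or more severe."""
    kept = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        # PRIORITY arrives as a string; lower numbers are more severe.
        if int(entry.get("PRIORITY", "6")) <= 3:
            kept.append(entry.get("MESSAGE", ""))
    return kept


# Made-up sample entries, not real log output.
SAMPLE = "\n".join(
    [
        json.dumps({"PRIORITY": "6", "_SYSTEMD_UNIT": "example.service",
                    "MESSAGE": "GET /blog 200"}),
        json.dumps({"PRIORITY": "3", "_SYSTEMD_UNIT": "example.service",
                    "MESSAGE": "database unreachable"}),
    ]
)
print(errors_only(SAMPLE))
```

For quick interactive use you probably don't need any of this; `journalctl -u` plus `-p` on the command line covers the common cases.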

Now this is all great, and you might be asking yourself "well, yeah, this
stuff is cool; but how does Alpine fit into this? Alpine can't run systemd
because systemd is glibc-specific." However, we're not talking about systemd
directly, we're talking about the philosophies involved and the truth is that
this kind of experience is what people already have elsewhere. By not having
something competitive, Alpine is less and less attractive for newer production
deployments.

Now there are at least four classes of benefits for systemd and I'm going to
break them down into the following groups:

- developers
- packagers
- system administrators
- users

In general people that are developing services that run on systemd get the following benefits:

- Predictability. systemd configuration files are declarative rather than
  imperative. You declare units instead of imperatively building up init
  scripts. Options are declared and enforced by the service manager. This makes
  it a lot easier to review changes for correctness.
- Portability. When setting up a service with systemd there's only one syntax to
  learn across 15-plus different distributions. This means that you don't have
  to maintain a giant pile of hacks to make the program just start consistently
  across different distributions; you only have to care about the systemd unit
  that will make everything happen for you. Before systemd was widespread every
  distribution had their own unique special snowflake configuration for init
  systems and it really just wasn't that nice to deal with. Ubuntu had different
  opinions from Debian, Debian and openSUSE had different opinions, and CentOS
  was way out in the weeds and it just became hard to do this consistently
  across distributions. Something declarative like systemd makes doing it across
  distributions a lot easier by comparison.
- One of the other big things that it has is an API for controlling things with
  D-Bus. Now, say what you will about D-Bus but D-Bus does have some very rich
  introspection capabilities, as well as giving you the ability to integrate
  with system services at a level that more closely resembles what you get on
  Windows or macOS (or even something like seL4 with microkernel message
  passing). You don't have to shell out to commands and pray the output format
  didn't change. You don't have to do some weird calls to Unix sockets. It uses
  standard APIs and allows you to integrate things more tightly with the system.
  GNOME for example uses systemd to trigger suspend and shutdown, as well as
  having a little GUI to query the systemd journal. Server software can
  subscribe to units being started for auditing purposes and such.

Packagers or people that are putting software into packages get the following benefits:

- It is a lot easier to write a systemd unit than it is to write an OpenRC
  script. systemd units are very bland and boring; they look like INI files. It
  is going to be pretty obvious that it just does what it does and there's
  nothing special going on. And because of this declarative syntax it makes
  human error a lot more obvious and it is a lot easier for other humans to
  review.
- Now, don't get me wrong, shell scripts for service definitions have gotten
  us a very long way and are likely to stay around for a very long time (I
  actually use shell scripts with most of my systemd services to do weird things
  with environment variables for configuration). However, shell scripting is a
  very, very subtle art and it is very easy to mess up and do things that are
  very unpredictable if you are not extremely careful. The declarative syntax of
  systemd removes the ability for you to mess up formatting shell scripts; or at
  the very least it isolates the flaws of the shell script to the exact service
  running and not things like the user that the service is running under.

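To show how bland a unit really is, here is a hypothetical one read with Python's stock `configparser`, the same module you would point at any INI file. The directives (`Description`, `After`, `ExecStart`, `User`, `Restart`, `WantedBy`) are real systemd settings, but the service, binary path, and user are made up. systemd also has its own parsing rules (repeated keys and line continuations, for example) that `configparser` does not reproduce, so treat this as an illustration of the shape and not as a validator.

```python
import configparser

# A made-up unit file; nothing here refers to a real service.
UNIT_TEXT = """\
[Unit]
Description=Example website
After=network-online.target

[Service]
ExecStart=/usr/local/bin/example-site
User=example
Restart=always

[Install]
WantedBy=multi-user.target
"""


def load_unit(text):
    """Parse unit-file text into {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the CamelCase keys as written
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


unit = load_unit(UNIT_TEXT)
print(unit["Service"]["User"])
```

Notice that the user the service runs as is a plain `User=` line, sitting right where a reviewer can see it in a diff.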

System administrators of systemd systems also get the following benefits:

- `systemctl status` and a lot of other parts of systemctl let you see what the
  system or an individual service is doing without having to wonder if it's
  actually working or not. In general the lazy thing is the thing that you want
  to optimize for because people are distracted. There is a lot going on
  sometimes and if you optimize it so that the easiest thing to do is the
  correct thing then it is a lot easier to deal with when you have a distracted
  operator. systemd is set up so that it's hard to do the wrong thing. It is
  hard to have logs go anywhere but the system journal. It is hard to write a
  unit that doesn't tell you if the service is actually running or not. And it
  makes it so that the path of least resistance will do most of what you want.
- Sometimes system administrators have opinions that are different than the
  opinions of the packager. Sometimes you need to change environment variables
  for HTTP proxies or something, and sometimes you simply disagree with the
  packager about how something should be run. In OpenRC you'd have to make a
  copy of the init script, make your changes, and then hope those changes don't
  get blown away when the package updates. systemd has a first-class mechanism
  for doing this called drop-in units that allow you to customize parts of a
  systemd service so that you can override exactly what you need to (and only
  that) and systemd will turn all of those into one big logical unit and
  actually go off and run that. This has been very useful in practice.
- Another thing that is kind of endemic to sysvinit and OpenRC systems is the
  fact that unless you are careful and configure it right cron job output will
  just go to nowhere and there is not really an easy way to figure out if a cron
  job actually ran and if it errored or if it did exactly what you wanted. If I
  recall there was actually an entire small startup that was formed around just
  alerting for cron jobs that were not doing what they should be doing. systemd
  changes this because all of the logs are in the journal. If you set up a
  systemd timer (which is the systemd-land equivalent to a cron job) all of the
  output for the service associated with that timer gets put into the journal
  and you can see exactly what went wrong so you can go off and fix it. This has
  saved me so much time and headache trying to do this stuff manually.
- Another thing that you can do is you can group services together with targets
  which are kind of like named runlevels. Targets let you tell the difference
  between the system having booted, the network stack being configured, and all
  of the services needed for your app running. You can get a list of
  dependencies from systemd for any service and you can also use that to help
  you plan incident response, so it is more difficult to have hidden
  dependencies.

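The drop-in idea from the list above can be sketched as layering one small file on top of the packaged unit. This toy uses `configparser` with a last-one-wins rule for every key. Real systemd is more careful (list-valued settings such as `ExecStart` accumulate unless you reset them), and the unit, paths, and proxy address here are hypothetical.

```python
import configparser

# What the package ships (hypothetical).
BASE = """\
[Service]
ExecStart=/usr/bin/example-daemon
Restart=on-failure
"""

# What the administrator adds, for example in a file such as
# /etc/systemd/system/example.service.d/proxy.conf (hypothetical).
DROP_IN = """\
[Service]
Environment=HTTP_PROXY=http://proxy.example:3128
Restart=always
"""


def merge(*texts):
    """Layer unit texts in order; later files override earlier ones per key."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the CamelCase keys as written
    for text in texts:
        parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


merged = merge(BASE, DROP_IN)
print(merged["Service"])
```

The packaged file is never edited, so a package update can replace it without touching the administrator's two lines.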

As far as users go:

- systemd is not limited to just managing system-level services; systemd can
  also manage user services with systemd user mode. I use this on my Linux
  system in order to have a couple services running in the background querying
  for weather or a couple other API calls to put them into my status bar on my
  tiling window manager (sway). I have another one that runs Emacs in server
  mode so that I can have one giant Emacs session that will automatically start
  on login. I can put hundreds and hundreds of buffers in there and not have to
  worry about it. I can spawn new Emacs frames instantly; it's really beautiful.
- You can also query all of the system journal logs as a normal user and you
  don't have to sudo up and go into the logs folder. So if you just want to take
  a quick look at something, you don't have to type in your password or press a
  YubiKey or whatever you have configured.

[I hope Alpine ends up with something similar!](conversation://Mara/happy)

I really hope Alpine comes up with something similar to systemd. Alpine can
really benefit from a tightly integrated service manager that does at least some
of the things that systemd does. Declarative really is better than imperative
because declarative is easier for distracted operators.

People get distracted. It happens, and when distracted people do things it can
sometimes have bad consequences. So if we make the tools powerful, but
implicitly correct, then it will just be a lot better overall and users will
have a lot less worry involved.

On that note we are very close to hitting time so here are my shout-outs to
people who either helped make this talk happen or who I think are cool.

If you have any questions please feel free to ping me on Twitter, in the IRC
room, or on the contact page on my website. I enjoy these kinds of questions and
I openly welcome you to ask them.

Thank you, have a good day.