the social quandary of devops

Signed-off-by: Xe Iaso <me@christine.website>
Cadey Ratio 2022-03-14 11:14:55 +00:00
parent 10a086d6f2
commit e0fe1402cd
1 changed files with 252 additions and 0 deletions

@ -0,0 +1,252 @@
---
title: Technical Solutions Poorly Solve Social Problems
date: 2022-03-18
tags:
- devops
---
[I just wanna lead this article out by saying that _I do not have all the
answers here_. I really wish I did, but I also feel that I shouldn't have to
have an answer in mind in order to raise a question. Please also keep in mind
that this is coming from someone who has been working in devops for most of
their career.](conversation://Cadey/coffee)
## Or: The Social Quandary of Devops
Technology is the cornerstone of our society. As a people we have seen the
catalytic things that technology has enabled us to do. Through technology and
new and innovative ways of applying it, we can help solve many problems. This
leads some to envision technology as a panacea, a mythical cure-all that will
make all our problems go away with the right use of it.
This does not extend to social problems. Technical fixes for social problems are
how we end up with an inadequate mess that can make the problem a lot worse than
it was before. You've almost certainly been able to see this in action with
social media (under the belief that allowing people to connect is so morally
correct that it will bring in a new age of humanity), but the example I want to
focus on is the Devops philosophy. Devops is a technical solution (creating a
new department) that helps work around social problems in workplaces
(fundamental differences in priorities and end goals), and in the process it
doesn't solve either very well.
There are a lot of skillset paths that you can end up with in tech (technical
writing, project/product management, etc.), but the two biggest ones are
development (making the computer do new things) and systems administration
(making computers keep doing those things). These two groups have vastly
different priorities, skillsets, needs and future goals, and as a result of this
there is very little natural cross-pollination between the two silos. I have
seen this evolve into cultural resentment.
[Not to say that this phenomenon is exclusive to inter-department ties. I've
also seen it happen within a single department over choice of programming
language.](conversation://Cadey/coffee)
As far as the main differences go, development usually sees what could be. What
new things could exist and what steps you need to take to get people there. This
usually involves designing and implementing new software. The systems
administration side of things is more likely to see it as a matter of
integrating things into an existing whole, and then ensuring that whole is
reliable and proven so they don't have to worry about it constantly. This causes
a slower velocity forward and can result in extra process, slow momentum and
stagnation. These two forces naturally come into conflict because they are
vastly different things and have vastly different requirements and expectations.
Development may want to use a new version of the compiler to support a language
feature that will eliminate a lot of repetitive boilerplate. The sysadmins may
not be able to ship that compiler in the production build toolstack because of
conflicting dependencies elsewhere, but they may also not want to ship that
compiler because of fears over trusting unproven software in production. This
tension builds over a long period of time and gets worse when the sysadmin team
is chronically underfunded. Sysadmins are seen as successful when nothing goes
wrong, so their success is measured as the absence of something. This can make
the sysadmin team look like a money pit when they are actually the very thing
that keeps the money generator generating money. It can also lead to avoidable
burnout, unwarranted anxiety issues and unneeded suffering on both ends of the
conflict.
So given the unstoppable force of development and the immovable wall of
sysadmin, a philosophical compromise was made. This started out as many things
with many names, but as the idea rippled throughout people's heads the name
"devops" ended up sticking. Devops is a hybrid of traditional software
development and systems administration. On paper this should be great. The silos
will shrink. People will understand the limits and needs of the others.
Unfortunately though, a lot of the ideas behind devops and the overall
philosophy really do require you to radically burn down everything and start
from scratch. This tends to really not be conducive to engineering timetables
and overall system stability during the age of turbulence.
[What's the problem with burning everything down? Fire cleanses all things and
purifies away the unworthy!](conversation://Numa/delet)
[Not when you're the one being burned!](conversation://Cadey/angy)
[Wait, so what actually happens then? Does it just end up being a sysadmin team
made up out of coders?](conversation://Mara/hmm)
[Yeeeeeeeeep.](conversation://Numa/stare)
Yeah, in practice this ends up being a "new team" or a reboot of an existing
team in a way that is suddenly compelling or sexy to executives because a new
buzzword is on the scene. Realistically, devops did end up getting a proper
definition at a buzzword conference level (being able to handle development and
deployment of services), but in practice this ends up being just some random
developers that you tricked into caring about production now while also telling
them that they're better than the sysadmins. Two jobs for the price of one!
This ends up shafting the sysadmin team even harder because the new fancy devops
team has things they can talk about as positives for their quarters, so people
can more easily make a case for promotion. As a sysadmin, your "success" case is
"bad things didn't happen", which means success can't stand out on reviews.
Consider "scaled production faster than our customer acquisition rate"
against "set up continuous delivery to ensure velocity on our team, saving 50
hours of effort per week". Which one of those do you think gets you promoted?
Which one of those do you think gets headcount for new hires?
This has human costs too. At one of my past jobs I was doing more sysadmin-y
things (it was marketed as a devops hybrid role, but the "hybrid" part was more
"frantically patch up the sinking ship with code" than traditional software
development) and I was the pager bitch for a week at a time. Sleep is really
essential to helping you function properly at your job. During the times when I
was pager bitch, there was at least a 1/8 chance that I would be woken up in the
middle of the night to handle a problem. I had to change my pager tone 15 times
and I still get goosebumps hearing those old sounds nearly a decade later. This
ended up catalyzing anxiety that I still feel today. I ended up getting addicted
to weed really bad for a few years. I admit that I'm really not the most robust
person in the world, but these things add up.
[I guess "addicted to weed" isn't totally accurate or inaccurate here, it's more
that I was addicted to the feeling of being high rather than dependence on the
drug itself. Either way, it was bad and weed was my cope. It also probably
really didn't help that I was starting hormone replacement therapy at the time,
so I was going through second puberty as well. This is the kind of human capital
cost you get when dealing with dysfunction like this. I've always
been kind of afraid to speak up about this.](conversation://Cadey/coffee)
- [ ] What technical problems does devops solve?
However, there are real technical problems that can only really be solved from a
devops perspective. Tools like Docker would probably never have happened in the
way they did if the devops philosophy didn't exist.
![A three panel meme with an old man talking to a child. The child says "it
works on my machine". The old man replies with "then we'll ship your machine".
The last panel says "and that is how docker was
born".](https://cdn.christine.website/file/christine-static/blog/1BDBBB94-7052-4E4C-AE32-CFEE4226CBA8.jpeg)
In a way, Docker is one of the perfect examples of the devops philosophy. It
allows developers to have their own custom versions of everything. They can use
custom compilers that the sysadmins don't have to integrate into everything.
They can experiment with new toolstacks, languages and build systems without
worrying about how they integrate into existing processes. And in the process it
defaults to things that are so hilariously unsafe that you only really realize
the problems when they own you. It makes it easy to ship around configurations
for services, yes, but it doesn't make supply chain management easy at all.
[Wait, what about that? How does that make any sense?](conversation://Mara/wat)
Okay, let's consider this basic Dockerfile that builds a Go service. If you
start from very little knowledge of what's going on, you'd probably end up with
something like this:
```Dockerfile
FROM golang:1.17
WORKDIR /usr/src/app
COPY go.mod go.sum ./
RUN go mod download && go mod verify
COPY . .
RUN go build -v -o /usr/local/bin/app ./...
CMD ["app"]
```
This allows you to pin the versions of things like the Go compiler without
bothering the sysadmin team to make it available, but in the process you also
don't know what version of the compiler you are actually running. Let's say that
you have all your Docker images built with CI and that CI has an image cache set
up (as is the default in many CI systems). On your laptop you may end up getting
the latest release of Go 1.17 (at the time of writing, this is version 1.17.8),
but CI may have seen this tag before and may still have an old version of the
`1.17` image cached. This would mean that despite your efforts at making things
easy to
recreate, you've just accidentally put [an ASN.1 parsing
DoS](https://github.com/golang/go/issues/50165) into production, even though
your local machine will never have this issue! Not to mention if the image
you're using has a glibc bug, a DNS parsing bug or any issue with one of the
packages that makes up the image.
[So as a side effect of burning down everything and starting over you don't
actually get a lot of the advantages that the old system had in spite of the
dysfunction?](conversation://Mara/hmm)
[Yep! Realistically though you can get around this by using exact sha256 hashes
of the precise Docker image you want, however this isn't the _default_ behavior
so nobody will really know about it.](conversation://Cadey/coffee)
This is what the devops experience feels like: chaining together tools that
require careful handling to avoid accidental security flaws in ways that the
traditional sysadmin team approach fundamentally avoided by design. By
sidestepping the sysadmin team's stability and process, you learn nothing from
what they were doing.
[This is all of course assuming that at the same time as you go devops, you also
avow the grandeur of the cloud. Statistics say that these two usually go hand in
hand as the cloud is sold to executives as good for
devops.](conversation://Cadey/coffee)
As for how to get out of this mess though, I'm not sure. Like I said, this is a
_social_ problem. I am a technical solutions kind of person and as such I'm
really not the right person to ask about all this. However if you really asked
me to give an answer, I'd say that you could probably get a long way by merging
the sysadmin team and the development team. This sounds counter-intuitive at
first. Something that would make sense would be to have "embedded" sysadmins or
some kind of liaison role such as "SRE", but this can only really serve to create
resentment between people.
I remember at one of my jobs where I was in such a role, I ended up also having
to be the tutor on how fundamental parts of the programming language they were
using worked. One service that was handling a lot of production load had
issues where it would just panic and die randomly when a very large customer was
trying to view a list of things that was two orders of magnitude larger than
the lists of other customers using that service. I eventually ended up figuring
out where
the issue was but then I had an even harder time explaining what concurrency
does at a fundamental level and how race conditions can create undefined
behavior. I think it ended up being a 3 line fix too.
I guess the thing that would really help with this is education and helping
people hone their skills as developers. I understand that there's a bell curve
and not everyone is going to become a programming god overnight, but every
little bit sets off butterfly effects that will ripple down in other ways.
[This whole mentorship thing only really works when the company you work for
doesn't de facto punish you for mentoring people like that. If you aren't
careful about how you frame this, doing that could make it difficult for you to
prove yourself come review time. "Helped other people do their jobs better"
doesn't really look good to a promotion committee.](conversation://Numa/delet)
[Yeah but what are you supposed to do if that kind of mentorship is what really
helps motivate you as a person and is what you really enjoy doing? I don't
really see "mentor" as a job title on any postings.](conversation://Mara/hmm)
[There's always getting tired of trying to change things from within and then
writing things out on a publicly visible blog, building up a bunch of articles
over time. Then you could use that body of work as a way to meme yourself into
hiring pipelines thanks to people sharing your links on aggregators like the
orange site. It'd probably help if you also got a reputation as a shitposter.
Usually when people are able to openly joke about something, that signals that
they are pretty damn experienced in it.](conversation://Numa/stare)
[You're describing this blog aren't you.](conversation://Cadey/facepalm)
Like I said though, this is hard. A lot of the problems are actually structural
problems in how companies do the science part of computer science. Structural
problems cannot be solved overnight. These things take time, effort and patience
to truly figure out and in the process you will fail to invent a light bulb many
times over. I think the middle path may end up being some combination of both
approaches with a healthy barrier around people to prevent them from being
crushed by the experiments.