15 KiB
title | date | tags | |
---|---|---|---|
Technical Solutions Poorly Solve Social Problems | 2022-03-17 |
|
Or: The Social Quandry of Devops
Technology is the cornerstone of our society. As a people we have seen the catalytic things that technology has enabled us to do. Through technology and new and innovative ways of applying it, we can help solve many problems. This leads some to envision technology as a panacea, a mythical cure-all that will make all our problems go away with the right use of it.
This does not extend to social problems. Technical fixes for social problems are how we end up with an inadequate mess that can make the problem a lot worse than it was before. You've almost certainly been able to see this in action with social media (under the belief that allowing people to connect is so morally correct that it will bring in a new age of humanity that will be objectively good for everyone). The example I want to focus on today is the Devops philosophy. Devops is a technical solution (creating a new department) that helps work around social problems in workplaces (fundamental differences in priorities and end goals), and in the process it doesn't solve either very well.
There are a lot of skillset paths that you can end up with in tech, but the two biggest ones are development (making the computer do new things) and systems administration (making computers keep doing those things). There are many other silos in the industry (technical writing, project/product management, etc.), but the two main ones are development and systems administration. These two groups have vastly different priorities, skillsets, needs and future goals, and as a result of this there is very little natural cross-pollenation between the two silos. I have seen this evolve into cultural resentment.
As far as the main differences go, development usually sees what could be. What new things could exist and what steps you need to take to get people there. This usually involves designing and implementing new software. The systems administration side of things is more likely to see it as a matter of integrating things into an existing whole, and then ensuring that whole is reliable and proven so they don't have to worry about it constantly. This causes a slower velocity forward and can result in extra process, slow momentum and stagnation. These two forces naturally come into conflict because they are vastly different things and have vastly different requirements and expectations.
Development may want to use a new version of the compiler to support a language feature that will eliminate a lot of repetitive boilerplate. The sysadmins may not be able to ship that compiler in production build toolstack because of conflicting dependencies elsewhere, but they may also not want to ship that compiler because of fears over trusting unproven software in production.
This tension builds over a long period of time and can cause problems when the sysadmin team is chronically underfunded (due to the idea that they are successful when nothing goes wrong, also incurring the problem of success being a negative, which can make the sysadmin team look like a money pit when they are actually the very thing that is making the money generator generate money). This can also lead to avoidable burnout, unwarranted anxiety issues and unneeded suffering on both ends of the conflict.
So given the unstoppable force of development and the immovable wall of sysadmin, an organizational compromise was made. This started out as many things with many names, but as the idea rippled throughout people's heads the name "devops" ended up sticking. Devops is a hybrid of traditional software development and systems administration. On paper this should be great. The silos will shrink. People will understand the limits and needs of the others. Managers will be able to have more flexible employees.
Unfortunately though, a lot of the ideas behind devops and the overall philosophy really do require you to radically burn down everything and start from scratch. This tends to really not be conducive to engineering timetables and overall system stability during the age of turbulence.
Not when you're the one being burned!
Yeah, in practice this ends up being a "new team" or a reboot of an existing team in ways that is suddenly compelling or sexy to executives because a new buzzword is on the scene. Realistically, devops did end up getting a proper definition at a buzzword conference level (being able to handle development and deployment of services from editor to production), but in practice this ends up being just some random developers that you tricked into caring about production now while also telling them that they're better than the sysadmins.
Two jobs for the price of one!
This ends up shafting the sysadmin team even harder because the new fancy devops team has things they can talk about as positives for their quarters, so people can more easily make a case for promotion. As a sysadmin, your "success" case is "bad things didn't happen", which means success can't stand out on reviews. Consider "scaled production above the rate of our customer acquistion rate" against "set up continuous delivery to ensure velocity on our team, saving 50 hours of effort per week". Which one of those do you think gets you promoted? Which one of those do you think gets headcount for new hires?
This has human costs too. At one of my past jobs doing more sysadmin-y things (it was marketed as a devops hybrid role, but the "hybrid" part was more of "frantically patch up the sinking ship with code" and not traditional software development). Sleep is really essential to helping you function properly to do your job. During the times when I was pager bitch, there was at least a 1/8 chance that I would be woken up in the middle of the night to handle a problem. I had to change my pager tone 15 times and still get goosebumps hearing those old sounds nearly a decade later. This ended up being a huge factor in my developing anxiety issues that I still feel today. I ended up getting addicted to weed really bad for a few years. I admit that I'm really not the most robust person in the world, but these things add up.
However, there are real technical problems that can only really be solved from a devops perspective. Tools like Docker would probably never have happened in the way they did if the devops philosophy didn't exist.
In a way, Docker is one of the perfect examples of the devops philosophy. It allows developers to have their own custom versions of everything. They can use custom compilers that the sysadmins don't have to integrate into everything. They can experiment with new toolstacks, languages and build systems without worrying about how they integrate into existing processes. And in the process it defaults to things that are so hilariously unsafe that you only really realize the problems when they own you. It makes it easy to ship around configurations for services yes, but it doesn't make supply chain management easy at all.
Wait, what about that? How does that make any sense?
Okay, let's consider this basic Dockerfile that builds a Go service. If you start from very little knowledge of what's going on, you'd probably end up with something like this:
FROM golang:1.17
WORKDIR /usr/src/app
COPY go.mod go.sum ./
RUN go mod download && go mod verify
COPY . .
RUN go build -v -o /usr/local/bin/app ./...
CMD ["app"]
This allows you to pin the versions of things like the Go compiler without
bothering the sysadmin team to make it available, but in the process you also
don't know what version of the compiler you are actually running. Let's say that
you have all your Docker images built with CI and that CI has an image cache set
up (as is the default in many CI systems). On your laptop you may end up getting
the latest release of Go 1.17 (at the time of writing, this is version 1.17.8),
but since CI may have seen this before and may have an old version of the 1.17
tag cached. This would mean that despite your efforts at making things easy to
recreate, you've just accidentally put an ASN.1 parsing
DoS into production, even though
your local machine will never have this issue! Not to mention if the image
you're using has a glibc bug, a DNS parsing bug or any issue with one of the
packages that makes up the image.
This is what the devops experience feels like, chaining together tools that require careful handling to avoid accidental security flaws in ways that the traditional sysadmin team approach fundamentally avoided by design. By sidestepping the sysadmin team's stability and process, you learn nothing from what they were doing.
As for how to get out of this mess though, I'm not sure. Like I said, this is a social problem that is trying to be solved through a business organizational fix. I am a technical solutions kind of person and as such I'm really not the right person to ask about all this. I don't want to propose a solution here. I've thought out several ideas, but I got nowhere with them fast.
I remember at one of my jobs where I was a devops I ended up also having to be the tutor on how fundamental parts of the programming language they are using work. This one service that was handling a lot of production load had issues where it would just panic and die randomly when a very large customer was trying to view a list of things that was two orders of magnitude larger than other customers that use that service. I eventually ended up figuring out where the issue was but then I had an even harder time explaining what concurrency does at a fundamental level and how race conditions can make things crash due to undefined behavior. I think it ended up being a 3 line fix too.
I guess the thing that would really help with this is education and helping people hone their skills as developers. I understand that there's a learning curve and not everyone is going to become a programming god overnight, but every little bit sets off butterfly effects that will ripple down in other ways. Any solution that requires everyone be a programming god isn't viable for anyone, including programming gods.
You're describing this blog aren't you.
Like I said though, this is hard. A lot of the problems are actually structural problems in how companies do the science part of computer science. Structural problems cannot be solved overnight. These things take time, effort and patience to truly figure out and in the process you will fail to invent a light bulb many times over. Devops is probably a necessary evil, but I really wish that situations weren't toxic enough in the first place to require that evil.