blog: don't look into the light (#80)
* blog: don't look into the light
* is this a mistake?
* it wasnt yay
* space
parent 50e22d76fc
commit c6d7e50bb8
@@ -0,0 +1,111 @@
---
title: "Don't Look Into the Light"
date: 2019-10-06
tags:
- practices
- big-rewrite
---

# Don’t Look Into the Light

So at a previous job, we maintained a system. This system powered a significant
part of how the product was actually used (as far as usage metrics reported).
Over time, we had bolted something onto the side of this product to take actions
based on the numbers the product was tracking.

After a few years of cycling through various people, this system was very hard
to understand. Data would flow in on one end, go to an aggregation layer, then
get sent to storage and another aggregation layer, and then eventually all of
the metrics were calculated. This system was fairly expensive to operate and it
was stressing the datastores it relied on beyond what other companies called
_theoretical_ limits. Oh, and to make things even more fun, the part that took
actions based on the data was barely keeping up with what it needed to do. It
was supposed to run each of the checks once a minute and was running all of them
in 57 seconds.

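To put numbers on how thin that margin was, here is a minimal sketch of the
arithmetic. The `headroom` helper and its field names are made up for
illustration and have nothing to do with the real system; only the 60 second
interval and the 57 second runtime come from the story.

```python
from typing import NamedTuple


class Headroom(NamedTuple):
    utilization: float    # fraction of the interval spent working
    spare_s: float        # seconds left before the next run is due
    growth_margin: float  # fractional slowdown tolerated before overrunning


def headroom(interval_s: float, runtime_s: float) -> Headroom:
    """Describe how close a periodic job is to overrunning its interval."""
    if interval_s <= 0 or runtime_s <= 0:
        raise ValueError("interval_s and runtime_s must be positive")
    return Headroom(
        utilization=runtime_s / interval_s,
        spare_s=interval_s - runtime_s,
        growth_margin=interval_s / runtime_s - 1,
    )


# Numbers from the story: checks due every 60 s, one full pass takes 57 s.
report = headroom(interval_s=60, runtime_s=57)
print(f"{report.utilization:.0%} of the budget used, "
      f"{report.spare_s:.0f}s spare, "
      f"{report.growth_margin:.1%} growth until overrun")
# prints "95% of the budget used, 3s spare, 5.3% growth until overrun"
```

In other words, roughly a 5% increase in work per pass would have been enough
to make the checks start missing their once-a-minute schedule.
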
During a planning meeting we started to complain about the state of the world
and how godawful everything had become. The undocumented (and probably
undocumentable) organic nature of the system had gotten out of hand. We thought
we could kill two birds with one stone: subsume another product that took action
based on data, and create a generic platform to reimplement the older
action-taking layer on top of.

The rules were set, the groundwork was laid. We decided:

* This would be a Big Rewrite based on all of the lessons we had learned from
  operating the behemoth in the past
* This project would be future-proof
* This project would have 75% test coverage as reported by CI
* This project would be built with a microservices architecture

Those of you who have been down this road before probably have massive alarm
bells going off in your heads. This is one of those things that looks like a
good idea on paper, can probably be passed off as a good idea to management, and
can actually get implemented, as happened here.

So we set off on our quest to write this software. The repo was created. CI was
configured. The scripts were optimized to dump out code coverage as output. We
strived to document everything on day 1. We took advantage of the datastore we
were using. Everything was looking great.

Then the product team came in and noticed fresh meat. They soon realized that
this could be a Big Thing to customers, and they wanted to get in on it as soon
as possible. So we suddenly had our deadlines pushed forward and needed to get
the whole thing into testing yesterday.

We set it up, set a trigger for a task, and it worked in testing. After a while
of it consistently doing that with the continuous functional testing tooling, we
told product it was okay to have a VERY LIMITED set of customers have at it.

That was a mistake. It fell apart the second customers touched it. We struggled
to understand why. We dug into the core of the beast we had just created and
discovered we had made critical, fundamental errors. The heart of the task
matching code was this monstrosity of a cross join that took the other people on
the team a few sheets of graph paper to break down and understand. The task
execution layer worked perfectly in testing, but almost never in production.

And after a week of solid debugging (including making deals with other teams,
satan, jesus and the pope to try and understand it), we had made no progress. It
was almost as if there was some kind of gremlin in the code that was just
randomly making things not fire if it wasn’t one of our internal users
triggering it.

We had to apologize to the product team. Apparently a lot of the product team
had to go on damage control as a result of this. I can only imagine the
trickle-down impact this had on other projects internal to the company.

The lesson here is threefold. First, the Big Rewrite is almost a sure-fire way
to make a project fail. Avoid that temptation. Don’t look into the light. It
looks nice, it may even feel nice. Statistically speaking, it’s not nice when
you get to the other side of it.

The second lesson is that making something microservices out of the gate is a
terrible idea. Microservices architectures are not planned. They are an
evolutionary result, not a fully anticipated feature.

Finally, don’t “design for the future”. The future [hasn’t happened
yet](https://christine.website/blog/all-there-is-is-now-2019-05-25). Nobody
knows how it’s going to turn out. The future is going to happen, and you can
either adapt to it as it happens in the Now or fail to. Don’t make things overly
modular; that leads to insane things like dynamically linking parts of an
application over HTTP.

> If you 'future proof' a system you build today, chances are when the future
> arrives the system will be unmaintainable or incomprehensible.
\- [John Murphy](https://twitter.com/murphybytes/status/1180131195537039360)

---

This kind of advice is probably gonna feel like a slap in the face to a lot of
people. People really put their heart into their work. It feeds egos massively.
It can be very painful to have to say no to something someone is really
passionate about. Depending on the person, it can even lead to people changing
their career plans.

But this is the truth of the matter as far as I can tell. This is generally what
happens during the Big Rewrite centred around Best Practices for Cloud Native
software.

The most successful design decisions are wholly and utterly specific to each
project you come across. What works in system A probably won’t work perfectly in
system B. Everything is its own unique snowflake. Embrace this.