forked from cadey/xesite
blog: don't look into the light (#80)
* blog: don't look into the light * is this a mistake? * it wasnt yay * space
parent 50e22d76fc
commit c6d7e50bb8
@ -0,0 +1,111 @@
---
title: "Don't Look Into the Light"
date: 2019-10-06
tags:
- practices
- big-rewrite
---

# Don’t Look Into the Light

So at a previous job, we maintained a system. This system powered a significant
part of the core of how the product was actually used (as far as usage metrics
reported). Over time, we had bolted something onto the side of this product to
take actions based on the numbers the product was tracking.

After a few years of cycling through various people, this system was very hard
to understand. Data would flow in on one end, go to an aggregation layer, then
get sent to storage and another aggregation layer, and then eventually all of
the metrics were calculated. This system was fairly expensive to operate and it
was stressing the datastores it relied on beyond what other companies called
_theoretical_ limits. Oh, and to make things even more fun, the part that took
actions based on the data was barely keeping up with what it needed to do. It
was supposed to run each of the checks once a minute and was running all of them
in 57 seconds.

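To put that number in perspective, here is a minimal sketch of the arithmetic.
The function name `headroom` is made up for illustration, and it assumes the
checks run back to back inside a fixed 60-second window; the real scheduler may
well have worked differently.

```python
def headroom(interval_s: float, run_s: float) -> tuple[float, float]:
    """Return (spare seconds, fraction of the interval consumed)."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return interval_s - run_s, run_s / interval_s


# Numbers from the post: a once-a-minute cycle that takes 57 seconds to finish.
spare, used = headroom(interval_s=60, run_s=57)
print(f"{spare:.0f}s spare, {used:.0%} of the interval used")
```

Three seconds of slack means that, if run time scaled roughly linearly with the
number of checks (an assumption), the workload could only grow by about 5%
before one cycle would start overrunning into the next.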

During a planning meeting we started to complain about the state of the world
and how godawful everything had become. The undocumented (and probably
undocumentable) organic nature of the system had gotten out of hand. We thought
we could kill two birds with one stone: subsume another product that took action
based on data, and create a generic platform to reimplement the older
action-taking layer on top of.

The rules were set, the groundwork was laid. We decided:

* This would be a Big Rewrite based on all of the lessons we had learned from
  operating the behemoth in the past
* This project would be future-proof
* This project would have 75% test coverage as reported by CI
* This project would be built with a microservices architecture

Those of you who have been down this road before probably have massive alarm
bells going off in your heads. This is one of those things that looks like a
good idea on paper, can probably be passed off as a good idea to management, and
can actually get implemented, as happened here.

So we set off on our quest to write this software. The repo was created. CI was
configured. The scripts were optimized to dump out code coverage as output. We
strived to document everything on day 1. We took advantage of the datastore we
were using. Everything was looking great.

Then the product team came in and noticed fresh meat. They soon realized that
this could be a Big Thing for customers, and they wanted to get in on it as soon
as possible. So we suddenly had our deadlines moved up and needed to get the
whole thing into testing yesterday.

We set it up, set a trigger for a task, and it worked in testing. After it had
done that consistently for a while under the continuous functional testing
tooling, we told product it was okay to let a VERY LIMITED set of customers have
at it.

That was a mistake. It fell apart the second customers touched it. We struggled
to understand why. We dug into the core of the beast we had just created and
discovered we had made critical, fundamental errors. The heart of the task
matching code was this monstrosity of a cross join that took the other people on
the team a few sheets of graph paper to break down and understand. The task
execution layer worked perfectly in testing, but almost never in production.

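The actual query is long gone, so the following is only a hypothetical sketch of
why a cross join at the heart of a matching layer tends to hurt. The `tasks` and
`events` shapes, the `metric` key, and both function names are invented for
illustration and do not describe the real system. The first function pairs every
task with every event and filters afterwards, which is what a cross join does;
the second groups events by key first.

```python
from itertools import product

# Hypothetical data: neither of these shapes comes from the real system.
tasks = [{"id": t, "metric": f"m{t % 10}"} for t in range(200)]
events = [{"metric": f"m{e % 10}", "value": e} for e in range(500)]


def match_cross_join(tasks, events):
    """Pair every task with every event, then filter the pairs afterwards."""
    examined = 0
    matches = []
    for task, event in product(tasks, events):
        examined += 1
        if task["metric"] == event["metric"]:
            matches.append((task["id"], event["value"]))
    return matches, examined


def match_indexed(tasks, events):
    """Group events by metric first, so each task only sees its own events."""
    by_metric = {}
    for event in events:
        by_metric.setdefault(event["metric"], []).append(event["value"])
    return [
        (task["id"], value)
        for task in tasks
        for value in by_metric.get(task["metric"], [])
    ]


cross, examined = match_cross_join(tasks, events)
indexed = match_indexed(tasks, events)
print(examined, len(cross), cross == indexed)
```

With this toy data the cross join examines all 100,000 pairs to find 10,000
matches, and the work grows with the product of the two inputs. That kind of
cost is easy to miss on a small test dataset and hard to miss once real
customers show up.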

And after a week of solid debugging (including making deals with other teams,
Satan, Jesus and the Pope to try and understand it), we had made no progress. It
was almost as if there was some kind of gremlin in the code that was just
randomly making things not fire if it wasn’t one of our internal users
triggering it.

We had to apologize to the product team. Apparently a lot of the product team
had to go into damage control as a result of this. I can only imagine the
trickle-down impact this had on other projects internal to the company.

The lesson here is threefold. First, the Big Rewrite is almost a sure-fire way
to ensure a project fails. Avoid that temptation. Don’t look into the light. It
looks nice, it may even feel nice. Statistically speaking, it’s not nice when
you get to the other side of it.

The second lesson is that making something microservices out of the gate is a
terrible idea. Microservices architectures are not planned. They are an
evolutionary result, not a fully anticipated feature.

Finally, don’t “design for the future”. The future [hasn’t happened
yet](https://christine.website/blog/all-there-is-is-now-2019-05-25). Nobody
knows how it’s going to turn out. The future is going to happen, and you can
either adapt to it as it happens in the Now or fail to. Don’t make things overly
modular; that leads to insane things like dynamically linking parts of an
application over HTTP.

> If you 'future proof' a system you build today, chances are when the future
> arrives the system will be unmaintainable or incomprehensible.
\- [John Murphy](https://twitter.com/murphybytes/status/1180131195537039360)

---

This kind of advice is probably gonna feel like a slap in the face to a lot of
people. People really put their hearts into their work. It feeds egos massively.
It can be very painful to have to say no to something someone is really
passionate about. Depending on the person, it can even lead to people changing
their career plans.

But this is the truth of the matter as far as I can tell. This is generally what
happens during the Big Rewrite centred around Best Practices for Cloud Native
software.

The most successful design decisions are wholly and utterly specific to each
project you come across. What works in system A probably won’t work perfectly in
system B. Everything is its own unique snowflake. Embrace this.