blog/lokahi: fix errors i thought i fixed

Cadey Ratio 2018-02-08 22:56:58 -08:00
parent ce6f0462f8
commit 74ad1ae964
1 changed files with 14 additions and 22 deletions

@@ -6,21 +6,13 @@ github_issue: https://github.com/Xe/lokahi/issues/15
 # Introducing Lokahi
-This week at Heroku, there was a hackweek. I decided to tackle a few problems at
-once and this is the result. The two big things I wanted to tackle were building
-a scalable HTTP health checking service and unlocking the "flow" state of
-consciousness to make developing, understanding and improving this project a lot
-easier.
-## lokahi
 Lokahi is a http service uptime checking and notification service. Currently
 lokahi does very little. Given a URL and a webhook URL, lokahi runs checks every
 minute on that URL and ensures it's up. If the URL goes down or the health
 workers have trouble getting to the URL, the service is flagged as down and a
 webhook is sent out.
-### Stack
+## Stack
 | What | Role |
 | :-------- | :------------ |
@@ -31,13 +23,13 @@ webhook is sent out.
 | Nats | Message queue |
 | Cobra | CLI |
-### Components
+## Components
 Interrelation graph:
 ![interrelation graph of lokahi components, see /static/img/lokahi.dot for the graphviz]("/static/img/lokahi.png")
-#### lokahictl
+### lokahictl
 The command line interface, currently outputs everything in JSON. It currently
 has a few options:
@@ -69,7 +61,7 @@ Use "lokahictl [command] --help" for more information about a command.
 Each of these subcommands has help and most of them have additional flags.
-#### lokahid
+### lokahid
 This is the main API server. It exposes twirp services defined in [`xe.github.lokahi`](https://github.com/Xe/lokahi/blob/master/rpc/lokahi/lokahi.proto)
 and [`xe.github.lokahi.admin`](https://github.com/Xe/lokahi/blob/master/rpc/lokahiadmin/lokahiadmin.proto).
@@ -93,19 +85,19 @@ PORT=9001
 Every minute, lokahid will scan for every check that is set to run minutely and
 run them. Running checks any time but minutely is currently unsupported.
-#### healthworker
+### healthworker
 healthworker listens on nats queue `check.run` and returns health information
 about that service.
-#### webhookworker
+### webhookworker
 webhookworker listens on nats queue `webhook.egress` and sends webhooks based on
 the input it's given.
-### Challenges Faced During Development
+## Challenges Faced During Development
-#### ORM Issues
+### ORM Issues
 Initially, I implemented this using [gorm](https://github.com/jinzhu/gorm) and
 started to run into a lot of problems when using it in anything but small
@@ -117,7 +109,7 @@ I rewrote this to use [`database/sql`](https://godoc.org/database/sql) and
 [`sqlx`](https://godoc.org/github.com/jmoiron/sqlx) and all of the tests passed
 the first time I tried to run this, no joke.
-#### Scaling to 50,000 Checks
+### Scaling to 50,000 Checks
 This one was actually a lot harder than I thought it would be, and not for the
 reasons I thought it would be. One of the main things that I discovered when
@@ -134,7 +126,7 @@ This service can handle 50,000 HTTP checks in a minute. The only part that gets
 backed up currently is webhook egress, but that is likely fixable with further
 optimization on the HTTP checking and webhook egress paths.
-### Basic Usage
+## Basic Usage
 To set up an instance of lokahi on a machine with [Docker Compose](https://docs.docker.com/compose/)
 installed, create a docker compose manifest with the following in it:
@@ -222,7 +214,7 @@ services:
 Start this with `docker-compose up -d`.
-#### Configuration
+### Configuration
 Open `~/.lokahictl.hcl` and enter in the following:
@@ -232,7 +224,7 @@ server = "http://AzureDiamond:hunter2@127.0.0.1:24253"
 Save this and then lokahictl is now configured to work with the local copy of lokahi.
-#### Creating a check
+### Creating a check
 To create a check against duke reporting to samplehook:
@@ -260,7 +252,7 @@ $ docker-compose -f samplehook
 playbook url: https://github.com/Xe/lokahi/wiki/duke-of-york-Playbook
 ```
-### Webhooks
+## Webhooks
 Webhooks get a HTTP POST of a protobuf-encoded [`xe.github.lokahi.CheckStatus`](https://github.com/Xe/lokahi/blob/13bc98ff0665ab13044f08d51ed2141ca0c38647/rpc/lokahi/lokahi.proto#L83)
 with the following additional HTTP headers:
@@ -280,7 +272,7 @@ receivers.
 JSON webhook support is not currently implemented, but is being tracked at
 [this github issue](https://github.com/Xe/lokahi/issues/4).
-### Call for Contributions
+## Call for Contributions
 Lokahi is pretty great as it is, but to be even better lokahi needs a bunch
 of work, experience reports and people willing to contribute to the project.