site/blog/olin-1-why-09-1-2018.markdown

291 lines
11 KiB
Markdown

---
title: "Olin: 1: Why"
date: 2018-09-01
---
# [Olin][olin]: 1: Why
[Olin][olin] is an attempt at defining a radically new operating primitive to make it
easier to reason about, deploy and operate event-driven services that are
independent of the OS or CPU of the computer they are running on. It will have
components that take care of the message queue offsetting, retry logic,
parallelism and most other concerns except for your application's state layer.
Olin is designed to work top on two basic concepts: types and handlers. Types
are some bit of statically defined data that has a meaning to humans. An example
type could be the following:
```
package example;
message UserLoginEvent {
string user_id = 1;
string user_ip_address = 2;
string device = 3;
int64 timestamp_utc_unix = 4;
}
```
When matching data is written to the queue for the event type `example.UserLoginEvent`,
all of the handlers registered to that data type will run with serialized protocol
buffer bytes as its standard input. If the handlers return a nonzero exit status,
they are retried up to three times, exponentially backing off.
Handlers need to deal with the fact they can be run out of order, and that multiple
instances of them will can be running on physcially different servers in parallel.
If a handler starts doing something and fails, it should back out any previously
changed values using transactions or equivalent.
Consider an Olin handler equivalent to a Unix process.
## Background
Very frequently, I end up needing to write applications that basically end up
waiting forever to make sure things get put in the right place and then the
right code runs as a response. I then have to make sure these things get put
in the right places and then that the right versions of things are running for
each of the relevant services. This doesn't scale very well, not to mention is
hard to secure. This leads to a lot of duplicate infrastructure over time and
as things grow. Not to mention adding in tracing, metrics and log aggreation.
I would like to change this.
I would like to make a perscriptive environment kinda like [Google Cloud Functions][gcf]
or [AWS Lambda][lambda] backed by a durable message queue and with handlers
compiled to webassembly to ensure forward compatibility. As such, the ABI
involved will be versioned, documented and tested. Multiple ABI's will eventually
need to be maintained in parallel, so it might be good to get used to that early
on.
You should not have to write ANY code but the bare minimum needed in order to
perform your buisiness logic. You don't need to care about distributed tracing.
You don't need to care about logging.
I want this project to last decades. I want the binary modules any user of Olin
would upload today to be still working, untouched, in 5 years, assuming its
dependencies outside of the module still work.
Since this requires a stable ABI in the long run, I would like to propose the
following _unstable_ ABI as a particularly minimal starting point to work out
the ideas at play, and see how little of a surface area we can expose while
still allowing for useful programs to be created and run.
## Dagger
> The dagger of light that renders your self-importance a decisive death
Dagger is the first ABI that will be used for interfacing with the outside world.
This will be mostly for an initial spike out of the basic ideas to see what it's
like while the rest of the plan is being stabilized and implemented.
The core idea is that everything is a file, to the point that the file descriptor
and file handle array are the only real bits of persistent state for the process.
HTTP sessions, logging writers, TCP sockets, operating system files, cryptographic
random readers, everything is done via filesystem system calls.
Consider this the first draft of Dagger, everything here is subject to change.
This is going to be the experimental phase.
Consider Dagger at the level _below_ libc in most Linux environements. Dagger
is the kind of API that libc would be implemented on top of.
### VM
Dagger processes will use [WebAssembly][wasm] as a platform-independent virtual
machine format. WebAssembly is used here due to the large number of
implementations and compilers targeting it for the use in web programming. We can
also benefit from the amazing work that has gone into the use of WebAssembly in
front-end browser programming without having to need a browser!
### Base Environment
When a dagger process is opened, the following files are open:
- 0: standard input: the semantic "input" of the program.
- 1: standard output: the standard output of the program.
- 2: standard error: error output for the program.
### File Handlers
In the open call (defined later), a file URL is specified instead of a file name.
This allows for Dagger to natively offer programs using it quick access to common
services like HTTP, logging or pretty much anything else.
I'm playing with the following handlers currently:
- http and https (Write request as http/1.1 request and sync(), Read response as http/1.1 response and close()) `http://ponyapi.apps.xeserv.us/newest`
I'd like to add the following handlers in the future:
- file - filesystem files on the host OS (dangerous!) `file:///etc/hostname`
- tcp - TCP connections `tcp://10.0.0.39:1337`
- tcp+tls - TCP connections with TLS `tcp+tls://10.0.0.39:31337`
- meta - metadata about the runtime or the event `meta://host/hostname`, `meta://event/created_at`
- project - writers of other event types for this project (more on this, again, in future posts) `project://example.UserLoginEvent`
- rand - cryptographically secure random data good for use in crypto keys `rand://`
- time - unix timestamp in a little-endian encoded int64 on every read() - `time://utc`
In the future, users should be able to define arbitrary other protocol handlers
with custom webassembly modules. More information about this feature will be
posted if we choose to do this.
### Handler Function
Each Dagger module can only handle one data type. This is intentional. This
forces users to make a separate handler for each type of data they want to
handle. The handler function reads its input from standard input and then
returns `0` if whatever it needs to do "worked" (for some definition of success).
Each ABI, unfortunately, will have to have its own "main" semantics. For Dagger,
these semantics are used:
- The entrypoint is exposed func `handle` that takes no arguments and returns an int32.
- The input message packet is on standard input implicitly.
- Returning 0 from func `handle` will mark the event as a success, returning anything else will mark it as a failure and trigger an automatic retry.
In clang in C mode, you could define the entrypoint for a handler module like this:
```c
// handle_nothing.c
#include <dagger.h>
__attribute__ ((visibility ("default")))
int handle() {
// read standard input as necessary and handle it
return 0; // success
}
```
### System Calls
A [system call][syscall] is how computer programs interface with the outside
world. When a Dagger program makes a system call, the amount of time the program
spends waiting for that system call is collected and recorded based on what
underlying resource took care of the call. This means, in theory, users of olin
could alert on HTTP requests from one service to another taking longer amounts
of time very trivially.
Future mechanisms will allow for introspection and checking the status of
handlers, as well as arbitrarily killing handlers that get stuck in a weird way.
Dagger uses the following system calls:
- open
- close
- read
- write
- sync
Each of the system calls will be documented with their C and WebAssembly Text format
type/import definitions and a short bit of prose explaining them. A future
blogpost will outline the implementation of Dagger's system calls and why the
choices made in its design were made.
#### open
```c
extern int open(const char *furl, int flags);
(func $open (import "dagger" "open") (param i32 i32) (result i32))
```
This opens a file with the given file URL and flags. The flags are only relevant
for some backend schemes. Most of the time, the flags argument can be set to `0`.
#### close
```c
extern int close(int fd);
(func $close (import "dagger" "close") (param i32) (result i32))
```
Close closes a file and returns if it failed or not. If this call returns nonzero,
you don't know what state the world is in. Panic.
#### read
```c
extern int read(int fd, void *buf, int nbyte);
(func $read (import "dagger" "read") (param i32 i32 i32) (result i32))
```
Read attempts to read up to count bytes from file descriptor fd into the buffer
starting at buf.
#### write
```c
extern int write(int fd, void *buf, int nbyte);
(func $write (import "dagger" "write") (param i32 i32 i32) (result i32))
```
Write writes up to count bytes from the buffer starting at buf to the file
referred to by the file descriptor fd.
#### sync
```c
extern int sync(int fd);
(func $sync (import "dagger" "sync") (param i32) (result i32))
```
This is for some backends to forcibly make async operations into sync operations.
With the HTTP backend, for example, calling sync actually kicks off the
dependent HTTP request.
## Go ABI
Olin also includes support for running webassembly modules created by [Go 1.11's webassembly support](https://golang.org/wiki/WebAssembly).
It uses [the `wasmgo` ABI][wasmgo] package in order to do things. Right now
this is incredibly basic, but should be extendable to more things in the future.
As an example:
```go
// +build js,wasm ignore
// hello_world.go
package main
func main() {
println("Hello, world!")
}
```
when compiled like this:
```console
$ GOARCH=wasm GOOS=js go1.11 build -o hello_world.wasm hello_world.go
```
produces the following output when run with the testing shim:
```
=== RUN TestWasmGo/github.com/Xe/olin/internal/abi/wasmgo.testHelloWorld
Hello, world!
--- PASS: TestWasmGo (1.66s)
--- PASS: TestWasmGo/github.com/Xe/olin/internal/abi/wasmgo.testHelloWorld (1.66s)
```
Currently Go binaries cannot interface with the Dagger ABI. There is [an issue](https://github.com/Xe/olin/issues/5)
open to track the solution to this.
Future posts will include more detail about using Go on top of Olin, including
how support for Go's compiled webassembly modules was added to Olin.
## Project Meta
To follow the project, check it on GitHub [here][olin]. To talk about it on Slack,
join the [Go community Slack][goslack] and join `#olin`.
Thank you for reading this post, I hope it wasn't too technical too fast, but
there is a lot of base context required with this kind of technology. I will
attempt to make things more detailed and clear in future posts as I come up with
ways to explain this easier. Please consider this the 10,000 mile overview of
a very long-term project that radically redesigns how software should be written.
[gcf]: https://cloud.google.com/functions/
[lambda]: https://aws.amazon.com/lambda/
[syscall]: https://en.wikipedia.org/wiki/System_call
[olin]: https://github.com/Xe/olin
[goslack]: https://invite.slack.golangbridge.org
[wasmgo]: https://github.com/Xe/olin/tree/master/internal/abi/wasmgo
[wasm]: https://webassembly.org