site/blog/olin-1-why-09-1-2018.markdown

11 KiB

title date
Olin: 1: Why 2018-09-01

Olin: 1: Why

Olin is an attempt at defining a radically new operating primitive to make it easier to reason about, deploy and operate event-driven services that are independent of the OS or CPU of the computer they are running on. It will have components that take care of the message queue offsetting, retry logic, parallelism and most other concerns except for your application's state layer.

Olin is designed to work top on two basic concepts: types and handlers. Types are some bit of statically defined data that has a meaning to humans. An example type could be the following:

package example;

message UserLoginEvent {
    string user_id = 1;
    string user_ip_address = 2;
    string device = 3;
    int64 timestamp_utc_unix = 4;
}

When matching data is written to the queue for the event type example.UserLoginEvent, all of the handlers registered to that data type will run with serialized protocol buffer bytes as its standard input. If the handlers return a nonzero exit status, they are retried up to three times, exponentially backing off. Handlers need to deal with the fact they can be run out of order.

Consider an Olin handler equivalent to a Unix process.

Background

Very frequently, I end up needing to write applications that basically end up waiting forever to make sure things get put in the right place and then the right code runs as a response. I then have to make sure these things get put in the right places and then that the right versions of things are running for each of the relevant services. This doesn't scale very well, not to mention is hard to secure. This leads to a lot of duplicate infrastructure over time and as things grow. Not to mention adding in tracing, metrics and log aggreation.

I would like to change this.

I would like to make a perscriptive environment kinda like Google Cloud Functions or AWS Lambda backed by a durable message queue and with handlers compiled to webassembly to ensure forward compatibility. As such, the ABI involved will be versioned, documented and tested. Multiple ABI's will eventually need to be maintained in parallel, so it might be good to get used to that early on.

You should not have to write ANY code but the bare minimum needed in order to perform your buisiness logic. You don't need to care about distributed tracing. You don't need to care about logging.

I want this project to last decades. I want the binary modules any user of Olin would upload today to be still working, untouched, in 5 years, assuming its dependencies outside of the module still work.

Since this requires a stable ABI in the long run, I would like to propose the following unstable ABI as a particularly minimal starting point to work out the ideas at play, and see how little of a surface area we can expose while still allowing for useful programs to be created and run.

Dagger

The dagger of light that renders your self-importance a decisive death

Dagger is the first ABI that will be used for interfacing with the outside world. This will be mostly for an initial spike out of the basic ideas to see what it's like while the rest of the plan is being stabilized and implemented. The core idea is that everything is a file, to the point that the file descriptor and file handle array are the only real bits of persistent state for the process. HTTP sessions, logging writers, TCP sockets, operating system files, cryptographic random readers, everything is done via filesystem system calls.

Consider this the first draft of Dagger, everything here is subject to change. This is going to be the experimental phase.

Consider Dagger at the level below libc in most Linux environements. Dagger is the kind of API that libc would be implemented on top of.

VM

Dagger processes will use WebAssembly as a platform-independent virtual machine format. WebAssembly is used here due to the large number of implemetnations and compilers targeting it for the use in web programming. We can also benefit from the amazing work that has gone into the use of WebAssembly in front-end browser programming without having to need a browser!

Base Environment

When a dagger process is opened, the following files are open:

  • 0: standard input: the semantic "input" of the program.
  • 1: standard output: the standard output of the program.
  • 2: standard error: error output for the program.

File Handlers

In the open call (defined later), a file URL is specified instead of a file name. This allows for Dagger to natively offer programs using it quick access to common services like HTTP, logging or pretty much anything else.

I'm playing with the following handlers currently:

  • http and https (Write request as http/1.1 request and sync(), Read response as http/1.1 response and close()) http://ponyapi.apps.xeserv.us/newest

I'd like to add the following handlers in the future:

  • file - filesystem files on the host OS (dangerous!) file:///etc/hostname
  • tcp - TCP connections tcp://10.0.0.39:1337
  • tcp+tls - TCP connections with TLS tcp+tls://10.0.0.39:31337
  • meta - metadata about the runtime or the event meta://host/hostname, meta://event/created_at
  • project - writers of other event types for this project (more on this, again, in future posts) project://example.UserLoginEvent
  • rand - cryptographically secure random data good for use in crypto keys rand://
  • time - unix timestamp in a little-endian encoded int64 on every read() - time://utc

In the future, users should be able to define arbitrary other protocol handlers with custom webassembly modules. More information about this feature will be posted if we choose to do this.

Handler Function

Each Dagger module can only handle one data type. This is intentional. This forces users to make a separate handler for each type of data they want to handle. The handler function reads its input from standard input and then returns 0 if whatever it needs to do "worked" (for some definition of success). Each ABI, unfortunately, will have to have its own "main" semantics. For Dagger, these semantics are used:

  • The entrypoint is exposed func handle that takes no arguments and returns an int32.
  • The input message packet is on standard input implicitly.
  • Returning 0 from func handle will mark the event as a success, returning anything else will mark it as a failure and trigger an automatic retry.

In clang in C mode, you could define the entrypoint for a handler module like this:

// handle_nothing.c

#include <dagger.h>

__attribute__ ((visibility ("default")))
int handle() {
  // read standard input as necessary and handle it
  return 0; // success
}

System Calls

A system call is how computer programs interface with the outside world. When a Dagger program makes a system call, the amount of time the program spends waiting for that system call is collected and recorded based on what underlying resource took care of the call. This means, in theory, users of olin could alert on HTTP requests from one service to another taking longer amounts of time very trivially.

Future mechanisms will allow for introspection and checking the status of handlers, as well as arbitrarily killing handlers that get stuck in a weird way.

Dagger uses the following system calls:

  • open
  • close
  • read
  • write
  • sync

Each of the system calls will be documented with their C and WebAssembly Text format type/import definitions and a short bit of prose explaining them. A future blogpost will outline the implementation of Dagger's system calls and why the choices made in its design were made.

open

extern int open(const char *furl, int flags);
(func $open (import "dagger" "open") (param i32 i32) (result i32))

This opens a file with the given file URL and flags. The flags are only relevant for some backend schemes. Most of the time, the flags argument can be set to 0.

close

extern int close(int fd);
(func $close (import "dagger" "close") (param i32) (result i32))

Close closes a file and returns if it failed or not. If this call returns nonzero, you don't know what state the world is in. Panic.

read

extern int read(int fd, void *buf, int nbyte);
(func $read (import "dagger" "read") (param i32 i32 i32) (result i32))

Read attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.

write

extern int write(int fd, void *buf, int nbyte);
(func $write (import "dagger" "write") (param i32 i32 i32) (result i32))

Write writes up to count bytes from the buffer starting at buf to the file referred to by the file descriptor fd.

sync

extern int sync(int fd);
(func $sync (import "dagger" "sync") (param i32) (result i32))

This is for some backends to forcibly make async operations into sync operations. With the HTTP backend, for example, calling sync actually kicks off the dependent HTTP request.

Go ABI

Olin also includes support for running webassembly modules created by Go 1.11's webassembly support. It uses the wasmgo ABI package in order to do things. Right now this is incredibly basic, but should be extendable to more things in the future.

As an example:

// +build js,wasm ignore
// hello_world.go

package main

func main() {
	println("Hello, world!")
}

when compiled like this:

$ GOARCH=wasm GOOS=js go1.11 build -o hello_world.wasm hello_world.go

produces the following output when run with the testing shim:

=== RUN   TestWasmGo/github.com/Xe/olin/internal/abi/wasmgo.testHelloWorld
Hello, world!
--- PASS: TestWasmGo (1.66s)
    --- PASS: TestWasmGo/github.com/Xe/olin/internal/abi/wasmgo.testHelloWorld (1.66s)

Currently Go binaries cannot interface with the Dagger ABI. There is an issue open to track the solution to this.

Future posts will include more detail about using Go on top of Olin, including how support for Go's compiled webassembly modules was added to Olin.

Project Meta

To follow the project, check it on GitHub here. To talk about it on Slack, join the Go community Slack and join #olin.

Thank you for reading this post, I hope it wasn't too technical too fast, but there is a lot of base context required with this kind of technology. I will attempt to make things more detailed and clear in future posts as I come up with ways to explain this easier. Please consider this the 10,000 mile overview of a very long-term project that radically redesigns how software should be written.