---
title: "Olin: 2: The Future"
date: 2018-09-05
series: olin
---
This post is a continuation of [this post](https://christine.website/blog/olin-1-why-09-1-2018).

Suppose you are given the chance to throw out the world and start from scratch
in a minimal environment. You can then work up from nothing and build the world
from there.

How would you do this?

One of the most common approaches is to pick a model that you have been
Stockholmed into after years of badness and replicate it, flaws and all. Dagger
is a direct example of this: I had been Stockholmed into thinking that
everything was a file stream, and I designed Dagger around that. There was a
really [brilliant](https://write.as/excerpts/conversation-with-_wmd-on-hacker-news)
Hacker News comment that sent us down a bit of a rabbit hole internally, and I
think we have settled on an idea for a primitive that would be easy to implement
and use from multiple languages.

So, let's stop and ask ourselves a question that is going to sound really
simple, even basic, but will define a lot of what we do here.

What do we want to do with a computer that could be exposed to a WebAssembly
module? What are the basic operations we can expose that are primitive enough to
be universally useful, but also simple to implement from multiple languages?

Well, what are programs actually doing with the interfaces they use? How can we
take that normal semantic behavior and provide a more useful primitive?
## The Parable of the Poison Arrow
When designing things such as these, it is very easy to get lost in the
philosophical weeds. I mean, we are getting the chance to redefine the basic
things that we will get angry at. There's a lot of pain and passion that goes
into our work and it shows.

As such, consider the following Buddhist parable:
> It's just as if a man were wounded with an arrow thickly smeared with poison.
>
> His friends & companions, kinsmen & relatives would provide him with a surgeon, and the man would say, 'I won't have this arrow removed until I know whether the man who wounded me was a noble warrior, a priest, a merchant, or a worker.'
>
> He would say, 'I won't have this arrow removed until I know whether the shaft with which I was wounded was that of a common arrow, a curved arrow, a barbed, a calf-toothed, or an oleander arrow.'
>
> The man would die and those things would still remain unknown to him.

[Source](https://en.wikipedia.org/wiki/Parable_of_the_Poisoned_Arrow)

At some point, we are going to have to just try something and see what it is
like. Let's not get too lost in what the bowstring of the person who shot us
with the poison arrow is made of, and instead focus on the task at hand right
now: designing the ground floor.
## Core Operations
Let's try a new primitive, which we'll call the interface. An interface is a
collection of types and methods that allows a WebAssembly module to perform some
action that it otherwise would be unable to do. As such, the only functions we
really need are a `require` function to introduce a dependency into the
environment, a `close` function to remove dependencies from the environment, and
an `invoke` function to call methods of the required interfaces. These can be
expressed with the following C-style declarations:
```c
#include <stdint.h>

// int64 is shorthand for int64_t throughout this post.
typedef int64_t int64;

// require loads the dependency by package into the environment. The int64 value
// returned by this function is effectively random and should be treated as
// opaque.
//
// If this returns less than zero, the value times negative 1 is the error code.
//
// Anything created by this function is to be considered initialized but
// unconfigured.
extern int64 require(const char* package);
// close removes a given dependency from the environment. If this returns less
// than zero, the value times negative 1 is the error code.
extern int64 close(int64 handle);
// invoke calls the given method with an input and output structure. This allows
// the protocol buffer generators to more easily build the world for us.
//
// The resulting int64 value is zero if everything succeeded, otherwise it is the
// error code (if any) times negative 1.
//
// The in and out pointers must be to a C-like representation of the protocol
// buffer definition of the interface method argument. If this ends up being an
// issue, I guess there's gonna be some kinda hacky reader thing involved. No
// biggie though, that can be codegenned.
extern int64 invoke(int64 handle, int64 method, void* in, void* out);
```
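
To make the shape of these three calls concrete, here is a minimal host-side sketch in Go. It assumes a registry that maps package names to a list of methods indexed by method number, models the `void*` arguments as plain Go values, and invents its own error codes (`errNotFound`, `errBadHandle`, `errBadMethod`). None of these names come from Olin; they are hypothetical.

```go
package main

import "math/rand"

// Hypothetical error codes for this sketch; the post does not define any.
const (
	errNotFound  int64 = 1 // require: unknown package
	errBadHandle int64 = 2 // close/invoke: handle is not open
	errBadMethod int64 = 3 // invoke: method number out of range
)

// Method stands in for one generated interface method. The C ABI's void*
// in/out pointers are modelled as Go values here (an assumption).
type Method func(in, out any) int64

// Env is a hypothetical per-module environment: a registry of available
// packages plus the handles the module currently holds.
type Env struct {
	registry map[string][]Method // package name -> methods, by method number
	open     map[int64][]Method  // live handle -> methods of that interface
}

func NewEnv(registry map[string][]Method) *Env {
	return &Env{registry: registry, open: map[int64][]Method{}}
}

// Require returns an opaque, effectively random handle for pkg, or a
// negative error code if pkg is unknown.
func (e *Env) Require(pkg string) int64 {
	methods, ok := e.registry[pkg]
	if !ok {
		return -errNotFound
	}
	for {
		h := rand.Int63() // never negative, so never mistaken for an error
		if _, taken := e.open[h]; !taken {
			e.open[h] = methods
			return h
		}
	}
}

// Close drops a handle; closing it twice is an error.
func (e *Env) Close(h int64) int64 {
	if _, ok := e.open[h]; !ok {
		return -errBadHandle
	}
	delete(e.open, h)
	return 0
}

// Invoke calls method number m of the interface behind handle h and passes
// the method's return value through unchanged.
func (e *Env) Invoke(h, m int64, in, out any) int64 {
	methods, ok := e.open[h]
	if !ok {
		return -errBadHandle
	}
	if m < 0 || m >= int64(len(methods)) {
		return -errBadMethod
	}
	return methods[m](in, out)
}
```

Handing out random handles rather than small sequential integers matches the "effectively random" wording in the `require` comment, and makes it less likely that a stale or guessed handle lands on a live interface.
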
(Yes, I know I made a lot of fuss about not just blindly following the design
decisions of the past and then just suggested returning a negative value from a
function to indicate the presence of an error. I just don't know of a better and
more portable mechanism for errors yet. If you have one, please suggest it to me.)
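
Here is what the "value times negative 1" convention looks like from the caller's side, as a small Go sketch. `Errno`, its message format, and the helper names are placeholders; the post does not define any actual error codes.

```go
package main

import "fmt"

// Errno is a hypothetical positive error code.
type Errno int64

func (e Errno) Error() string { return fmt.Sprintf("errno %d", int64(e)) }

// EncodeError turns an error code into a return value: the code times -1.
func EncodeError(code Errno) int64 { return -int64(code) }

// DecodeResult splits a raw return value into either a result (>= 0) or an
// error (< 0). math.MinInt64 has no positive counterpart and is not handled.
func DecodeResult(ret int64) (int64, error) {
	if ret < 0 {
		return 0, Errno(-ret)
	}
	return ret, nil
}
```

The main cost of the convention is that every function gives up the negative half of its return range. That is fine for handles and byte counts, but it rules out returning negative values directly.
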

You may have noticed that the `invoke` function takes void pointers. This is
intentional. It will require additional code generation on the server side to
support copying the values out of WebAssembly memory. This may turn out to be
completely problematic, but I bet we can at least get Rust working with this.

Using these basic primitives, we can actually model far more than you might
think. Let's do a simple example.
## Example: Logging
Consider logging. It is typically implemented as a stream of log messages
containing unstructured text that only has meaning to the development team and
to the regular expressions that trigger the pager. Knowing this, we can expose a
logging interface like this:
```proto
syntax = "proto3";
package us.xeserv.olin.dagger.logging.v1;
option go_package = "logging";
// Writer is a log message writer. This is append-only. All text in log messages
// may be read by scripts and humans.
service Writer {
  // method 0
  rpc Log(LogMessage) returns (Nil) {};
}

// When nothing remains, everything is equally possible.
// TODO(Xe): standardize this somehow.
message Nil {}

// LogMessage is an individual log message. This will get added to as it gets
// propagated up through the layers of the program and out into the world, but
// those don't matter right now.
message LogMessage {
  bytes message = 1;
}
```
And at a low level, this would be used like this:
```c
#include <assert.h>
#include <stdint.h>

typedef int64_t int64;

extern int64 require(const char* package);
extern int64 close(int64 handle);
extern int64 invoke(int64 handle, int64 method, void* in, void* out);

// This exposes logging_LogMessage, logging_Nil, logging_Writer_method_Log and
// int64 logging_Log(int64 handle, void* in, void* out)
// assume this is magically generated from the protobuf file above.
#include <services/us.xeserv.olin.dagger.logging.v1.h>

int main(void) {
  int64 logHdl = require("us.xeserv.olin.dagger.logging.v1");
  assert(logHdl >= 0); // a negative value means require failed

  logging_LogMessage msg;
  logging_Nil none;
  msg.message = "Hello, world!";

  // The following two calls are equivalent; both return 0 on success.
  // The calls stay outside assert() so they still run if NDEBUG is defined.
  int64 err = logging_Log(logHdl, &msg, &none);
  assert(err == 0);
  err = invoke(logHdl, logging_Writer_method_Log, &msg, &none);
  assert(err == 0);

  err = close(logHdl);
  assert(err == 0);
  return 0;
}
```
This is really easy to codegen, audit, and validate, not to mention that we can
easily verify which logging interface the user actually wants, and from which
vendor. This allows people who install Olin on their own cluster to potentially define their
own custom interfaces. This actually gives us the chance to make this a primitive.
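
Because the package name carries both the vendor (`us.xeserv`) and the version (`v1`), a host could check and record every `require` call. Below is a hypothetical Go sketch of such a per-cluster policy; `Policy`, `errPermission`, and the `com.example.logging.v1` package are made up for illustration.

```go
package main

import "sort"

// errPermission is a hypothetical errno for "this package is not allowed".
const errPermission int64 = 13

// Policy is a hypothetical per-cluster list of interfaces a module may
// require. Operators could add their own custom interface packages here.
type Policy struct {
	allowed  map[string]bool
	required map[string]bool // audit trail: every package a module asked for
}

func NewPolicy(allowed ...string) *Policy {
	p := &Policy{allowed: map[string]bool{}, required: map[string]bool{}}
	for _, pkg := range allowed {
		p.allowed[pkg] = true
	}
	return p
}

// Check records the request and returns 0 if pkg may be loaded, or a
// negative error code otherwise.
func (p *Policy) Check(pkg string) int64 {
	p.required[pkg] = true
	if !p.allowed[pkg] {
		return -errPermission
	}
	return 0
}

// Audit lists every package the module tried to require, sorted.
func (p *Policy) Audit() []string {
	out := make([]string, 0, len(p.required))
	for pkg := range p.required {
		out = append(out, pkg)
	}
	sort.Strings(out)
	return out
}
```
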

Some problems are probably going to come up pretty quickly, because every
language under the sun has its own idea of how to arrange memory. This may make
directly scraping values out of RAM inviable in the future.

If reading values out of memory does become inviable, I suggest the following
changes:

```c
extern int64 require(const char* package);
extern int64 close(int64 handle);
extern int64 invoke(int64 handle, int64 method, char* in, int32 inlen, char* out, int32 outlen);
```

(I'm using `char*` here to mean "pointer to bytes": these buffers are not
NUL-terminated C strings, which is why their lengths are passed alongside them.
`uint8_t*` from `<stdint.h>` would say this more precisely.)

In this case, the arguments to `invoke()` would be pointers to protocol
buffer-encoded memory. Serializing and deserializing protocol buffers over and
over for every syscall may prove to be a huge burden, but that performance
penalty may actually be enough to discourage spurious syscalls, given their
"cost". Code generators should remove most of the pain of actually using this
interface, though: the generated code should automatically coax things into protocol
buffers without user interaction.
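
For a sense of what would actually cross the boundary in that case, this Go sketch hand-encodes `LogMessage` in the protocol buffer wire format: field 1 with wire type 2 (length-delimited) gives the key byte `0x0A`, followed by a varint length and the raw bytes. Real code would use generated marshalling code; this hypothetical decoder only understands that one field.

```go
package main

import (
	"encoding/binary"
	"errors"
)

var errMalformed = errors.New("malformed protobuf")

// encodeLogMessage encodes LogMessage{message} as key 0x0A, varint length,
// then the raw bytes.
func encodeLogMessage(message []byte) []byte {
	buf := []byte{0x0A}
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(message)))
	buf = append(buf, lenBuf[:n]...)
	return append(buf, message...)
}

// decodeLogMessage is the inverse. For brevity it expects exactly one field 1
// and rejects anything else, which a real decoder would skip instead.
func decodeLogMessage(in []byte) ([]byte, error) {
	if len(in) == 0 {
		return nil, nil // proto3: an empty encoding means an empty field
	}
	if in[0] != 0x0A {
		return nil, errMalformed
	}
	length, n := binary.Uvarint(in[1:])
	if n <= 0 || uint64(len(in)-1-n) != length {
		return nil, errMalformed
	}
	return in[1+n:], nil
}
```

`Nil` has no fields, so its encoding is zero bytes, and a `Log` call would not need any room in the `out` buffer at all.
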

For fun, let's take this basic model and map Dagger's concept of file I/O onto
it:
```proto
syntax = "proto3";
package us.xeserv.olin.dagger.files.v1;
option go_package = "files";
// When nothing remains, everything is equally possible.
// TODO(Xe): standardize this somehow.
message Nil {}
service Files {
  rpc Open(OpenRequest) returns (FID) {};
  rpc Read(ReadRequest) returns (ReadResponse) {};
  rpc Write(WriteRequest) returns (N) {};
  rpc Close(FID) returns (Nil) {};
  rpc Sync(FID) returns (Nil) {};
}

message FID {
  int64 opaque_id = 1;
}

message OpenRequest {
  string identifier = 1;
  int64 flags = 2;
}

message N {
  int64 count = 1;
}

message ReadRequest {
  FID fid = 1;
  int64 max_length = 2;
}

message ReadResponse {
  bytes data = 1;
  N n = 2;
}

message WriteRequest {
  FID fid = 1;
  bytes data = 2;
}
```
Using these methods, we can rebuild (most of) the original API:
```c
#include <stdint.h>

typedef int64_t int64;

extern int64 require(const char* package);
extern int64 close(int64 handle);
extern int64 invoke(int64 handle, int64 method, void* in, void* out);

#include <services/us.xeserv.olin.dagger.files.v1.h>

int64 filesystem_service_id;

void setup_filesystem(void) {
  filesystem_service_id = require("us.xeserv.olin.dagger.files.v1");
}

int64 open(char* furl, int64 flags) {
  files_OpenRequest req;
  files_FID resp;
  int64 err;

  req.identifier = furl;
  req.flags = flags;
  // could also be err = files_Files_Open(filesystem_service_id, &req, &resp);
  err = invoke(filesystem_service_id, files_Files_method_Open, &req, &resp);
  if (err != 0) {
    return err;
  }
  return resp.opaque_id;
}

int64 d_close(int64 fd) {
  files_FID req;
  files_Nil resp;
  int64 err;

  req.opaque_id = fd;
  err = invoke(filesystem_service_id, files_Files_method_Close, &req, &resp);
  if (err != 0) {
    return err;
  }
  return 0;
}

int64 read(int64 fd, void* buf, int64 nbyte) {
  files_FID fid;
  files_ReadRequest req;
  files_ReadResponse resp;
  char* dst = buf; // void* can't be indexed, so copy through a char*
  int64 err;
  int64 i;

  fid.opaque_id = fd;
  req.fid = fid;
  req.max_length = nbyte;
  err = invoke(filesystem_service_id, files_Files_method_Read, &req, &resp);
  if (err != 0) {
    return err;
  }
  // TODO(Xe): replace with memcpy once we have libc or something
  for (i = 0; i < resp.n.count; i++) {
    dst[i] = resp.data[i];
  }
  return resp.n.count; // number of bytes read, mirroring write()
}

int64 write(int64 fd, void* buf, int64 nbyte) {
  files_FID fid;
  files_WriteRequest req;
  files_N resp;
  int64 err;

  fid.opaque_id = fd;
  req.fid = fid;
  req.data = buf; // let's pretend this works, okay?
  err = invoke(filesystem_service_id, files_Files_method_Write, &req, &resp);
  if (err != 0) {
    return err;
  }
  return resp.count;
}

int64 sync(int64 fd) {
  files_FID req;
  files_Nil resp;
  int64 err;

  req.opaque_id = fd;
  err = invoke(filesystem_service_id, files_Files_method_Sync, &req, &resp);
  if (err != 0) {
    return err;
  }
  return 0;
}
```
And with that we should have the same interface as Dagger's, save for the fact
that the name `close` is already taken by the global `close` function from the
core ABI, so closing a file is spelled `d_close`. On the server side we could
implement this like so:
```go
package files

import (
	"context"
	"errors"
	"math/rand"
	"time"

	"github.com/Xe/olin/internal/abi/dagger"
)

func init() {
	rand.Seed(time.Now().UnixNano())
}

// FilesImpl implements the Files service on top of a Dagger process.
type FilesImpl struct {
	*dagger.Process
}

func (FilesImpl) getRandomNumber() int64 {
	return rand.Int63()
}

// daggerError converts a negative Dagger return value into an error. Zero
// and positive values mean success and yield nil.
func daggerError(respValue int64, err error) error {
	if respValue >= 0 {
		return nil
	}
	if err == nil {
		err = errors.New("")
	}
	return dagger.Error{Errno: dagger.Errno(respValue * -1), Underlying: err}
}

func (fs *FilesImpl) Open(ctx context.Context, op *OpenRequest) (*FID, error) {
	fd := fs.Process.OpenFD(op.Identifier, uint32(op.Flags))
	if fd < 0 {
		return nil, daggerError(fd, nil)
	}
	return &FID{OpaqueId: fd}, nil
}

func (fs *FilesImpl) Read(ctx context.Context, rr *ReadRequest) (*ReadResponse, error) {
	fd := rr.Fid.OpaqueId
	data := make([]byte, rr.MaxLength)
	n := fs.Process.ReadFD(fd, data)
	if n < 0 {
		return nil, daggerError(n, nil)
	}
	result := &ReadResponse{
		Data: data[:n], // only return the bytes that were actually read
		N: &N{
			Count: n,
		},
	}
	return result, nil
}

func (fs *FilesImpl) Write(ctx context.Context, wr *WriteRequest) (*N, error) {
	fd := wr.Fid.OpaqueId
	n := fs.Process.WriteFD(fd, wr.Data)
	if n < 0 {
		return nil, daggerError(n, nil)
	}
	return &N{Count: n}, nil
}

func (fs *FilesImpl) Close(ctx context.Context, fid *FID) (*Nil, error) {
	return &Nil{}, daggerError(fs.Process.CloseFD(fid.OpaqueId), nil)
}

func (fs *FilesImpl) Sync(ctx context.Context, fid *FID) (*Nil, error) {
	return &Nil{}, daggerError(fs.Process.SyncFD(fid.OpaqueId), nil)
}
```
And then we have all of these arbitrary methods bound to WebAssembly modules,
where they are free to use them however they want. I think initial support for
this interface will come from Go WebAssembly modules, because we can make a lot
more assumptions about how Go handles its memory management. That makes it much
easier to generate code that reads Go structures/pointers/whatever out of Go
WebAssembly memory than code that reads C structures (recursively, with pointers
and C-style strings galore).

The really cool part is that this is all powered by those three basic functions:
`require`, `invoke`, and `close`. The rest is literally just stuff we can treat
as a black box for now and code generate.

As before, I would love any comments that people have on this article. Please
contact me somehow to let me know what you think. This design is probably wrong.