xesite/blog/h-language-2019-06-30.markdown

11 KiB

title date tags
The h Programming Language 2019-06-30
wasm
release

h is a project of mine that I have released recently. It is a single-paradigm, multi-tenant friendly, turing-incomplete programming language that does nothing but print one of two things:

  • the letter h
  • a single quote (the Lojbanic "h")

It does this via WebAssembly. This may sound like a pointless complication, but actually this ends up making things a lot simpler. WebAssembly is a virtual machine (fake computer that only exists in code) intended for browsers, but I've been using it for server-side tasks.

I have written more about/with WebAssembly in the past in these posts:

This is a continuation of the following two posts:

All of the relevant code for h is here.

h is a somewhat standard three-phase compiler. Each of the phases is as follows:

Parsing the Grammar

As mentioned in a prior post, h has a formal grammar defined in Parsing Expression Grammar. I took this grammar (with some minor modifications) and fed it into a tool called peggy to generate a Go source version of the parser. This parser has some minimal wrappers around it, mostly to simplify the output and remove unneeded nodes from the tree. This simplifies the later compilation phases.

The input to h looks something like this:

h

The output syntax tree pretty-prints to something like this:

H("h")

This is also represented using a tree of nodes that looks something like this:

&peg.Node{
    Name: "H",
    Text: "h",
    Kids: nil,
}

A more complicated program will look something like this:

&peg.Node{
    Name: "H",
    Text: "h h h",
    Kids: {
        &peg.Node{
            Name: "",
            Text: "h",
            Kids: nil,
        },
        &peg.Node{
            Name: "",
            Text: "h",
            Kids: nil,
        },
        &peg.Node{
            Name: "",
            Text: "h",
            Kids: nil,
        },
    },
}

Now that we have this syntax tree, it's easy to go to the next phase of compilation: generating the WebAssembly Text Format.

WebAssembly Text Format

WebAssembly Text Format is a human-editable and understandable version of WebAssembly. It is pretty low level, but it is actually fairly simple. Let's take an example of the h compiler output and break it down:

(module
 (import "h" "h" (func $h (param i32)))
 (func $h_main
       (local i32 i32 i32)
       (local.set 0 (i32.const 10))
       (local.set 1 (i32.const 104))
       (local.set 2 (i32.const 39))
       (call $h (get_local 1))
       (call $h (get_local 0))
 )
 (export "h" (func $h_main))
)

Fundamentally, WebAssembly binary files are also called modules. Each .wasm file can have only one module defined in it. Modules can have sections that contain the following information:

  • External function imports
  • Function definitions
  • Memory information
  • Named function exports
  • Global variable definitions
  • Other custom data that may be vendor-specific

h only uses external function imports, function definitions and named function exports.

import imports a function from the surrounding runtime with two fields: module and function name. Because this is an obfuscated language, the function h from module h is imported as $h. This function works somewhat like the C library function putchar().

func creates a function. In this case we are creating a function named $h_main. This will be the entrypoint for the h program.

Inside the function $h_main, there are three local variables created: 0, 1 and 2. They correlate to the following values:

Local Number Explanation Integer Value
0 Newline character 10
1 Lowercase h 104
2 Single quote 39

As such, this program prints a single lowercase h and then a newline.

export lets consumers of this WebAssembly module get a name for a function, linear memory or global value. As we only need one function in this module, we export $h_main as "h".

Compiling this to a Binary

The next phase of compiling is to turn this WebAssembly Text Format into a binary. For simplicity, the tool wat2wasm from the WebAssembly Binary Toolkit is used. This tool creates a WebAssembly binary out of WebAssembly Text Format.

Usage is simple (assuming you have the WebAssembly Text Format file above saved as h.wat):

wat2wasm h.wat -o h.wasm

And you will create h.wasm with the following sha256 sum:

sha256sum h.wasm
8457720ae0dd2deee38761a9d7b305eabe30cba731b1148a5bbc5399bf82401a  h.wasm

Now that the final binary is created, we can move to the runtime phase.

Runtime

The h runtime is incredibly simple. It provides the h.h putchar-like function and executes the h function from the binary you feed it. It also times execution as well as keeps track of the number of instructions the program runs. This is called "gas" for historical reasons involving blockchains.

I use Perlin Network's life as the implementation of WebAssembly in h. I have experience with it from Olin.

The Playground

As part of this project, I wanted to create an interactive playground. This allows users to run arbitrary h programs on my server. As the only system call is putchar, this is safe. The playground also has some limitations on how big of a program it can run. The playground server works like this:

The output of this call looks something like this:

curl -H "Content-Type: text/plain" --data "h" https://h.christine.website/api/playground | jq
{
  "prog": {
    "src": "h",
    "wat": "(module\n (import \"h\" \"h\" (func $h (param i32)))\n (func $h_main\n       (local i32 i32 i32)\n       (local.set 0 (i32.const 10))\n       (local.set 1 (i32.const 104))\n       (local.set 2 (i32.const 39))\n       (call $h (get_local 1))\n       (call $h (get_local 0))\n )\n (export \"h\" (func $h_main))\n)",
    "bin": "AGFzbQEAAAABCAJgAX8AYAAAAgcBAWgBaAAAAwIBAQcFAQFoAAEKGwEZAQN/QQohAEHoACEBQSchAiABEAAgABAACw==",
    "ast": "H(\"h\")"
  },
  "res": {
    "out": "h\n",
    "gas": 11,
    "exec_duration": 12345
  }
}

The execution duration is in nanoseconds, as it is just directly a Go standard library time duration.

Bugs h has Found

This will be updated in the future, but h has already found a bug in Innative. There was a bug in how Innative handled C name mangling of binaries. Output of the h compiler is now a test case in Innative. I consider this a success for the project. It is such a little thing, but it means a lot to me for some reason. My shitpost created a test case in a project I tried to integrate it with.

That's just awesome to me in ways I have trouble explaining.

As such, h programs do work with Innative. Here's how to do it:

First, install the h compiler and runtime with the following command:

go get within.website/x/cmd/h

This will install the h binary to your $GOPATH/bin, so ensure that is part of your path (if it is not already):

export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

Then create a h binary like this:

h -p "h h" -o hh.wasm

Now we need to provide Innative the h.h system call implementation, so open h.c and enter in the following:

#include <stdio.h>

void h_WASM_h(char data) {
  putchar(data);
}

Then build it to an object file:

gcc -c -o h.o h.c

Then pack it into a static library .ar file:

ar rsv libh.a h.o

Then create the shared object with Innative:

innative-cmd -l ./libh.a hh.wasm

This should create hh.so in the current working directory.

Now create the following Nim wrapper at h.nim:

proc hh_WASM_h() {. importc, dynlib: "./hh.so" .}

hh_WASM_h()

and build it:

nim c h.nim

then run it:

./h
h

And congrats, you have now compiled h to a native shared object.

Why

Now, something you might be asking yourself as you read through this post is something like: "Why the heck are you doing this?" That's honestly a good question. One of the things I want to do with computers is to create art for the sake of art. h is one of these such projects. h is not a productive tool. You cannot create anything useful with h. This is an exercise in creating a compiler and runtime from scratch, based on my past experiences with parsing lojban, WebAssembly on the server and frustrating marketing around programming tools. I wanted to create something that deliberately pokes at all of the common ways that programming languages and tooling are advertised. I wanted to make it a fully secure tool as well, with an arbitrary limitation of having no memory usage. Everything is fully functional. There are a few grammar bugs that I'm calling features.