From 12674b67b750bb50d31e9cc8dea5ff4b8d603f17 Mon Sep 17 00:00:00 2001 From: Christine Dodrill Date: Sun, 30 Jun 2019 07:39:20 -0400 Subject: [PATCH] blog: add h langauge post (#59) --- blog/h-language-2019-06-30.markdown | 325 ++++++++++++++++++++++++++++ 1 file changed, 325 insertions(+) create mode 100644 blog/h-language-2019-06-30.markdown diff --git a/blog/h-language-2019-06-30.markdown b/blog/h-language-2019-06-30.markdown new file mode 100644 index 0000000..74b9d68 --- /dev/null +++ b/blog/h-language-2019-06-30.markdown @@ -0,0 +1,325 @@ +--- +title: The h Programming Language +date: 2019-06-30 +--- + +# The h Programming Language + +[h](https://h.christine.website) is a project of mine that I have released +recently. It is a single-paradigm, multi-tenant friendly, turing-incomplete +programming language that does nothing but print one of two things: + +- the letter h +- a single quote (the Lojbanic "h") + +It does this via [WebAssembly](https://webassembly.org). This may sound like a +pointless complication, but actually this ends up making things _a lot simpler_. +WebAssembly is a virtual machine (fake computer that only exists in code) intended +for browsers, but I've been using it for server-side tasks. + +I have written more about/with WebAssembly in the past in these posts: + +- https://christine.website/talks/webassembly-on-the-server-system-calls-2019-05-31 +- https://christine.website/blog/olin-1-why-09-1-2018 +- https://christine.website/blog/olin-2-the-future-09-5-2018 +- https://christine.website/blog/land-1-syscalls-file-io-2018-06-18 +- https://christine.website/blog/templeos-2-god-the-rng-2019-05-30 + +This is a continuation of the following two posts: + +- https://christine.website/blog/the-origin-of-h-2015-12-14 +- https://christine.website/blog/formal-grammar-of-h-2019-05-19 + +All of the relevant code for h is [here](https://github.com/Xe/x/tree/master/cmd/h). + +h is a somewhat standard three-phase compiler. Each of the phases is as follows: + +## Parsing the Grammar + +As mentioned in a prior post, h has a formal grammar defined in [Parsing Expression Grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar). +I took this [grammar](https://github.com/Xe/x/blob/v1.1.7/h/h.peg) (with some +minor modifications) and fed it into a tool called [peggy](https://github.com/eaburns/peggy) +to generate a Go source [version of the parser](https://github.com/Xe/x/blob/v1.1.7/h/h_gen.go). +This parser has some minimal [wrappers](https://github.com/Xe/x/blob/v1.1.7/h/parser.go) +around it, mostly to simplify the output and remove unneeded nodes from the tree. +This simplifies the later compilation phases. + +The input to h looks something like this: + +``` +h +``` + +The output syntax tree pretty-prints to something like this: + +``` +H("h") +``` + +This is also represented using a tree of nodes that looks something like this: + +``` +&peg.Node{ + Name: "H", + Text: "h", + Kids: nil, +} +``` + +A more complicated program will look something like this: + +``` +&peg.Node{ + Name: "H", + Text: "h h h", + Kids: { + &peg.Node{ + Name: "", + Text: "h", + Kids: nil, + }, + &peg.Node{ + Name: "", + Text: "h", + Kids: nil, + }, + &peg.Node{ + Name: "", + Text: "h", + Kids: nil, + }, + }, +} +``` + +Now that we have this syntax tree, it's easy to go to the next phase of +compilation: generating the WebAssembly Text Format. + +## WebAssembly Text Format + +[WebAssembly Text Format](https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format) +is a human-editable and understandable version of WebAssembly. It is pretty low +level, but it is actually fairly simple. Let's take an example of the h compiler +output and break it down: + +``` +(module + (import "h" "h" (func $h (param i32))) + (func $h_main + (local i32 i32 i32) + (local.set 0 (i32.const 10)) + (local.set 1 (i32.const 104)) + (local.set 2 (i32.const 39)) + (call $h (get_local 1)) + (call $h (get_local 0)) + ) + (export "h" (func $h_main)) +) +``` + +Fundamentally, WebAssembly binary files are also called modules. Each .wasm file +can have only one module defined in it. Modules can have sections that contain the +following information: + +- External function imports +- Function definitions +- Memory information +- Named function exports +- Global variable definitions +- Other custom data that may be vendor-specific + +h only uses external function imports, function definitions and named function +exports. + +`import` imports a function from the surrounding runtime with two fields: module +and function name. Because this is an obfuscated language, the function `h` from +module `h` is imported as `$h`. This function works somewhat like the C library +function [putchar()](https://www.tutorialspoint.com/c_standard_library/c_function_putchar.htm). + +`func` creates a function. In this case we are creating a function named `$h_main`. +This will be the entrypoint for the h program. + +Inside the function `$h_main`, there are three local variables created: `0`, `1` and `2`. +They correlate to the following values: + +| Local Number | Explanation | Integer Value | +| :----------- | :---------------- | :------------ | +| 0 | Newline character | 10 | +| 1 | Lowercase h | 104 | +| 2 | Single quote | 39 | + +As such, this program prints a single lowercase h and then a newline. + +`export` lets consumers of this WebAssembly module get a name for a function, +linear memory or global value. As we only need one function in this module, +we export `$h_main` as `"h"`. + +## Compiling this to a Binary + +The next phase of compiling is to turn this WebAssembly Text Format into a binary. +For simplicity, the tool `wat2wasm` from the [WebAssembly Binary Toolkit](https://github.com/WebAssembly/wabt) +is used. This tool creates a WebAssembly binary out of WebAssembly Text Format. + +Usage is simple (assuming you have the WebAssembly Text Format file above saved as `h.wat`): + +``` +wat2wasm h.wat -o h.wasm +``` + +And you will create `h.wasm` with the following sha256 sum: + +``` +sha256sum h.wasm +8457720ae0dd2deee38761a9d7b305eabe30cba731b1148a5bbc5399bf82401a h.wasm +``` + +Now that the final binary is created, we can move to the runtime phase. + +## Runtime + +The h [runtime](https://github.com/Xe/x/blob/v1.1.7/cmd/h/run.go) is incredibly +simple. It provides the `h.h` putchar-like function and executes the `h` +function from the binary you feed it. It also times execution as well as keeps +track of the number of instructions the program runs. This is called "gas" for +historical reasons involving [blockchains](https://blockgeeks.com/guides/ethereum-gas/). + +I use [Perlin Network's life](https://github.com/perlin-network/life) as the +implementation of WebAssembly in h. I have experience with it from [Olin](https://github.com/Xe/olin). + +## The Playground + +As part of this project, I wanted to create an [interactive playground](https://h.christine.website/play). +This allows users to run arbitrary h programs on my server. As the only system +call is putchar, this is safe. The playground also has some limitations on how +big of a program it can run. The playground server works like this: + +- The user program is sent over HTTP with Content-Type [text/plain](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L402-L413) +- The program is [limited to 75 bytes on the server](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L44) (though this is [configurable](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L15) via flags or envvars) +- The program is [compiled](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L53) +- The program is [ran](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L59) +- The output is [returned via JSON](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L65-L72) +- This output is then put [into the playground page with JavaScript](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L389-L394) + +The output of this call looks something like this: + +``` +curl -H "Content-Type: text/plain" --data "h" https://h.christine.website/api/playground | jq +{ + "prog": { + "src": "h", + "wat": "(module\n (import \"h\" \"h\" (func $h (param i32)))\n (func $h_main\n (local i32 i32 i32)\n (local.set 0 (i32.const 10))\n (local.set 1 (i32.const 104))\n (local.set 2 (i32.const 39))\n (call $h (get_local 1))\n (call $h (get_local 0))\n )\n (export \"h\" (func $h_main))\n)", + "bin": "AGFzbQEAAAABCAJgAX8AYAAAAgcBAWgBaAAAAwIBAQcFAQFoAAEKGwEZAQN/QQohAEHoACEBQSchAiABEAAgABAACw==", + "ast": "H(\"h\")" + }, + "res": { + "out": "h\n", + "gas": 11, + "exec_duration": 12345 + } +} +``` + +The execution duration is in [nanoseconds](https://godoc.org/time#Duration), as +it is just directly a Go standard library time duration. + +## Bugs h has Found + +This will be updated in the future, but h has already found a bug in [Innative](https://innative.dev). +There was a bug in how Innative handled C name mangling of binaries. Output of +the h compiler is now [a test case in Innative](https://github.com/innative-sdk/innative/commit/6353d59d611164ce38b938840dd4f3f1ea894e1b#diff-dc4a79872612bb26927f9639df223856R1). +I consider this a success for the project. It is such a little thing, but it +means a lot to me for some reason. My shitpost created a test case in a project +I tried to integrate it with. + +That's just awesome to me in ways I have trouble explaining. + +As such, h programs _do_ work with Innative. Here's how to do it: + +First, install the h compiler and runtime with the following command: + +``` +go get within.website/x/cmd/h +``` + +This will install the `h` binary to your `$GOPATH/bin`, so ensure that is part +of your path (if it is not already): + +``` +export GOPATH=$HOME/go +export PATH=$PATH:$GOPATH/bin +``` + +Then create a h binary like this: + +``` +h -p "h h" -o hh.wasm +``` + +Now we need to provide Innative the `h.h` system call implementation, so open +`h.c` and enter in the following: + +``` +#include + +void h_WASM_h(char data) { + putchar(data); +} +``` + +Then build it to an object file: + +``` +gcc -c -o h.o h.c +``` + +Then pack it into a static library `.ar` file: + +``` +ar rsv libh.a h.o +``` + +Then create the shared object with Innative: + +``` +innative-cmd -l ./libh.a hh.wasm +``` + +This should create `hh.so` in the current working directory. + +Now create the following [Nim](https://nim-lang.org) wrapper at `h.nim`: + +``` +proc hh_WASM_h() {. importc, dynlib: "./hh.so" .} + +hh_WASM_h() +``` + +and build it: + +``` +nim c h.nim +``` + +then run it: + +``` +./h +h +``` + +And congrats, you have now compiled h to a native shared object. + +## Why + +Now, something you might be asking yourself as you read through this post is +something like: "Why the heck are you doing this?" That's honestly a good +question. One of the things I want to do with computers is to create art for the +sake of art. h is one of these such projects. h is not a productive tool. You +cannot create anything useful with h. This is an exercise in creating a compiler +and runtime from scratch, based on my past experiences with parsing lojban, +WebAssembly on the server and frustrating marketing around programming tools. I +wanted to create something that deliberately pokes at all of the common ways +that programming languages and tooling are advertised. I wanted to make it a +fully secure tool as well, with an arbitrary limitation of having no memory +usage. Everything is fully functional. There are a few grammar bugs that I'm +calling features.