blog: add h langauge post (#59)
parent b0644a82c5
commit 12674b67b7
@ -0,0 +1,325 @@
---
title: The h Programming Language
date: 2019-06-30
---

# The h Programming Language

[h](https://h.christine.website) is a project of mine that I have released
recently. It is a single-paradigm, multi-tenant friendly, Turing-incomplete
programming language that does nothing but print one of two things:

- the letter h
- a single quote (the Lojbanic "h")

It does this via [WebAssembly](https://webassembly.org). This may sound like a
pointless complication, but actually this ends up making things _a lot simpler_.
WebAssembly is a virtual machine (fake computer that only exists in code) intended
for browsers, but I've been using it for server-side tasks.

I have written more about/with WebAssembly in the past in these posts:

- https://christine.website/talks/webassembly-on-the-server-system-calls-2019-05-31
- https://christine.website/blog/olin-1-why-09-1-2018
- https://christine.website/blog/olin-2-the-future-09-5-2018
- https://christine.website/blog/land-1-syscalls-file-io-2018-06-18
- https://christine.website/blog/templeos-2-god-the-rng-2019-05-30

This is a continuation of the following two posts:

- https://christine.website/blog/the-origin-of-h-2015-12-14
- https://christine.website/blog/formal-grammar-of-h-2019-05-19

All of the relevant code for h is [here](https://github.com/Xe/x/tree/master/cmd/h).

h is a somewhat standard three-phase compiler. Each of the phases is as follows:

## Parsing the Grammar

As mentioned in a prior post, h has a formal grammar defined as a [Parsing Expression Grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar).
I took this [grammar](https://github.com/Xe/x/blob/v1.1.7/h/h.peg) (with some
minor modifications) and fed it into a tool called [peggy](https://github.com/eaburns/peggy)
to generate a Go source [version of the parser](https://github.com/Xe/x/blob/v1.1.7/h/h_gen.go).
This parser has some minimal [wrappers](https://github.com/Xe/x/blob/v1.1.7/h/parser.go)
around it, mostly to simplify the output and remove unneeded nodes from the tree.
This simplifies the later compilation phases.

The input to h looks something like this:

```
h
```

The output syntax tree pretty-prints to something like this:

```
H("h")
```

This is also represented using a tree of nodes that looks something like this:

```
&peg.Node{
  Name: "H",
  Text: "h",
  Kids: nil,
}
```

A more complicated program, such as `h h h`, produces a tree that looks something like this:

```
&peg.Node{
  Name: "H",
  Text: "h h h",
  Kids: {
    &peg.Node{
      Name: "",
      Text: "h",
      Kids: nil,
    },
    &peg.Node{
      Name: "",
      Text: "h",
      Kids: nil,
    },
    &peg.Node{
      Name: "",
      Text: "h",
      Kids: nil,
    },
  },
}
```

Now that we have this syntax tree, it's easy to go to the next phase of
compilation: generating the WebAssembly Text Format.
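To make the shape of that tree concrete, here is a minimal Go sketch of how such a tree could be flattened into the list of characters to print. The `Node` type and the `Tokens` function are hypothetical stand-ins that copy only the three fields visible in the pretty-printed output above; they are not the real `peg.Node` type or the wrappers in `parser.go`.

```go
package main

import "fmt"

// Node mirrors only the three fields visible in the pretty-printed tree;
// it is an illustrative stand-in, not the real peg.Node type.
type Node struct {
	Name string
	Text string
	Kids []*Node
}

// Tokens flattens a tree into the characters to print, in order.
// A leaf contributes its own Text; an inner node contributes its children's.
func Tokens(n *Node) []string {
	if n == nil {
		return nil
	}
	if len(n.Kids) == 0 {
		return []string{n.Text}
	}
	var out []string
	for _, kid := range n.Kids {
		out = append(out, Tokens(kid)...)
	}
	return out
}

func main() {
	// The tree for the program `h h h`, as shown above.
	tree := &Node{Name: "H", Text: "h h h", Kids: []*Node{
		{Text: "h"}, {Text: "h"}, {Text: "h"},
	}}
	fmt.Println(Tokens(tree)) // prints [h h h]
}
```

A flat list like this is all the next phase needs, since every token becomes exactly one call to the output function.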

## WebAssembly Text Format

[WebAssembly Text Format](https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format)
is a human-editable and understandable version of WebAssembly. It is pretty low
level, but it is actually fairly simple. Let's take an example of the h compiler
output and break it down:

```
(module
 (import "h" "h" (func $h (param i32)))
 (func $h_main
  (local i32 i32 i32)
  (local.set 0 (i32.const 10))
  (local.set 1 (i32.const 104))
  (local.set 2 (i32.const 39))
  (call $h (get_local 1))
  (call $h (get_local 0))
 )
 (export "h" (func $h_main))
)
```

WebAssembly binary files are also called modules. Each .wasm file
can have only one module defined in it. Modules can have sections that contain the
following information:

- External function imports
- Function definitions
- Memory information
- Named function exports
- Global variable definitions
- Other custom data that may be vendor-specific

h only uses external function imports, function definitions and named function
exports.

`import` imports a function from the surrounding runtime with two fields: module
and function name. Because this is an obfuscated language, the function `h` from
module `h` is imported as `$h`. This function works somewhat like the C library
function [putchar()](https://www.tutorialspoint.com/c_standard_library/c_function_putchar.htm).

`func` creates a function. In this case we are creating a function named `$h_main`.
This will be the entrypoint for the h program.

Inside the function `$h_main`, there are three local variables created: `0`, `1` and `2`.
They correspond to the following values:

| Local Number | Explanation       | Integer Value |
| :----------- | :---------------- | :------------ |
| 0            | Newline character | 10            |
| 1            | Lowercase h       | 104           |
| 2            | Single quote      | 39            |

As such, this program prints a single lowercase h and then a newline. Local `2`
is set but never read, because this program contains no single quote.

`export` lets consumers of this WebAssembly module get a name for a function,
linear memory or global value. As we only need one function in this module,
we export `$h_main` as `"h"`.

## Compiling this to a Binary

The next phase of compiling is to turn this WebAssembly Text Format into a binary.
For simplicity, the tool `wat2wasm` from the [WebAssembly Binary Toolkit](https://github.com/WebAssembly/wabt)
is used. This tool creates a WebAssembly binary out of WebAssembly Text Format.

Usage is simple (assuming you have the WebAssembly Text Format file above saved as `h.wat`):

```
wat2wasm h.wat -o h.wasm
```

And you will create `h.wasm` with the following sha256 sum:

```
sha256sum h.wasm
8457720ae0dd2deee38761a9d7b305eabe30cba731b1148a5bbc5399bf82401a h.wasm
```

Now that the final binary is created, we can move to the runtime phase.

## Runtime

The h [runtime](https://github.com/Xe/x/blob/v1.1.7/cmd/h/run.go) is incredibly
simple. It provides the `h.h` putchar-like function and executes the `h`
function from the binary you feed it. It also times execution and keeps
track of the number of instructions the program runs. This is called "gas" for
historical reasons involving [blockchains](https://blockgeeks.com/guides/ethereum-gas/).

I use [Perlin Network's life](https://github.com/perlin-network/life) as the
implementation of WebAssembly in h. I have experience with it from [Olin](https://github.com/Xe/olin).
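As a rough illustration of what the runtime does, here is a toy Go interpreter for just the handful of instructions in the example module. It does not use life and is not how life works internally; the `Instr`, `Result`, and `Run` names are hypothetical, and charging one unit of gas per instruction is an assumption made for the sketch.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Instr is one instruction of a toy stack machine covering only the
// operations that appear in the example module.
type Instr struct {
	Op  string // "const", "set", "get", "call" or "end"
	Arg int32
}

// Result is what this toy runtime reports after a run.
type Result struct {
	Out      string
	Gas      int
	Duration time.Duration
}

// Run executes prog, charging one unit of gas per instruction (an
// assumption). "call" plays the role of the imported h.h function.
func Run(prog []Instr) (Result, error) {
	var (
		out    strings.Builder
		stack  []int32
		locals [3]int32
		gas    int
	)
	start := time.Now()
	for _, in := range prog {
		gas++
		switch in.Op {
		case "const":
			stack = append(stack, in.Arg)
		case "get":
			if in.Arg < 0 || int(in.Arg) >= len(locals) {
				return Result{}, fmt.Errorf("no local %d", in.Arg)
			}
			stack = append(stack, locals[in.Arg])
		case "set", "call":
			if len(stack) == 0 {
				return Result{}, fmt.Errorf("stack underflow at %s", in.Op)
			}
			v := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if in.Op == "call" {
				out.WriteByte(byte(v)) // putchar-like host call
				break
			}
			if in.Arg < 0 || int(in.Arg) >= len(locals) {
				return Result{}, fmt.Errorf("no local %d", in.Arg)
			}
			locals[in.Arg] = v
		case "end":
			// End of the function body; nothing to do.
		default:
			return Result{}, fmt.Errorf("unknown op %q", in.Op)
		}
	}
	return Result{Out: out.String(), Gas: gas, Duration: time.Since(start)}, nil
}

func main() {
	// The body of $h_main from the example module, one entry per instruction.
	prog := []Instr{
		{"const", 10}, {"set", 0},
		{"const", 104}, {"set", 1},
		{"const", 39}, {"set", 2},
		{"get", 1}, {"call", 0},
		{"get", 0}, {"call", 0},
		{"end", 0},
	}
	res, err := Run(prog)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q gas=%d\n", res.Out, res.Gas) // prints "h\n" gas=11
}
```

The example body is eleven instructions long, so this toy reports a gas of 11 for it. Whether life meters gas in exactly this way is not something the sketch establishes.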

## The Playground

As part of this project, I wanted to create an [interactive playground](https://h.christine.website/play).
This allows users to run arbitrary h programs on my server. As the only system
call is putchar, this is safe. The playground also has some limitations on how
big of a program it can run. The playground server works like this:

- The user program is sent over HTTP with Content-Type [text/plain](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L402-L413)
- The program is [limited to 75 bytes on the server](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L44) (though this is [configurable](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L15) via flags or envvars)
- The program is [compiled](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L53)
- The program is [run](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L59)
- The output is [returned via JSON](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L65-L72)
- This output is then put [into the playground page with JavaScript](https://github.com/Xe/x/blob/v1.1.7/cmd/h/http.go#L389-L394)

The output of this call looks something like this:

```
curl -H "Content-Type: text/plain" --data "h" https://h.christine.website/api/playground | jq
{
  "prog": {
    "src": "h",
    "wat": "(module\n (import \"h\" \"h\" (func $h (param i32)))\n (func $h_main\n (local i32 i32 i32)\n (local.set 0 (i32.const 10))\n (local.set 1 (i32.const 104))\n (local.set 2 (i32.const 39))\n (call $h (get_local 1))\n (call $h (get_local 0))\n )\n (export \"h\" (func $h_main))\n)",
    "bin": "AGFzbQEAAAABCAJgAX8AYAAAAgcBAWgBaAAAAwIBAQcFAQFoAAEKGwEZAQN/QQohAEHoACEBQSchAiABEAAgABAACw==",
    "ast": "H(\"h\")"
  },
  "res": {
    "out": "h\n",
    "gas": 11,
    "exec_duration": 12345
  }
}
```

The execution duration is in [nanoseconds](https://godoc.org/time#Duration), as
it is a Go standard library time duration serialized directly as a number.

## Bugs h has Found

This will be updated in the future, but h has already found a bug in [Innative](https://innative.dev).
There was a bug in how Innative handled C name mangling of binaries. Output of
the h compiler is now [a test case in Innative](https://github.com/innative-sdk/innative/commit/6353d59d611164ce38b938840dd4f3f1ea894e1b#diff-dc4a79872612bb26927f9639df223856R1).
I consider this a success for the project. It is such a little thing, but it
means a lot to me for some reason. My shitpost created a test case in a project
I tried to integrate it with.

That's just awesome to me in ways I have trouble explaining.

As such, h programs _do_ work with Innative. Here's how to do it:

First, install the h compiler and runtime with the following command:

```
go get within.website/x/cmd/h
```

This will install the `h` binary to your `$GOPATH/bin`, so ensure that is part
of your path (if it is not already):

```
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
```

Then create an h binary like this:

```
h -p "h h" -o hh.wasm
```

Now we need to provide Innative the `h.h` system call implementation, so open
`h.c` and enter in the following:

```
#include <stdio.h>

void h_WASM_h(char data) {
  putchar(data);
}
```

Then build it to an object file:

```
gcc -c -o h.o h.c
```

Then pack it into a static library `.a` file:

```
ar rsv libh.a h.o
```

Then create the shared object with Innative:

```
innative-cmd -l ./libh.a hh.wasm
```

This should create `hh.so` in the current working directory.

Now create the following [Nim](https://nim-lang.org) wrapper at `h.nim`:

```
proc hh_WASM_h() {. importc, dynlib: "./hh.so" .}

hh_WASM_h()
```

and build it:

```
nim c h.nim
```

then run it:

```
./h
h
```

And congrats, you have now compiled h to a native shared object.

## Why

Now, something you might be asking yourself as you read through this post is
something like: "Why the heck are you doing this?" That's honestly a good
question. One of the things I want to do with computers is to create art for the
sake of art. h is one such project. h is not a productive tool. You
cannot create anything useful with h. This is an exercise in creating a compiler
and runtime from scratch, based on my past experiences with parsing Lojban,
WebAssembly on the server and frustrating marketing around programming tools. I
wanted to create something that deliberately pokes at all of the common ways
that programming languages and tooling are advertised. I wanted to make it a
fully secure tool as well, with an arbitrary limitation of having no memory
usage. Everything is fully functional. There are a few grammar bugs that I'm
calling features.