xesite/blog/gonads-2022-04-24.markdown

566 lines
14 KiB
Markdown

---
title: Crimes with Go Generics
date: 2022-04-24
tags:
- cursed
- golang
- generics
vod:
twitch: https://www.twitch.tv/videos/1465727432
youtube: https://youtu.be/UiJtaKYQnzg
---
Go 1.18 added [generics](https://go.dev/doc/tutorial/generics) to the
language. This allows you to have your types take types as parameters
so that you can create composite types (types out of types). This lets
you get a lot of expressivity and clarity about how you use Go.
However, if you are looking for good ideas on how to use Go generics,
this is not the post for you. This is full of bad ideas. This post is
full of ways that you should not use Go generics in production. Do not
copy the examples in this post into production. By reading this post
you agree to not copy the examples in this post into production.
I have put my code for this article [on my git
server](https://tulpa.dev/internal/gonads). This repo has been
intentionally designed to be difficult to use in production by me
taking the following steps:
1. I have created it under a Gitea organization named `internal`. This
will make it impossible for you to import the package unless you
are using it from a repo on my Gitea server. Signups are disabled
on that Gitea server. See
[here](https://go.dev/doc/go1.4#internalpackages) for more
information about the internal package rule.
1. The package documentation contains a magic comment that will make
Staticcheck and other linters complain that you are using this
package even though it is deprecated.
<xeblog-conv name="Mara" mood="hmm">What is that package
name?</xeblog-conv>
<xeblog-conv name="Cadey" mood="enby">It's a reference to
Haskell's monads, but adapted to Go as a pun.</xeblog-conv>
<xeblog-conv name="Numa" mood="delet">A gonad is just a gonoid in the
category of endgofunctors. What's there to be confused
about?</xeblog-conv>
<xeblog-conv name="Cadey" mood="facepalm">\*sigh\*</xeblog-conv>
## `Queue[T]`
To start things out, let's show off a problem in computer science that
is normally difficult. Let's make a MPMS (multiple producer, multiple
subscriber) queue.
First we are going to need a struct to wrap everything around. It will
look like this:
```go
type Queue[T any] struct {
data chan T
}
```
This creates a type named `Queue` that takes a type argument `T`. This
`T` can be absolutely anything, but the only requirement is that the
data is a Go type.
You can create a little constructor for `Queue` instances with a
function like this:
```go
func NewQueue[T any](size int) Queue[T] {
return Queue[T]{
data: make(chan T, size),
}
}
```
Now let's make some methods on the `Queue` struct that will let us
push to the queue and pop from the queue. They could look like this:
```go
func (q Queue[T]) Push(val T) {
q.data <- val
}
func (q Queue[T]) Pop() T {
return <-q.data
}
```
These methods will let you put data at the end of the queue and then
pull it out from the beginning. You can use them like this:
```go
q := NewQueue[string](5)
q.Push("hi there")
str := q.Pop()
if str != "hi there" {
panic("string is wrong")
}
```
This is good, but the main problem comes from trying to pop from an
empty queue. It'll stay there forever doing nothing. We can use the
`select` statement to allow us to write a nonblocking version of the
`Pop` function:
```go
func (q Queue[T]) TryPop() (T, bool) {
select {
case val := <-q.data:
return val, true
default:
return nil, false
}
}
```
However when we try to compile this, we get an error:
```
cannot use nil as T value in return statement
```
In that code, `T` can be _anything_, including values that may not be
able to be `nil`. We can work around this by taking advantage of the
`var` statement, which makes a new variable and initializes it to the
zero value of that type:
```go
func Zero[T any]() T {
var zero T
return zero
}
```
When we run the `Zero` function like
[this](https://go.dev/play/p/Z5tRs1-aKBU):
```go
log.Printf("%q", Zero[string]())
log.Printf("%v", Zero[int]())
```
We get output that looks like this:
```
2009/11/10 23:00:00 ""
2009/11/10 23:00:00 0
```
So we can adapt the `default` branch of `TryPop` to this:
```go
func (q Queue[T]) TryPop() (T, bool) {
select {
case val := <-q.data:
return val, true
default:
var zero T
return zero, false
}
}
```
And finally write a test for good measure:
```go
func TestQueue(t *testing.T) {
q := NewQueue[int](5)
for i := range make([]struct{}, 5) {
q.Push(i)
}
for range make([]struct{}, 5) {
t.Log(q.Pop())
}
}
```
## `Option[T]`
In Go, people use pointer values for a number of reasons:
1. A pointer value may be `nil`, so this can signal that the value may
not exist.
1. A pointer value only stores the offset in memory, so passing around
the value causes Go to only copy the pointer instead of copying the
value being passed around.
1. A pointer value being passed to a function lets you mutate values
in the value being passed. Otherwise Go will copy the value and you
can mutate it all you want, but the changes you made will not
persist past that function call. You can sort of consider this to
be "immutable", but it's not as strict as something like passing
`&mut T` to functions in Rust.
This `Option[T]` type will help us model the first kind of constraint:
a value that may not exist. We can define it like this:
```go
type Option[T any] struct {
val *T
}
```
Then you can define a couple methods to use this container:
```go
var ErrOptionIsNone = errors.New("gonads: Option[T] has no value")
func (o Option[T]) Take() (T, error) {
if o.IsNone() {
var zero T
return zero, ErrOptionIsNone
}
return *o.val, nil
}
func (o *Option[T]) Set(val T) {
o.val = &val
}
func (o *Option[T]) Clear() {
o.val = nil
}
```
Some other functions that will be useful will be an `IsSome` function
to tell if the `Option` contains a value. We can use this to also
implement an `IsNone` function that will let you tell if that `Option`
_does not_ contain a value. They will look like this:
```go
func (o Option[T]) IsSome() bool {
return o.val != nil
}
func (o Option[T]) IsNone() bool {
return !o.IsSome()
}
```
We can say that if an Option does not have something in it, it has
nothing in it. This will let us use `IsSome` to implement `IsNone`.
Finally we can add all this up to a `Yank` function, which is similar
to
[`Option::unwrap()`](https://doc.rust-lang.org/rust-by-example/error/option_unwrap.html)
in Rust:
```go
func (o Option[T]) Yank() T {
if o.IsNone() {
panic("gonads: Yank on None Option")
}
return *o.val
}
```
This will all be verified in a Go test:
```go
func TestOption(t *testing.T) {
o := NewOption[string]()
val, err := o.Take()
if err == nil {
t.Fatalf("[unexpected] wanted no value out of Option[T], got: %v", val)
}
o.Set("hello friendos")
_, err = o.Take()
if err != nil {
t.Fatalf("[unexpected] wanted no value out of Option[T], got: %v", err)
}
o.Clear()
if o.IsSome() {
t.Fatal("Option should have none, but has some")
}
}
```
<xeblog-conv name="Mara" mood="hmm">I think that
<code>Option[T]</code> will be the most useful outside of this post.
It will need some work and generalization, but this may be something
that the Go team will have to make instead of some random
person.</xeblog-conv>
## `Thunk[T]`
In computer science we usually deal with values and computations.
Usually we deal with one or the other. Sometimes computations can be
treated as values, but this is very rare. It's even more rare to take
a partially completed computation and use it as a value.
A thunk is a partially evaluated computation that is stored as a
value. For an idea of what I'm talking about, let's consider this
JavaScript function:
```javascript
const add = (x, y) => x + y;
console.log(add(2, 2)); // 4
```
This creates a function called `add` that takes two arguments and
returns one argument. This is great in many cases, but it makes it
difficult for us to bind only one argument to the function and leave
the other as a variable input. What if computing the left hand side of
`add` is expensive and only needed once?
Instead we can write `add` like this:
```javascript
const add = (x) => (y) => x + y;
console.log(add(2)(2)); // 4
```
This also allows us to make partially evaluated forms of `add` like
`addTwo`:
```javascript
const addTwo = add(2);
console.log(addTwo(3)); // 5
```
This can also be used with functions that do not take arguments, so
you can pass around a value that isn't computed yet and then only
actually compute it when needed:
```javascript
const hypotenuse = (x, y) => Math.sqrt(x * x + y * y);
const thunk = () => hypot(3, 4);
```
You can then pass this thunk to functions _without having to evaluate
it_ until it is needed:
```javascript
dominateWorld(thunk); // thunk is passed as an unevaluated function
```
We can implement this in Go by using a type like the following:
```go
type Thunk[T any] struct {
doer func() T
}
```
And then force the thunk to evaluate with a function such as `Force`:
```go
func (t Thunk[T]) Force() T {
return t.doer()
}
```
This works, however we can also go one step further than we did with
the JavaScript example. We can take advantage of the `Thunk[T]`
container to cache the result of the `doer` function so that calling
it multiple times will only actually it once and return the same
result.
<xeblog-conv name="Mara" mood="hacker">Keep in mind that this will
only work for _pure functions_, or functions that don't modify the
outside world. This isn't just global variables either, but any
function that modifies any state anywhere, including network and
filesystem IO.</xeblog-conv>
This would make `Thunk[T]` be implemented like this:
```go
type Thunk[T any] struct {
doer func() T // action being thunked
o *Option[T] // cache for complete thunk data
}
func (t *Thunk[T]) Force() T {
if t.o.IsSome() {
return t.o.Yank()
}
t.o.Set(t.doer())
return t.o.Yank()
}
func NewThunk[T any](doer func() T) *Thunk[T] {
return &Thunk[T]{
doer: doer,
o: NewOption[T](),
}
}
```
Now, for an overcomplicated example you can use this to implement the
Fibonacci function. We can start out by writing a naiive Fibonacci
function like this:
```go
func Fib(n int) int {
if n <= 1 {
return n
}
return Fib(n-1) + Fib(n-2)
}
```
We can turn this into a Go test in order to see how long it takes for
it to work:
```go
func TestRecurFib(t *testing.T) {
t.Log(Fib(40))
}
```
Then when we run `go test`:
```console
$ go test -run RecurFib
=== RUN TestRecurFib
thunk_test.go:15: 102334155
--- PASS: TestRecurFib (0.36s)
```
However, we can make this a lot more complicated with the power of the
`Thunk[T]` type:
```go
func TestThunkFib(t *testing.T) {
cache := make([]*Thunk[int], 41)
var fib func(int) int
fib = func(n int) int {
if cache[n].o.IsSome() {
return *cache[n].o.val
}
return fib(n-1) + fib(n-2)
}
for i := range cache {
i := i
cache[i] = NewThunk(func() int { return fib(i) })
}
cache[0].o.Set(0)
cache[1].o.Set(1)
t.Log(cache[40].Force())
}
```
And then run the test:
```
=== RUN TestThunkFib
thunk_test.go:36: 102334155
--- PASS: TestThunkFib (0.60s)
```
<xeblog-conv name="Mara" mood="hmm">Why is this so much slower? This
should be caching the intermediate values. Maybe something like this
would be faster? This should complete near instantly,
right?</xeblog-conv>
```go
func TestMemoizedFib(t *testing.T) {
mem := map[int]int{
0: 0,
1: 1,
}
var fib func(int) int
fib = func(n int) int {
if result, ok := mem[n]; ok {
return result
}
result := fib(n-1) + fib(n-2)
mem[n] = result
return result
}
t.Log(fib(40))
}
```
```console
$ go test -run Memoized
=== RUN TestMemoizedFib
thunk_test.go:35: 102334155
--- PASS: TestMemoizedFib (0.00s)
```
<xeblog-conv name="Cadey" mood="enby">I'm not sure
either.</xeblog-conv>
If you change the `fib` function to this, it works, but it also steps
around the `Thunk[T]` type:
```go
fib = func(n int) int {
if cache[n].o.IsSome() {
return *cache[n].o.val
}
result := fib(n-1) + fib(n-2)
cache[n].o.Set(result)
return result
}
```
This completes instantly:
```
=== RUN TestThunkFib
thunk_test.go:59: 102334155
--- PASS: TestThunkFib (0.00s)
```
To be clear, this isn't the fault of Go generics. I'm almost certain
that my terrible code is causing this to be much slower.
<xeblog-conv name="Numa" mood="delet">This is the power of gonads:
making easy code complicated, harder to reason about and slower than
the naiive approach! Why see this as terrible code when it creates an
amazing opportunity for cloud providers to suggest that people use
gonads' `Thunk[T]` so that they use more CPU and then have to pay cloud
providers more money for CPU! Think about the children!</xeblog-conv>
---
<xeblog-conv name="Cadey" mood="coffee">EDIT(2022 M04 25 05:56): amscanne on
Hacker News pointed out that my code was in fact wrong. My `fib` function should
have been a lot simpler.</xeblog-conv>
```go
fib = func(n int) int {
return cache[n-1].Force() + cache[n-2].Force()
}
```
<xeblog-conv name="Cadey" mood="facepalm">Applying this also makes the code run
instantly as I'd expect. I knew _something_ was _very wrong_, but I never
expected something this stupid. Thanks amscanne!</xeblog-conv>
<xeblog-conv name="Numa" mood="happy">Hey, it makes for good surrealism. If that
isn't a success, what is?</xeblog-conv>
---
I'm glad that Go has added generics to the language. It's certainly
going to make a lot of things a lot easier and more expressive. I'm
worried that the process of learning how to use generics in Go is
going to create a lot of churn and toil as people get up to speed on
when and where they should be used. These should be used in specific
cases, not as a bread and butter tool.
I hope this was an interesting look into how you can use generics in
Go, but again please don't use these examples in production.