14 KiB
title | date | tags | vod | |||||||
---|---|---|---|---|---|---|---|---|---|---|
Crimes with Go Generics | 2022-04-24 |
|
|
Go 1.18 added generics to the language. This allows you to have your types take types as parameters so that you can create composite types (types out of types). This lets you get a lot of expressivity and clarity about how you use Go.
However, if you are looking for good ideas on how to use Go generics, this is not the post for you. This is full of bad ideas. This post is full of ways that you should not use Go generics in production. Do not copy the examples in this post into production. By reading this post you agree to not copy the examples in this post into production.
I have put my code for this article on my git server. This repo has been intentionally designed to be difficult to use in production by me taking the following steps:
- I have created it under a Gitea organization named
internal
. This will make it impossible for you to import the package unless you are using it from a repo on my Gitea server. Signups are disabled on that Gitea server. See here for more information about the internal package rule. - The package documentation contains a magic comment that will make Staticcheck and other linters complain that you are using this package even though it is deprecated.
What is that package name?
It's a reference to Haskell's monads, but adapted to Go as a pun.
A gonad is just a gonoid in the category of endgofunctors. What's there to be confused about?
*sigh*
Queue[T]
To start things out, let's show off a problem in computer science that is normally difficult. Let's make a MPMS (multiple producer, multiple subscriber) queue.
First we are going to need a struct to wrap everything around. It will look like this:
type Queue[T any] struct {
data chan T
}
This creates a type named Queue
that takes a type argument T
. This
T
can be absolutely anything, but the only requirement is that the
data is a Go type.
You can create a little constructor for Queue
instances with a
function like this:
func NewQueue[T any](size int) Queue[T] {
return Queue[T]{
data: make(chan T, size),
}
}
Now let's make some methods on the Queue
struct that will let us
push to the queue and pop from the queue. They could look like this:
func (q Queue[T]) Push(val T) {
q.data <- val
}
func (q Queue[T]) Pop() T {
return <-q.data
}
These methods will let you put data at the end of the queue and then pull it out from the beginning. You can use them like this:
q := NewQueue[string](5)
q.Push("hi there")
str := q.Pop()
if str != "hi there" {
panic("string is wrong")
}
This is good, but the main problem comes from trying to pop from an
empty queue. It'll stay there forever doing nothing. We can use the
select
statement to allow us to write a nonblocking version of the
Pop
function:
func (q Queue[T]) TryPop() (T, bool) {
select {
case val := <-q.data:
return val, true
default:
return nil, false
}
}
However when we try to compile this, we get an error:
cannot use nil as T value in return statement
In that code, T
can be anything, including values that may not be
able to be nil
. We can work around this by taking advantage of the
var
statement, which makes a new variable and initializes it to the
zero value of that type:
func Zero[T any]() T {
var zero T
return zero
}
When we run the Zero
function like
this:
log.Printf("%q", Zero[string]())
log.Printf("%v", Zero[int]())
We get output that looks like this:
2009/11/10 23:00:00 ""
2009/11/10 23:00:00 0
So we can adapt the default
branch of TryPop
to this:
func (q Queue[T]) TryPop() (T, bool) {
select {
case val := <-q.data:
return val, true
default:
var zero T
return zero, false
}
}
And finally write a test for good measure:
func TestQueue(t *testing.T) {
q := NewQueue[int](5)
for i := range make([]struct{}, 5) {
q.Push(i)
}
for range make([]struct{}, 5) {
t.Log(q.Pop())
}
}
Option[T]
In Go, people use pointer values for a number of reasons:
- A pointer value may be
nil
, so this can signal that the value may not exist. - A pointer value only stores the offset in memory, so passing around the value causes Go to only copy the pointer instead of copying the value being passed around.
- A pointer value being passed to a function lets you mutate values
in the value being passed. Otherwise Go will copy the value and you
can mutate it all you want, but the changes you made will not
persist past that function call. You can sort of consider this to
be "immutable", but it's not as strict as something like passing
&mut T
to functions in Rust.
This Option[T]
type will help us model the first kind of constraint:
a value that may not exist. We can define it like this:
type Option[T any] struct {
val *T
}
Then you can define a couple methods to use this container:
var ErrOptionIsNone = errors.New("gonads: Option[T] has no value")
func (o Option[T]) Take() (T, error) {
if o.IsNone() {
var zero T
return zero, ErrOptionIsNone
}
return *o.val, nil
}
func (o *Option[T]) Set(val T) {
o.val = &val
}
func (o *Option[T]) Clear() {
o.val = nil
}
Some other functions that will be useful will be an IsSome
function
to tell if the Option
contains a value. We can use this to also
implement an IsNone
function that will let you tell if that Option
does not contain a value. They will look like this:
func (o Option[T]) IsSome() bool {
return o.val != nil
}
func (o Option[T]) IsNone() bool {
return !o.IsSome()
}
We can say that if an Option does not have something in it, it has
nothing in it. This will let us use IsSome
to implement IsNone
.
Finally we can add all this up to a Yank
function, which is similar
to
Option::unwrap()
in Rust:
func (o Option[T]) Yank() T {
if o.IsNone() {
panic("gonads: Yank on None Option")
}
return *o.val
}
This will all be verified in a Go test:
func TestOption(t *testing.T) {
o := NewOption[string]()
val, err := o.Take()
if err == nil {
t.Fatalf("[unexpected] wanted no value out of Option[T], got: %v", val)
}
o.Set("hello friendos")
_, err = o.Take()
if err != nil {
t.Fatalf("[unexpected] wanted no value out of Option[T], got: %v", err)
}
o.Clear()
if o.IsSome() {
t.Fatal("Option should have none, but has some")
}
}
I think that
Option[T]
will be the most useful outside of this post.
It will need some work and generalization, but this may be something
that the Go team will have to make instead of some random
person.
Thunk[T]
In computer science we usually deal with values and computations. Usually we deal with one or the other. Sometimes computations can be treated as values, but this is very rare. It's even more rare to take a partially completed computation and use it as a value.
A thunk is a partially evaluated computation that is stored as a value. For an idea of what I'm talking about, let's consider this JavaScript function:
const add = (x, y) => x + y;
console.log(add(2, 2)); // 4
This creates a function called add
that takes two arguments and
returns one argument. This is great in many cases, but it makes it
difficult for us to bind only one argument to the function and leave
the other as a variable input. What if computing the left hand side of
add
is expensive and only needed once?
Instead we can write add
like this:
const add = (x) => (y) => x + y;
console.log(add(2)(2)); // 4
This also allows us to make partially evaluated forms of add
like
addTwo
:
const addTwo = add(2);
console.log(addTwo(3)); // 5
This can also be used with functions that do not take arguments, so you can pass around a value that isn't computed yet and then only actually compute it when needed:
const hypotenuse = (x, y) => Math.sqrt(x * x + y * y);
const thunk = () => hypot(3, 4);
You can then pass this thunk to functions without having to evaluate it until it is needed:
dominateWorld(thunk); // thunk is passed as an unevaluated function
We can implement this in Go by using a type like the following:
type Thunk[T any] struct {
doer func() T
}
And then force the thunk to evaluate with a function such as Force
:
func (t Thunk[T]) Force() T {
return t.doer()
}
This works, however we can also go one step further than we did with
the JavaScript example. We can take advantage of the Thunk[T]
container to cache the result of the doer
function so that calling
it multiple times will only actually it once and return the same
result.
Keep in mind that this will only work for pure functions, or functions that don't modify the outside world. This isn't just global variables either, but any function that modifies any state anywhere, including network and filesystem IO.
This would make Thunk[T]
be implemented like this:
type Thunk[T any] struct {
doer func() T // action being thunked
o *Option[T] // cache for complete thunk data
}
func (t *Thunk[T]) Force() T {
if t.o.IsSome() {
return t.o.Yank()
}
t.o.Set(t.doer())
return t.o.Yank()
}
func NewThunk[T any](doer func() T) *Thunk[T] {
return &Thunk[T]{
doer: doer,
o: NewOption[T](),
}
}
Now, for an overcomplicated example you can use this to implement the Fibonacci function. We can start out by writing a naiive Fibonacci function like this:
func Fib(n int) int {
if n <= 1 {
return n
}
return Fib(n-1) + Fib(n-2)
}
We can turn this into a Go test in order to see how long it takes for it to work:
func TestRecurFib(t *testing.T) {
t.Log(Fib(40))
}
Then when we run go test
:
$ go test -run RecurFib
=== RUN TestRecurFib
thunk_test.go:15: 102334155
--- PASS: TestRecurFib (0.36s)
However, we can make this a lot more complicated with the power of the
Thunk[T]
type:
func TestThunkFib(t *testing.T) {
cache := make([]*Thunk[int], 41)
var fib func(int) int
fib = func(n int) int {
if cache[n].o.IsSome() {
return *cache[n].o.val
}
return fib(n-1) + fib(n-2)
}
for i := range cache {
i := i
cache[i] = NewThunk(func() int { return fib(i) })
}
cache[0].o.Set(0)
cache[1].o.Set(1)
t.Log(cache[40].Force())
}
And then run the test:
=== RUN TestThunkFib
thunk_test.go:36: 102334155
--- PASS: TestThunkFib (0.60s)
Why is this so much slower? This should be caching the intermediate values. Maybe something like this would be faster? This should complete near instantly, right?
func TestMemoizedFib(t *testing.T) {
mem := map[int]int{
0: 0,
1: 1,
}
var fib func(int) int
fib = func(n int) int {
if result, ok := mem[n]; ok {
return result
}
result := fib(n-1) + fib(n-2)
mem[n] = result
return result
}
t.Log(fib(40))
}
$ go test -run Memoized
=== RUN TestMemoizedFib
thunk_test.go:35: 102334155
--- PASS: TestMemoizedFib (0.00s)
I'm not sure either.
If you change the fib
function to this, it works, but it also steps
around the Thunk[T]
type:
fib = func(n int) int {
if cache[n].o.IsSome() {
return *cache[n].o.val
}
result := fib(n-1) + fib(n-2)
cache[n].o.Set(result)
return result
}
This completes instantly:
=== RUN TestThunkFib
thunk_test.go:59: 102334155
--- PASS: TestThunkFib (0.00s)
To be clear, this isn't the fault of Go generics. I'm almost certain that my terrible code is causing this to be much slower.
This is the power of gonads:
making easy code complicated, harder to reason about and slower than
the naiive approach! Why see this as terrible code when it creates an
amazing opportunity for cloud providers to suggest that people use
gonads' Thunk[T]
so that they use more CPU and then have to pay cloud
providers more money for CPU! Think about the children!
EDIT(2022 M04 25 05:56): amscanne on
Hacker News pointed out that my code was in fact wrong. My fib
function should
have been a lot simpler.
fib = func(n int) int {
return cache[n-1].Force() + cache[n-2].Force()
}
Applying this also makes the code run instantly as I'd expect. I knew something was very wrong, but I never expected something this stupid. Thanks amscanne!
Hey, it makes for good surrealism. If that isn't a success, what is?
I'm glad that Go has added generics to the language. It's certainly going to make a lot of things a lot easier and more expressive. I'm worried that the process of learning how to use generics in Go is going to create a lot of churn and toil as people get up to speed on when and where they should be used. These should be used in specific cases, not as a bread and butter tool.
I hope this was an interesting look into how you can use generics in Go, but again please don't use these examples in production.