xesite/gonads-2022-04-24.markdown at 9f977b388223d2bb87cd6d63980379c7ed96b218

`Queue[T]`

To start things out, let's show off a problem in computer science that is normally difficult: making an MPMS (multiple producer, multiple subscriber) queue.

First we are going to need a struct to wrap everything in. It will look like this:

type Queue[T any] struct {
  data chan T
}

This creates a type named Queue that takes a type argument T. T can be absolutely anything; the only requirement is that it is a Go type.

You can create a little constructor for Queue instances with a function like this:

func NewQueue[T any](size int) Queue[T] {
  return Queue[T]{
    data: make(chan T, size),
  }
}

Now let's make some methods on the Queue struct that will let us push to the queue and pop from the queue. They could look like this:

func (q Queue[T]) Push(val T) {
  q.data <- val
}

func (q Queue[T]) Pop() T {
  return <-q.data
}

These methods will let you put data at the end of the queue and then pull it out from the beginning. You can use them like this:

q := NewQueue[string](5)
q.Push("hi there")
str := q.Pop()
if str != "hi there" {
  panic("string is wrong")
}

This is good, but the main problem comes from trying to pop from an empty queue. Pop blocks until some other goroutine pushes a value, and if nothing ever does, it'll stay there forever doing nothing. We can use the select statement to write a nonblocking version of the Pop function:

func (q Queue[T]) TryPop() (T, bool) {
  select {
  case val := <-q.data:
    return val, true
  default:
    return nil, false
  }
}

However, when we try to compile this, we get an error:

cannot use nil as T value in return statement

In that code, T can be anything, including types such as int or string whose values can never be nil. We can work around this by taking advantage of the var statement, which makes a new variable and initializes it to the zero value of that type:

func Zero[T any]() T {
  var zero T
  return zero
}

When we run the Zero function like this:

log.Printf("%q", Zero[string]())
log.Printf("%v", Zero[int]())

We get output that looks like this:

2009/11/10 23:00:00 ""
2009/11/10 23:00:00 0
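
The same trick covers types where nil would have been fine, too. Here is a small sketch that runs Zero over a pointer, a slice and a struct. The `point` type is made up purely for this illustration:

```go
package main

import "fmt"

// Zero returns the zero value of whatever type it is instantiated with.
func Zero[T any]() T {
	var zero T
	return zero
}

// point is a hypothetical struct used only for this illustration.
type point struct{ X, Y int }

func main() {
	fmt.Println(Zero[*int]() == nil)     // true
	fmt.Println(Zero[[]string]() == nil) // true
	fmt.Printf("%+v\n", Zero[point]())   // {X:0 Y:0}
}
```

For pointers and slices the zero value happens to be nil; for everything else you get an empty-but-valid value.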

So we can adapt the default branch of TryPop to this:

func (q Queue[T]) TryPop() (T, bool) {
  select {
  case val := <-q.data:
    return val, true
  default:
    var zero T
    return zero, false
  }
}
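
To see what that buys us, here is a minimal sketch of TryPop being called on an empty queue and then on a queue with one item. It repeats the definitions from above so it can run on its own:

```go
package main

import "fmt"

// Queue, NewQueue, Push and TryPop mirror the definitions in this post.
type Queue[T any] struct {
	data chan T
}

func NewQueue[T any](size int) Queue[T] {
	return Queue[T]{data: make(chan T, size)}
}

func (q Queue[T]) Push(val T) { q.data <- val }

func (q Queue[T]) TryPop() (T, bool) {
	select {
	case val := <-q.data:
		return val, true
	default:
		var zero T
		return zero, false
	}
}

func main() {
	q := NewQueue[string](1)

	// The queue is empty, so TryPop returns right away with ok == false.
	val, ok := q.TryPop()
	fmt.Printf("%q %v\n", val, ok) // "" false

	q.Push("hi there")
	val, ok = q.TryPop()
	fmt.Printf("%q %v\n", val, ok) // "hi there" true
}
```

The boolean is what tells you whether the value is real, since an empty string could also be a perfectly valid thing to have pushed.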

And finally write a test for good measure:

func TestQueue(t *testing.T) {
  q := NewQueue[int](5)
  for i := range make([]struct{}, 5) {
    q.Push(i)
  }
	
  for range make([]struct{}, 5) {
    t.Log(q.Pop())
  }
}
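
That test only uses one goroutine, which doesn't show off the "multiple" part very much. Here is a rough sketch of several producers and several consumers sharing one queue. The `sumThroughQueue` helper and its parameters are invented for this example; it assumes every producer pushes exactly as many items as every consumer pops, otherwise somebody would block forever:

```go
package main

import (
	"fmt"
	"sync"
)

type Queue[T any] struct{ data chan T }

func NewQueue[T any](size int) Queue[T] { return Queue[T]{data: make(chan T, size)} }
func (q Queue[T]) Push(val T)           { q.data <- val }
func (q Queue[T]) Pop() T               { return <-q.data }

// sumThroughQueue is a hypothetical helper. It starts `workers` producers
// that each push 1..perWorker, and `workers` consumers that each pop
// perWorker values and add them to a shared total.
func sumThroughQueue(workers, perWorker int) int {
	q := NewQueue[int](4)

	var wg sync.WaitGroup
	var mu sync.Mutex // guards sum
	sum := 0

	for w := 0; w < workers; w++ {
		wg.Add(2)
		go func() { // producer
			defer wg.Done()
			for i := 1; i <= perWorker; i++ {
				q.Push(i)
			}
		}()
		go func() { // consumer
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				v := q.Pop()
				mu.Lock()
				sum += v
				mu.Unlock()
			}
		}()
	}

	wg.Wait()
	return sum
}

func main() {
	fmt.Println(sumThroughQueue(3, 100)) // 15150
}
```

All of the hard synchronization work is being done by the channel inside Queue; the mutex is only there for the shared total.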

`Option[T]`

In Go, people use pointer values for a number of reasons:

- A pointer value may be nil, so this can signal that the value may not exist.
- A pointer value only stores an address in memory, so passing it around causes Go to copy just the pointer instead of copying the whole value.
- A pointer value passed to a function lets that function mutate the value it points to. Otherwise Go will copy the value and you can mutate the copy all you want, but the changes you made will not persist past that function call. You can sort of consider this to be "immutable", but it's not as strict as something like passing &mut T to functions in Rust.
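
As a quick illustration of the first and third reasons, here is a small sketch. The `Config` type and both rename functions are hypothetical and exist only for this example:

```go
package main

import "fmt"

// Config is a hypothetical struct used only for this illustration.
type Config struct {
	Name    string
	Retries *int // nil means "not set"
}

// renameCopy receives a copy, so the caller's value is unchanged.
func renameCopy(c Config) { c.Name = "copy" }

// renamePtr receives a pointer, so the caller's value is mutated.
func renamePtr(c *Config) { c.Name = "pointer" }

func main() {
	c := Config{Name: "original"}
	fmt.Println(c.Retries == nil) // true

	renameCopy(c)
	fmt.Println(c.Name) // original

	renamePtr(&c)
	fmt.Println(c.Name) // pointer
}
```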

This Option[T] type will help us model the first kind of constraint: a value that may not exist. We can define it like this:

type Option[T any] struct {
  val *T
}

Then you can define a couple methods to use this container:

var ErrOptionIsNone = errors.New("gonads: Option[T] has no value")

func (o Option[T]) Take() (T, error) {
  if o.IsNone() {
    var zero T
    return zero, ErrOptionIsNone
  }

  return *o.val, nil
}

func (o *Option[T]) Set(val T) {
  o.val = &val
}

func (o *Option[T]) Clear() {
  o.val = nil
}

Two other useful methods are IsSome, which tells you if the Option contains a value, and IsNone, which tells you if it does not. They will look like this:

func (o Option[T]) IsSome() bool {
  return o.val != nil
}

func (o Option[T]) IsNone() bool {
  return !o.IsSome()
}

We can say that if an Option does not have something in it, it has nothing in it. This will let us use IsSome to implement IsNone.

Finally we can add all this up to a Yank function, which is similar to Option::unwrap() in Rust:

func (o Option[T]) Yank() T {
  if o.IsNone() {
    panic("gonads: Yank on None Option")
  }

  return *o.val
}

This will all be verified in a Go test:

func TestOption(t *testing.T) {
  o := NewOption[string]()
  val, err := o.Take()
  if err == nil {
    t.Fatalf("[unexpected] wanted no value out of Option[T], got: %v", val)
  }
    
  o.Set("hello friendos")
  _, err = o.Take()
  if err != nil {
    t.Fatalf("[unexpected] wanted a value out of Option[T], got error: %v", err)
  }
    
  o.Clear()
  if o.IsSome() {
    t.Fatal("Option should have none, but has some")
  }
}
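
That test calls NewOption, which is not shown anywhere above. Here is a minimal sketch of what it could look like. The pointer return type is an assumption on my part: it lets Set and Clear mutate the Option you were handed, and it matches the *Option[T] field that Thunk[T] uses later in this post.

```go
package main

import (
	"errors"
	"fmt"
)

type Option[T any] struct{ val *T }

// NewOption is an assumption: the post calls it but never defines it.
// It returns an empty Option, by pointer so Set and Clear stick.
func NewOption[T any]() *Option[T] { return &Option[T]{} }

var ErrOptionIsNone = errors.New("gonads: Option[T] has no value")

func (o Option[T]) IsSome() bool { return o.val != nil }
func (o Option[T]) IsNone() bool { return !o.IsSome() }
func (o *Option[T]) Set(val T)   { o.val = &val }
func (o *Option[T]) Clear()      { o.val = nil }

func (o Option[T]) Take() (T, error) {
	if o.IsNone() {
		var zero T
		return zero, ErrOptionIsNone
	}
	return *o.val, nil
}

func main() {
	o := NewOption[string]()
	_, err := o.Take()
	fmt.Println(errors.Is(err, ErrOptionIsNone)) // true

	o.Set("hello friendos")
	val, err := o.Take()
	fmt.Println(val, err) // hello friendos <nil>
}
```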

I think that Option[T] will be the most useful outside of this post. It will need some work and generalization, but this may be something that the Go team will have to make instead of some random person.

`Thunk[T]`

In computer science we usually deal with values and computations. Usually we deal with one or the other. Sometimes computations can be treated as values, but this is very rare. It's even more rare to take a partially completed computation and use it as a value.

A thunk is a partially evaluated computation that is stored as a value. For an idea of what I'm talking about, let's consider this JavaScript function:

const add = (x, y) => x + y;
console.log(add(2, 2)); // 4

This creates a function called add that takes two arguments and returns one value. This is great in many cases, but it makes it difficult for us to bind only one argument to the function and leave the other as a variable input. What if computing the left hand side of add is expensive and only needed once?

Instead we can write add like this:

const add = (x) => (y) => x + y;
console.log(add(2)(2)); // 4

This also allows us to make partially evaluated forms of add like addTwo:

const addTwo = add(2);
console.log(addTwo(3)); // 5
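
For comparison, the same curried shape can be sketched in Go with a function that returns a closure. This is only a rough translation of the JavaScript above, fixed to int for simplicity:

```go
package main

import "fmt"

// add returns a closure that remembers x and waits for y.
func add(x int) func(int) int {
	return func(y int) int { return x + y }
}

func main() {
	fmt.Println(add(2)(2)) // 4

	addTwo := add(2)
	fmt.Println(addTwo(3)) // 5
}
```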

This can also be used with functions that do not take arguments, so you can pass around a value that isn't computed yet and then only actually compute it when needed:

const hypotenuse = (x, y) => Math.sqrt(x * x + y * y);
const thunk = () => hypotenuse(3, 4);

You can then pass this thunk to functions without having to evaluate it until it is needed:

dominateWorld(thunk); // thunk is passed as an unevaluated function

We can implement this in Go by using a type like the following:

type Thunk[T any] struct {
  doer func() T
}

And then force the thunk to evaluate with a function such as Force:

func (t Thunk[T]) Force() T {
  return t.doer()
}

This works; however, we can also go one step further than we did with the JavaScript example. We can take advantage of the Thunk[T] container to cache the result of the doer function so that calling Force multiple times will only actually run it once and return the same result.

Keep in mind that this will only work for pure functions, or functions that don't modify the outside world. This isn't just global variables either, but any function that modifies any state anywhere, including network and filesystem IO.

This would make Thunk[T] be implemented like this:

type Thunk[T any] struct {
  doer func() T // action being thunked
  o    *Option[T] // cache for complete thunk data
}

func (t *Thunk[T]) Force() T {
  if t.o.IsSome() {
    return t.o.Yank()
  }
    
  t.o.Set(t.doer())
  return t.o.Yank()
}

func NewThunk[T any](doer func() T) *Thunk[T] {
  return &Thunk[T]{
    doer: doer,
    o:    NewOption[T](),
  }
}
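
To check that the caching does what it says, here is a small sketch that counts how many times doer actually runs. The counter is a deliberate side effect, added only to observe the behavior, and NewOption is assumed to return a pointer since the post never shows it:

```go
package main

import "fmt"

type Option[T any] struct{ val *T }

// NewOption is assumed to return a pointer; the post does not show it.
func NewOption[T any]() *Option[T] { return &Option[T]{} }
func (o Option[T]) IsSome() bool   { return o.val != nil }
func (o *Option[T]) Set(val T)     { o.val = &val }
func (o Option[T]) Yank() T {
	if o.val == nil {
		panic("gonads: Yank on None Option")
	}
	return *o.val
}

type Thunk[T any] struct {
	doer func() T   // action being thunked
	o    *Option[T] // cache for complete thunk data
}

func NewThunk[T any](doer func() T) *Thunk[T] {
	return &Thunk[T]{doer: doer, o: NewOption[T]()}
}

func (t *Thunk[T]) Force() T {
	if t.o.IsSome() {
		return t.o.Yank()
	}
	t.o.Set(t.doer())
	return t.o.Yank()
}

func main() {
	calls := 0
	th := NewThunk(func() int {
		calls++ // side effect, only here to count evaluations
		return 6 * 7
	})

	fmt.Println(th.Force(), th.Force(), th.Force()) // 42 42 42
	fmt.Println(calls)                              // 1
}
```

Note that nothing here is guarded by a lock, so this sketch is not safe to Force from several goroutines at once.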

Now, for an overcomplicated example you can use this to implement the Fibonacci function. We can start out by writing a naive Fibonacci function like this:

func Fib(n int) int {
  if n <= 1 {
    return n
  }
    
  return Fib(n-1) + Fib(n-2)
}

We can turn this into a Go test in order to see how long it takes for it to work:

func TestRecurFib(t *testing.T) {
  t.Log(Fib(40))
}

Then when we run go test:

$ go test -run RecurFib
=== RUN   TestRecurFib
    thunk_test.go:15: 102334155
--- PASS: TestRecurFib (0.36s)

However, we can make this a lot more complicated with the power of the Thunk[T] type:

func TestThunkFib(t *testing.T) {
  cache := make([]*Thunk[int], 41)
  
  var fib func(int) int
  fib = func(n int) int {
    if cache[n].o.IsSome() {
      return *cache[n].o.val
    }
    return fib(n-1) + fib(n-2)
  }
  
  for i := range cache {
    i := i
    cache[i] = NewThunk(func() int { return fib(i) })
  }
  cache[0].o.Set(0)
  cache[1].o.Set(1)
  
  t.Log(cache[40].Force())
}

And then run the test:

=== RUN   TestThunkFib
    thunk_test.go:36: 102334155
--- PASS: TestThunkFib (0.60s)

Why is this so much slower? This should be caching the intermediate values. Maybe something like this would be faster? This should complete near instantly, right?

func TestMemoizedFib(t *testing.T) {
  mem := map[int]int{
    0: 0,
    1: 1,
  }
    
  var fib func(int) int
  fib = func(n int) int {
    if result, ok := mem[n]; ok {
      return result
    }
        
    result := fib(n-1) + fib(n-2)
    mem[n] = result
    return result
  }
    
  t.Log(fib(40))
}

$ go test -run Memoized
=== RUN   TestMemoizedFib
    thunk_test.go:35: 102334155
--- PASS: TestMemoizedFib (0.00s)

I'm not sure either.

If you change the fib function to this, it works, but it also steps around the Thunk[T] type:

fib = func(n int) int {
  if cache[n].o.IsSome() {
    return *cache[n].o.val
  }
  
  result := fib(n-1) + fib(n-2)
  cache[n].o.Set(result)
  return result
}

This completes instantly:

=== RUN   TestThunkFib
    thunk_test.go:59: 102334155
--- PASS: TestThunkFib (0.00s)

To be clear, this isn't the fault of Go generics. I'm almost certain that my terrible code is causing this to be much slower.

This is the power of gonads: making easy code complicated, harder to reason about and slower than the naive approach! Why see this as terrible code when it creates an amazing opportunity for cloud providers to suggest that people use gonads' Thunk[T] so that they use more CPU and then have to pay cloud providers more money for CPU! Think about the children!

EDIT(2022 M04 25 05:56): amscanne on Hacker News pointed out that my code was in fact wrong. My fib function should have been a lot simpler.

fib = func(n int) int {
  return cache[n-1].Force() + cache[n-2].Force()
}

Applying this also makes the code run instantly as I'd expect. I knew something was very wrong, but I never expected something this stupid. Thanks amscanne!
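
For completeness, here is a self-contained sketch of the corrected version as a plain program instead of a test. The `thunkFib` wrapper and its evaluation counter are my own additions for illustration, NewOption is assumed to return a pointer, and the wrapper assumes n is at least 1. The original fib never went through Force, so nothing besides 0 and 1 was ever cached; with the fix, each index gets computed once.

```go
package main

import "fmt"

type Option[T any] struct{ val *T }

// NewOption is assumed to return a pointer; the post does not show it.
func NewOption[T any]() *Option[T] { return &Option[T]{} }
func (o Option[T]) IsSome() bool   { return o.val != nil }
func (o *Option[T]) Set(val T)     { o.val = &val }
func (o Option[T]) Yank() T {
	if o.val == nil {
		panic("gonads: Yank on None Option")
	}
	return *o.val
}

type Thunk[T any] struct {
	doer func() T
	o    *Option[T]
}

func NewThunk[T any](doer func() T) *Thunk[T] {
	return &Thunk[T]{doer: doer, o: NewOption[T]()}
}

func (t *Thunk[T]) Force() T {
	if t.o.IsSome() {
		return t.o.Yank()
	}
	t.o.Set(t.doer())
	return t.o.Yank()
}

// thunkFib is a hypothetical wrapper around the corrected code. It returns
// the n-th Fibonacci number and how many times fib itself ran. Assumes n >= 1.
func thunkFib(n int) (result, evaluations int) {
	cache := make([]*Thunk[int], n+1)

	var fib func(int) int
	fib = func(i int) int {
		evaluations++
		return cache[i-1].Force() + cache[i-2].Force()
	}

	for i := range cache {
		i := i // capture the current index for the closure
		cache[i] = NewThunk(func() int { return fib(i) })
	}
	cache[0].o.Set(0)
	cache[1].o.Set(1)

	result = cache[n].Force()
	return result, evaluations
}

func main() {
	result, evaluations := thunkFib(40)
	fmt.Println(result, evaluations) // 102334155 39
}
```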

Hey, it makes for good surrealism. If that isn't a success, what is?

I'm glad that Go has added generics to the language. It's certainly going to make a lot of things a lot easier and more expressive. I'm worried that the process of learning how to use generics in Go is going to create a lot of churn and toil as people get up to speed on when and where they should be used. These should be used in specific cases, not as a bread and butter tool.

I hope this was an interesting look into how you can use generics in Go, but again please don't use these examples in production.
