blog: Why Rust, A Tale of Satori (#118)

This commit is contained in:
Cadey Ratio 2020-02-15 16:58:55 -05:00 committed by GitHub
parent 955e63fa76
commit 8be1940511
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 233 additions and 0 deletions

View File

@ -0,0 +1,233 @@
---
title: Why Rust
date: 2020-02-15
tags:
- rust
- rant
- satori
- golang
---
# Why Rust
Or: A Trip Report from my Satori with Rust and Functional Programming
Software is a very odd field to work in. It is simultaneously an abstract and
physical one. You build systems that can deal with an unfathomable amount of
input and output at the same time. As a job, I peer into the madness of an
unthinking automaton and give order to the inherent chaos. I then emit
incantations to describe what this unthinking automaton should do in my stead. I
cannot possibly track the relations between a hundred thousand transactions
going on in real time, much less file them appropriately so they can be summoned
back should the need arise.
However, this incantation (by necessity) is an _unthinkably_ precise and fickle
beast. It's almost as if you are training a four-year old to go to the store,
but doing it by having them read a grocery list. This grocery list has to be
precise enough that the four year old ends up getting what you want and not a
cart full of frosted flakes and candy bars. But, at the same time, the four year
old needs to understand it. Thus, the precision.
There's many schools of thought around ways to write the grocery list. Some
follow a radically simple approach, relying on the toddler to figure things out
at the store. Sometimes this simpler approach doesn't work out in more obscure
scenarios, like when they are out of red grapes but do have green grapes, but it
tends to work out enough. Proponents of these list-making tools also will
advocate for doing full tests of the grocery list before they send the toddler
off to the store. This means setting up a fake grocery store with funny money, a
fake card, plastic food, the whole nine yards. This can get expensive and can
become a logistical issue (where are you going to store all that plastic fruit
in a way that you can just set up and tear down the grocery store mock so
quickly?).
Another school of thought is that the process of writing the grocery list should
be done in a way that prevents ambiguity at the grocery store. This kind of flow
uses some more advanced concepts like the ability to describe something by its
attributes. For example, this could specify the difference between fruit and
vegetables, and only allow fruit to be put in one place of the cart and only
allow vegetables to be placed in the other. And if the writer of the list tries
to violate this, the list gets rejected and isn't used at all.
There is yet another school of thought that decides that the exact spatial
position of the toddler relative to everything else should be thought of in
advance, along with a process to make sure that nothing is done in an improper
way. This means writing the list can be a lot harder at first, but it's much
less likely to result in the toddler coming back with a weird state. Consider
what happens if two items show up at the same time and the toddler tries to grab
both of them at the same time due to the instructions in the list! They only
have one arm to grab things with, so it just doesn't work. Proponents of the
more strict methods have reference cells and other mechanisms to ensure that the
toddler can only ever grab one thing at a time.
If we were to match these three ludicrous examples to programming languages, the
first would be Lua, the second would be Go and the third would be something like
Haskell or Rust. Software development is a complicated process because the
problems involved with directing that unthinking automaton to do what you want
are hard. There is a lot going on, much in the same way there is a lot going on
when you send a toddler to do your grocery shopping for you.
A good way to look at the tradeoffs involved is to see things as a balance
between two forces, pragmatism and correctness. Languages that are more
pragmatic are easier to develop in, but are mathematically more likely to run
into problems at runtime. Languages that are more correct take more investment
to write up front, but over time the correctness means that there's fewer failed
assumptions about what is going on. The compiler stops you from doing things
that don't make sense to it. This means that it's difficult to literally
impossible to create a bad state at runtime.
Tools like Lua and Go can (and have) been used to develop stable and viable
software. [itch.io][itchio] is written in Lua running on top of nginx and it
handles financial transactions well enough that it's turned into the guy's full
time job. Google uses Go everywhere in their stack, and it's been used to create
powerful tools like Kubernetes, Caddy, and Docker. These tools are trusted
implicitly by a generation of developers, even though the language itself has
its flaws. If you are reading this blog in Firefox, statistically there is Rust
involved in the rendering and viewing of this post. Rust is built for ensuring
that code is _as correct as possible_, even if it means eating into development
time to ensure that.
[itchio]: https://itch.io
In Rust, you don't have to memorize rules about how and when it is safe to
update data in structures, because the compiler ensures you _cannot mess it up
by rejecting the code if you could be messing it up_. You don't have to run your
tests with a race detector or figure out how to expose that in production to
trace down that obscure double-write to a non-threadsafe hashmap, because in
Rust there is no such thing as a non-threadsafe hashmap. There is only a safe
hashmap and only can ever be a safe hashmap.
As an absurd example, consider the following two snippets of code, one in Go and
one in Rust, both of them will put integers into a standard library list and
then print them all out:
```go
l := list.New() // () -> *list.List
for i := 0; i < 5; i++ {
l.PushBack(i) // interface{} -> ()
}
for e := l.Front(); e != nil; e = e.Next() {
log.Printf("%T: %v", e.Value, e.Value)
}
```
```rust
let mut vec = Vec::new::<i64>(); // () -> Vec<i64>
for i in 0..5 {
vec.push(i as i64); // (mut Vec<i64>, i64) -> ()
}
for i in vec.iter() {
println!("{}", i);
}
```
The Go version uses `interface{}` as the data element because Go [literally
cannot describe types as parameters to functions][gonerics]. The Rust version
took me a bit longer to write, but there is _no_ ambiguity as to what the vector
holds. The Go version can also hold multiple types of data in the same list,
a-la:
[gonerics]: https://golang.org/doc/faq#generics
```go
l := list.New()
l.PushBack(42)
l.PushBack("hotdogs")
l.PushBack(420.69)
```
All of which is valid because in Go, an `interface{}` matches _every kind of
value possible_. An integer is an `interface{}`. A floating-point number is an
`interface{}`. A string is an `interface{}`. A bool is an `interface{}`. Any
custom type you create is an `interface{}`. Normally, this would be very
restrictive and make it difficult to do things like JSON parsing. However the Go
runtime lets you hack around this with [reflection][wtfisreflection].
[wtfisreflection]: https://golangbot.com/reflection/
This allows the standard library to handle things like JSON parsing with
functions [that look like this](https://godoc.org/encoding/json#Unmarshal):
```
func Unmarshal(data []byte, v interface{}) error
```
There's even a set of complicated rules you need to memorize about how to trick
the JSON parser into massaging your data into place. This lets you do things
like this:
```go
type Rilkef struct {
Foo string `json:"foo"`
CallToArms string `json:"call_to_arms"`
}
```
This allows the programmer a lot of flexibility while developing and compiling
the code. It's very easy for the compiler to say "oh, hey, that could be
anything, and you gave it some kind of anything, sounds legit to me", but then
the job of ensuring the sanity of the inputs is shunted to _runtime_ rather than
stopped before the code gets deployed. This means you need to test the code in
order to see how it behaves, making sure that _the standard library is doing its
job correctly_. This kind of stuff does not happen in Rust.
The Rust version of this JSON example uses the [serde][serde] and
[serde_json][serdejson] libraries:
[serde]: https://serde.rs
[serdejson]: https://serde.rs/json.html
```rust
use serde::*;
#[derive(Serialize, Deserialize)]
pub struct Rilkef {
pub foo: String,
pub call_to_arms: String,
}
```
And the logic for handling the correct rules for serialization and
deserialization is handled at _compile time_ by the compiler itself. Serde also
allows you to support more than just JSON, so this same type can be reused for
Dhall, YAML or whatever you could imagine.
## tl;dr
Rust allows for more correctness at the cost of developer efficiency. This is a
tradeoff, but I think it may actually be worth it. Code that is more correct is
more robust and less prone to failure than code that is less correct. This leads
to software that is less likely to crash at 3 am and wake you up due to a
preventable developer error.
After working in Go for more than half a decade, I'm starting to think that it
is probably a better idea to impact developer velocity and force them to write
software that is more correct. Go works if you are careful about how you handle
it. It however amounts to a giant list of rules that you just have to know (like
maps not being threadsafe) and a lot of those rules come from battle rather than
from the development process.
This came out as more of a rant than I had thought it would, but overall I hope
my point isn't lost.
### Things You Might Complain About
Yes, I know slices exist in Go. I wanted to prove a point about how the overuse
of `interface{}` in some relatively core things (like generic lists) can cause
headaches in term of correctness. Go will reject you trying to append a string
to an integer slice, but you cannot create a type that functions identically to
an integer slice.
Go does have a race detector that will point out a lot of sins in concurrent
programs, but that is again at _runtime_, not at _compile time_.
---
Many thanks to Tene, Sr. Oracle, A. Wilfox, Byte-slice, SiIvagunner and anyone
who watched the stream where I wrote this blogpost. If I got things wrong in
this, please [reach out to me](/contact) to let me know what I messed up. This
is a composite of a few twitter threads and a conversation I had on IRC.
Thanks for reading, be well.