10 KiB
title | date | tags | ||||
---|---|---|---|---|---|---|
Why Rust | 2020-02-15 |
|
Why Rust
Or: A Trip Report from my Satori with Rust and Functional Programming
Software is a very odd field to work in. It is simultaneously an abstract and physical one. You build systems that can deal with an unfathomable amount of input and output at the same time. As a job, I peer into the madness of an unthinking automaton and give order to the inherent chaos. I then emit incantations to describe what this unthinking automaton should do in my stead. I cannot possibly track the relations between a hundred thousand transactions going on in real time, much less file them appropriately so they can be summoned back should the need arise.
However, this incantation (by necessity) is an unthinkably precise and fickle beast. It's almost as if you are training a four-year old to go to the store, but doing it by having them read a grocery list. This grocery list has to be precise enough that the four year old ends up getting what you want and not a cart full of frosted flakes and candy bars. But, at the same time, the four year old needs to understand it. Thus, the precision.
There's many schools of thought around ways to write the grocery list. Some follow a radically simple approach, relying on the toddler to figure things out at the store. Sometimes this simpler approach doesn't work out in more obscure scenarios, like when they are out of red grapes but do have green grapes, but it tends to work out enough. Proponents of these list-making tools also will advocate for doing full tests of the grocery list before they send the toddler off to the store. This means setting up a fake grocery store with funny money, a fake card, plastic food, the whole nine yards. This can get expensive and can become a logistical issue (where are you going to store all that plastic fruit in a way that you can just set up and tear down the grocery store mock so quickly?).
Another school of thought is that the process of writing the grocery list should be done in a way that prevents ambiguity at the grocery store. This kind of flow uses some more advanced concepts like the ability to describe something by its attributes. For example, this could specify the difference between fruit and vegetables, and only allow fruit to be put in one place of the cart and only allow vegetables to be placed in the other. And if the writer of the list tries to violate this, the list gets rejected and isn't used at all.
There is yet another school of thought that decides that the exact spatial position of the toddler relative to everything else should be thought of in advance, along with a process to make sure that nothing is done in an improper way. This means writing the list can be a lot harder at first, but it's much less likely to result in the toddler coming back with a weird state. Consider what happens if two items show up at the same time and the toddler tries to grab both of them at the same time due to the instructions in the list! They only have one arm to grab things with, so it just doesn't work. Proponents of the more strict methods have reference cells and other mechanisms to ensure that the toddler can only ever grab one thing at a time.
If we were to match these three ludicrous examples to programming languages, the first would be Lua, the second would be Go and the third would be something like Haskell or Rust. Software development is a complicated process because the problems involved with directing that unthinking automaton to do what you want are hard. There is a lot going on, much in the same way there is a lot going on when you send a toddler to do your grocery shopping for you.
A good way to look at the tradeoffs involved is to see things as a balance between two forces, pragmatism and correctness. Languages that are more pragmatic are easier to develop in, but are mathematically more likely to run into problems at runtime. Languages that are more correct take more investment to write up front, but over time the correctness means that there's fewer failed assumptions about what is going on. The compiler stops you from doing things that don't make sense to it. This means that it's difficult to literally impossible to create a bad state at runtime.
Tools like Lua and Go can (and have) been used to develop stable and viable software. itch.io is written in Lua running on top of nginx and it handles financial transactions well enough that it's turned into the guy's full time job. Google uses Go everywhere in their stack, and it's been used to create powerful tools like Kubernetes, Caddy, and Docker. These tools are trusted implicitly by a generation of developers, even though the language itself has its flaws. If you are reading this blog in Firefox, statistically there is Rust involved in the rendering and viewing of this post. Rust is built for ensuring that code is as correct as possible, even if it means eating into development time to ensure that.
In Rust, you don't have to memorize rules about how and when it is safe to update data in structures, because the compiler ensures you cannot mess it up by rejecting the code if you could be messing it up. You don't have to run your tests with a race detector or figure out how to expose that in production to trace down that obscure double-write to a non-threadsafe hashmap, because in Rust there is no such thing as a non-threadsafe hashmap. There is only a safe hashmap and only can ever be a safe hashmap.
As an absurd example, consider the following two snippets of code, one in Go and one in Rust, both of them will put integers into a standard library list and then print them all out:
l := list.New() // () -> *list.List
for i := 0; i < 5; i++ {
l.PushBack(i) // interface{} -> ()
}
for e := l.Front(); e != nil; e = e.Next() {
log.Printf("%T: %v", e.Value, e.Value)
}
let mut vec = Vec::new::<i64>(); // () -> Vec<i64>
for i in 0..5 {
vec.push(i as i64); // (mut Vec<i64>, i64) -> ()
}
for i in vec.iter() {
println!("{}", i);
}
The Go version uses interface{}
as the data element because Go literally
cannot describe types as parameters to functions. The Rust version
took me a bit longer to write, but there is no ambiguity as to what the vector
holds. The Go version can also hold multiple types of data in the same list,
a-la:
l := list.New()
l.PushBack(42)
l.PushBack("hotdogs")
l.PushBack(420.69)
All of which is valid because in Go, an interface{}
matches every kind of
value possible. An integer is an interface{}
. A floating-point number is an
interface{}
. A string is an interface{}
. A bool is an interface{}
. Any
custom type you create is an interface{}
. Normally, this would be very
restrictive and make it difficult to do things like JSON parsing. However the Go
runtime lets you hack around this with reflection.
This allows the standard library to handle things like JSON parsing with functions that look like this:
func Unmarshal(data []byte, v interface{}) error
There's even a set of complicated rules you need to memorize about how to trick the JSON parser into massaging your data into place. This lets you do things like this:
type Rilkef struct {
Foo string `json:"foo"`
CallToArms string `json:"call_to_arms"`
}
This allows the programmer a lot of flexibility while developing and compiling the code. It's very easy for the compiler to say "oh, hey, that could be anything, and you gave it some kind of anything, sounds legit to me", but then the job of ensuring the sanity of the inputs is shunted to runtime rather than stopped before the code gets deployed. This means you need to test the code in order to see how it behaves, making sure that the standard library is doing its job correctly. This kind of stuff does not happen in Rust.
The Rust version of this JSON example uses the serde and serde_json libraries:
use serde::*;
#[derive(Serialize, Deserialize)]
pub struct Rilkef {
pub foo: String,
pub call_to_arms: String,
}
And the logic for handling the correct rules for serialization and deserialization is handled at compile time by the compiler itself. Serde also allows you to support more than just JSON, so this same type can be reused for Dhall, YAML or whatever you could imagine.
tl;dr
Rust allows for more correctness at the cost of developer efficiency. This is a tradeoff, but I think it may actually be worth it. Code that is more correct is more robust and less prone to failure than code that is less correct. This leads to software that is less likely to crash at 3 am and wake you up due to a preventable developer error.
After working in Go for more than half a decade, I'm starting to think that it is probably a better idea to impact developer velocity and force them to write software that is more correct. Go works if you are careful about how you handle it. It however amounts to a giant list of rules that you just have to know (like maps not being threadsafe) and a lot of those rules come from battle rather than from the development process.
This came out as more of a rant than I had thought it would, but overall I hope my point isn't lost.
Things You Might Complain About
Yes, I know slices exist in Go. I wanted to prove a point about how the overuse
of interface{}
in some relatively core things (like generic lists) can cause
headaches in term of correctness. Go will reject you trying to append a string
to an integer slice, but you cannot create a type that functions identically to
an integer slice.
Go does have a race detector that will point out a lot of sins in concurrent programs, but that is again at runtime, not at compile time.
Many thanks to Tene, Sr. Oracle, A. Wilfox, Byte-slice, SiIvagunner and anyone who watched the stream where I wrote this blogpost. If I got things wrong in this, please reach out to me to let me know what I messed up. This is a composite of a few twitter threads and a conversation I had on IRC.
Thanks for reading, be well.