---
title: "Another one bites the Rust"
subtitle: "Lessons learned from my first Rust project"
date: 2019-08-25
draft: false
tags: [dev, programming, rust]
---
# Rust, the (re)discovery 🗺️
Back to **2014**, the year I wrote my first _Rust_ program ! Wow, that was
even **before** version 1.0 !

At my school, it was all about _C_, and 10 years of _C_ gets quite boring
(because, yes, I started programming way before engineering school). I
wanted some fresh air, and even if I really liked _Python_ back then,
I'm more into **statically typed** programming languages.

I can't remember exactly what my first contact with _Rust_ was, maybe a blog
post or a Reddit post, but my reaction was something like :
![Brain Explosion (gif)](https://media.giphy.com/media/xT0xeJpnrWC4XWblEk/giphy.gif)
Now, let's dig into some of the reasons why _Rust_ blows my mind.

_Rust_ is a programming language focused on safety and concurrency. It's
often pitched as a modern alternative to _C++_, with a multi-paradigm approach
to systems programming. The language was originally designed by Graydon Hoare,
then sponsored by Mozilla and used for the _Servo_ browser engine, parts of
which now ship in _Mozilla Firefox_.
The language tries to be as **memory safe** as possible :

- no null pointers
- no dangling pointers
- no data races
- values have to be initialized before they are used
- no need to manually free allocated data

Back in 2014, the first thing that came to my mind was :

> Yeah, yeah, just yet another garbage collected language

And I was wrong ! _Rust_ uses other mechanisms such as _ownership_,
_lifetimes_, _borrowing_, and the ultimate **borrow checker** to ensure that
memory is safe and freed only when data can no longer be used anywhere else in
the program (_i.e._ when its _lifetime_ is over). These memory management
concepts ensure _safety_ **AND** _speed_, since there is no garbage collection
overhead at runtime.

For me, as a young hardcore C programmer[^1], this was literally heaven. Less struggling
with `calloc`, `malloc`, `free` and `valgrind`, thank god _Mozilla_ ! Bless me !

**But**, I dropped it in 2015. Why ? Because it was, at that time, far from perfect.
Struggling with all the new concepts was quite disturbing, add to that cryptic compilation
errors and you open a gate to _brainhell_. My younger self was not confident enough to learn,
and there was no community to help me understand the things that were unclear to me.

Five years later, my programming skills are clearly not at the same level as
before. Learning and writing a lot of _Golang_ and _JavaScript_, and playing
with _Elixir_ and _Haskell_, completely changed how I manipulate and
visualize code on a day-to-day basis. It was time to **give** _Rust_ **another chance**.
# Fediwatcher 📊
In order to practice my _Rust_ skills, I wanted to build a concrete and
useful _(at least for me)_ project.

Inspired by the work of **href** on
[fediverse.network](https://fediverse.network), my main idea was to build a
small app to fetch metrics from various instances of the **fediverse** and
push them into an [InfluxDB](https://influxdata.com) time series database.
**Fediwatcher** was born !

The code is available in a [GitHub repo](https://github.com/papey/fediwatcher)
associated with my [GitHub account](https://github.com/papey).
If you're interested, check out the [Fediwatcher public instance](https://metrics.papey.fr).

Ok, ok, enough self-promotion. Now that you get the main idea, let's go for
the technical part and all the lessons learned writing this small project !
# Cargo, compiling & building 🏗️
`cargo` is the _Rust_ **standard** package manager created and maintained by
the _Rust_ project. This is the tool used by all rustaceans and that's a good
thing. If you don't know it yet, I'm also a gopher, and package management
with `go` is a big pile of shit. In 2019, leaving package management to the
community is, I think, the biggest mistake you can make when creating a new
programming language. _So go for cargo !_
`cargo` is used for :
- downloading app dependencies
- compiling app dependencies
- compiling your project
- running your project
- running tests on your project
- publishing your project to [crates.io](https://crates.io)

All the information is contained in the `Cargo.toml` file. Not needing a
`Makefile` makes my brain happy, and having a common way to test code without
external packages is pretty straightforward and a strong invitation to test
your code.
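
To give an idea of what testing without external packages looks like, here is
a minimal sketch of a unit test living next to the code it checks. The
`is_healthy` function and its status code rule are made up for this example
and are not taken from Fediwatcher. Running `cargo test` picks up the test
module automatically.

```rust
/// Hypothetical helper: treats any 2xx HTTP status code as healthy.
pub fn is_healthy(status: u16) -> bool {
    (200..300).contains(&status)
}

fn main() {
    println!("200 healthy ? {}", is_healthy(200));
}

// Only compiled when running `cargo test`.
#[cfg(test)]
mod health_tests {
    use super::*;

    #[test]
    fn accepts_2xx_and_rejects_the_rest() {
        assert!(is_healthy(204));
        assert!(!is_healthy(503));
    }
}
```
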

All this standard tooling also means that the default Docker images contain
all you need to set up _Continuous Integration_ without having to maintain
specific container images for a specific test suite or whatever. Less time
building stuff means more productive time to code. With _Rust_, all the
batteries are included.

About compiling, spoiler alert, compiling a _Rust_ project **IS SLOW** :
{{< highlight sh >}}
cargo clean
time cargo build
Compiling autocfg v0.1.4
Compiling libc v0.2.58
Compiling arrayvec v0.4.10
Compiling spin v0.5.0
Compiling proc-macro2 v0.4.30
[...]
Compiling reqwest v0.9.18
Compiling fediwatcher v0.1.0 (/Users/wilfried/code/github/fediwatcher)
Finished dev [unoptimized + debuginfo] target(s) in 4m 25s
cargo build 571,39s user 50,53s system 233% cpu 4:25,95 total
{{< / highlight >}}
This is quite surprising if you compare it to a fast-compiling language like `go`,
but that's fair because the compiler has to check a bunch of things related to
memory safety, and every dependency is built from source. With no garbage
collector, speed at runtime and memory safety, you have to pay a price and
this price is the **compile time**.

But I really think it's not a weakness for _Rust_ because `cargo` caching is
amazing: after the first compilation, iterations are pretty fast, so it's
not a real issue.
When it comes to building a _Docker_ image, I learned a nice tip to optimize
container image building with a clever use of layers. Here is the tip !
{{< highlight dockerfile "linenos=table" >}}
# New empty project
RUN USER=root cargo new --bin fediwatcher
WORKDIR /fediwatcher

# Fetch deps list
COPY ./Cargo.lock ./Cargo.lock
COPY ./Cargo.toml ./Cargo.toml

# Step to build a default hello world project.
# Since Cargo.lock and Cargo.toml are present,
# all deps will be downloaded and cached inside this upper layer
RUN cargo build --release
RUN rm src/*.rs

# Now, copy source code
COPY ./src ./src

# Build the real project
RUN rm ./target/release/deps/fediwatcher*
RUN cargo build --release
{{< / highlight >}}
Remember that dependencies are less volatile than code, and with containers
this means: get dependencies as soon as possible and copy code later ! In the
_Rust_ case, the first thing to do is to create an empty project using `cargo new`.
This will create a default project with a basic hello world in the
`main.rs` file.

After that, copy everything related to dependencies
(the `Cargo.toml` and `Cargo.lock` files) and trigger a build. In this image
layer, all the deps will be downloaded and compiled.

Now that there is a layer
containing all the dependencies, copy the real source code and then compile
the real project. With this technique, the dependencies layer will be cached and reused
in later builds. Believe me, this is a time saver !

Not lost yet ? Good, because there is more, so take a deep breath and let's go
digging into some _Rust_ features.
# Flow control 🛂
_Rust_ takes inspiration from various programming languages, mainly _C++_, an
imperative language, but there are also a lot of features that are typical of
_functional programming_. I have already written some _Haskell_ (because
**Xmonad** ftw) and some _Elixir_, but I don't feel very confident with
functional programming yet.

I find this salad mix called multi-paradigm
programming very convenient for understanding and trying out a functional way
of thinking.

The topmost functional feature of _Rust_ is the `match` statement. To me,
this is the most beautiful and clean way to handle multiple paths inside a
program. For imperative programmers out there, a `match` is like a `switch case` on steroids.
To illustrate, let's look at a simple example[^2].
{{< highlight rust >}}
let number = 2;

println!("Tell me something about {}", number);

match number {
    // Match a single value
    1 => println!("One!"),
    // Match several values
    2 | 3 | 5 | 7 | 11 => println!("This is a prime"),
    // Match an inclusive range
    13..=19 => println!("A teen"),
    // Whatever
    _ => println!("Ain't special"),
}
{{< / highlight >}}
Here, all the cases are matched, but what if I removed the last branch ?
{{< highlight txt >}}
help: ensure that all possible cases are being handled, possibly by adding
wildcards or more match arms
{{< / highlight >}}
See ? _Rust_ violently points out missing stuff, and that's why it's a
pleasant language to use.
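
The same exhaustiveness check applies to enums, which is where it really pays
off. In this small sketch (the `Metric` enum is invented for the example),
there is no `_` branch: adding a new variant later would make the compiler
refuse the `match` until the new case is handled.

```rust
// Hypothetical enum, only here to illustrate exhaustiveness.
enum Metric {
    Users,
    Statuses,
    Uptime(u64),
}

fn label(metric: &Metric) -> String {
    match metric {
        Metric::Users => String::from("users"),
        Metric::Statuses => String::from("statuses"),
        // Variants can carry data, bound to a name in the pattern
        Metric::Uptime(seconds) => format!("uptime: {}s", seconds),
    }
}

fn main() {
    for metric in [Metric::Users, Metric::Statuses, Metric::Uptime(3600)].iter() {
        println!("{}", label(metric));
    }
}
```
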

A `match` statement can also be used to _destructure_ a variable, a common
pattern in _functional_ programming. Destructuring is a process used to
break a structure into multiple independent variables. This can also be
useful when you need only a part of a structure, making your code more
comprehensible and readable[^3].
{{< highlight rust >}}
struct Foo {
    x: (u32, u32),
    y: u32,
}

// Try changing the values in the struct to see what happens
let foo = Foo { x: (1, 2), y: 3 };

match foo {
    Foo { x: (1, b), y } => println!("First of x is 1, b = {}, y = {} ", b, y),

    // you can destructure structs and rename the variables,
    // the order is not important
    Foo { y: 2, x: i } => println!("y is 2, i = {:?}", i),

    // and you can also ignore some variables:
    Foo { y, .. } => println!("y = {}, we don't care about x", y),
    // this will give an error: pattern does not mention field `x`
    //Foo { y } => println!("y = {}", y);
}
{{< / highlight >}}
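
Destructuring is not limited to `match`. As a small sketch reusing the `Foo`
struct from above (the `sum_parts` function is made up for the example), a
plain `let` or even a function parameter can break a value apart :

```rust
struct Foo {
    x: (u32, u32),
    y: u32,
}

// The parameter itself is a pattern: fields are bound directly to a, b and y.
fn sum_parts(Foo { x: (a, b), y }: Foo) -> u32 {
    a + b + y
}

fn main() {
    let foo = Foo { x: (1, 2), y: 3 };

    // Destructure with `let`, ignoring `y`
    let Foo { x: (first, second), .. } = foo;
    println!("first = {}, second = {}", first, second);

    println!("sum = {}", sum_parts(foo));
}
```
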
With `match`, I made my first step into the _functional programming_ way
of thinking. The second one was iterators, function chaining and
closures, the perfect combo ! The idea is to chain functions and pass the
output of one as input to the next. Chaining small, narrowly scoped functions
makes code more readable, more testable and more reliable. As always, an example !
{{< highlight rust >}}
let iterator = [1,2,3,4,5].iter();
// fold, is also known as reduce, in other languages
let sum = iterator.fold(0, |total, next| total + next);
println!("{}", sum);
{{< / highlight >}}
The first line creates an `iterator`, a structure used to perform
tasks on a sequence of items. Then `fold`, a method available on
iterators, is used to sum up all the items and produce
a single final value : the sum. As parameters, we pass an initial value (`0`)
and a `closure` (a function defined on the fly) with two arguments, `total`
and `next`. The `total` variable stores the running sum and `next` is the
next value from the iterator to add to `total`.

A non-functional alternative to the code shown
above would be something like :
{{< highlight rust >}}
let collection = [1,2,3,4,5];
let mut sum = 0;
for elem in collection.iter() {
    sum += elem;
}
println!("{}", sum);
{{< / highlight >}}
With more complex data and more operations, removing `for` loops and chaining
functions using `map`, `filter` or `fold` really makes code cleaner and easier
to understand. You just get the important stuff, there is no distraction, and
code without boilerplate lines is less error-prone.
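
As a sketch of what such a chain can look like, the example below keeps the
even numbers, squares them and sums the result, first with combinators and
then with a loop. Both function names are made up for the comparison.

```rust
// Functional version: each step says what it does
fn sum_even_squares(values: &[u32]) -> u32 {
    values
        .iter()
        .filter(|&&v| v % 2 == 0) // keep even numbers
        .map(|&v| v * v)          // square them
        .sum()                    // add everything up
}

// Imperative version: same result, more bookkeeping
fn sum_even_squares_loop(values: &[u32]) -> u32 {
    let mut total = 0;
    for v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}

fn main() {
    let values = [1, 2, 3, 4, 5];
    println!("{}", sum_even_squares(&values));
    println!("{}", sum_even_squares_loop(&values));
}
```
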

Flow control is a large domain and it includes error handling. In _Rust_
there are two generic types for this : `Option`, used to describe the
possibility of _absence_, and `Result`, used as a superset of `Option` to handle
the possibility of errors.
Here is the definition of an `Option` :
{{< highlight rust >}}
enum Option<T> {
    None,
    Some(T),
}
{{< / highlight >}}
Where `None` means "no value" and `Some(T)` means "some value (of type `T`)".

An `Option` is useful if you, for example, search for a file that may not exist :
{{< highlight rust >}}
// `find` is not a standard function: it stands for any helper
// returning an `Option`
let file = "not.exists";
match find(file, '.') {
    None => println!("File not found."),
    Some(_) => println!("File found : {}", file),
}
{{< / highlight >}}
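
The snippet above relies on a `find` helper that is not shown. As a runnable
variant of the same idea, here is a minimal sketch assuming a hypothetical
`find_file` function that looks up a name in a list of known files. Both
`find_file` and `describe` are invented for this example.

```rust
/// Hypothetical helper: returns the index of `name` in `known`, if present.
fn find_file(known: &[&str], name: &str) -> Option<usize> {
    known.iter().position(|candidate| *candidate == name)
}

fn describe(known: &[&str], name: &str) -> String {
    match find_file(known, name) {
        None => format!("{} not found.", name),
        Some(i) => format!("{} found at index {}", name, i),
    }
}

fn main() {
    let known = ["Cargo.toml", "Cargo.lock"];
    println!("{}", describe(&known, "Cargo.lock"));
    println!("{}", describe(&known, "not.exists"));
}
```
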
If you need an explicit error to handle, go for `Result` :
{{< highlight rust >}}
enum Result<T, E> {
    Ok(T),
    Err(E),
}
{{< / highlight >}}
Where `Ok(T)` means "everything is good for the value (of type `T`)" and
`Err(E)` means "an error (of type `E`) occurred". To conclude, it's
conceptually possible to define an `Option` like this :
{{< highlight rust >}}
type Option<T> = Result<T, ()>;
{{< / highlight >}}
"An `Option` is a `Result` with an empty `Err` value". Q.E.D !

At this point of my journey (re)discovering _Rust_, I was super happy with all
these new concepts. As a gopher, I know how crappy error handling can be in
other languages, so a clean and standard way to handle errors, count me in.

So, what about composing functions that need error handling ? Ahah ! Let's
go :
{{< highlight rust "linenos=table, hl_lines=32-40 43-45 48-50" >}}
// An example using music bands

// Allow dead code, for learning purpose
#![allow(dead_code)]

#[derive(Debug)]
enum Bands {
    AAL,
    Alcest,
    Sabaton,
}

// But does it djent ?
fn does_it_djent(b: Bands) -> Option<Bands> {
    match b {
        // Only Animals As Leaders djents
        Bands::AAL => Some(b),
        _ => None,
    }
}

// Do I like it ?
fn likes(b: Bands) -> Option<Bands> {
    // No, I do not like Sabaton
    match b {
        Bands::Sabaton => None,
        _ => Some(b),
    }
}

// Does it djent and do I like it ? the match version !
fn match_likes_djent(b: Bands) -> Option<Bands> {
    match does_it_djent(b) {
        Some(b) => match likes(b) {
            Some(b) => Some(b),
            None => None,
        },
        None => None,
    }
}

// Does it djent and do I like it ? the map version !
fn map_likes_djent(b: Bands) -> Option<Option<Bands>> {
    does_it_djent(b).map(|b| likes(b))
}

// Does it djent and do I like it ? the and_then version !
fn and_then_likes_djent(b: Bands) -> Option<Bands> {
    does_it_djent(b).and_then(|b| likes(b))
}

fn main() {
    let aal = Bands::AAL;
    match and_then_likes_djent(aal) {
        Some(b) => println!("I like {:?} and it djents", b),
        None => println!("Hurgh, this band doesn't even djent !"),
    }
}
{{< / highlight >}}
On a first try, the basic solution is to use a series of `match` statements
(line 32). With two functions, that's ok, but with 3 or more, this
will be a pain in the ass to read. Searching for a cleaner way of handling
stuff that returns an `Option`, I found the associated `map` method. **BUT**
using `map` with something that also returns an `Option` leads to (function
definition on line 43) :

**an Option of an Option !**

![Facepalm](https://upload.wikimedia.org/wikipedia/commons/3/3b/Paris_Tuileries_Garden_Facepalm_statue.jpg)

Is everything doomed ? No ! Because there is the godsend `and_then` method
(function starting on line 48). Basically, `and_then` ensures that we keep a
"flat" structure and do not add an `Option` wrapping to an already existing
`Option`. _Lesson learned_ : if you have to deal with a lot of `Option`s or
`Result`s, use `and_then`.
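
The same pattern works with `Result`. A minimal sketch, assuming two made-up
steps (`parse_port` and `check_range`) that can each fail with a `String`
error :

```rust
// Step 1 (hypothetical): text to number, with a readable error
fn parse_port(raw: &str) -> Result<u32, String> {
    raw.trim()
        .parse::<u32>()
        .map_err(|e| format!("invalid number '{}': {}", raw, e))
}

// Step 2 (hypothetical): number to valid port
fn check_range(port: u32) -> Result<u16, String> {
    if port >= 1 && port <= 65535 {
        Ok(port as u16)
    } else {
        Err(format!("port {} out of range", port))
    }
}

// `and_then` only runs step 2 if step 1 succeeded, and keeps a flat Result
fn valid_port(raw: &str) -> Result<u16, String> {
    parse_port(raw).and_then(check_range)
}

fn main() {
    println!("{:?}", valid_port("8086"));
    println!("{:?}", valid_port("70000"));
    println!("{:?}", valid_port("not a port"));
}
```

The first error met stops the chain, so the caller only has one `Result` to
inspect at the end.
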

Last but not least, I also want to write about the `?` operator for error
handling. Available since _Rust_ version 1.13, this operator removes a lot of
boilerplate and redundant code.

Before 1.13, error handling would probably look like this :
{{< highlight rust >}}
use std::fs::File;
use std::io::{self, Read};

fn read_from_file() -> Result<String, io::Error> {
    let f = File::open("sample.txt");
    let mut s = String::new();

    let mut f = match f {
        Ok(f) => f,
        Err(e) => return Err(e),
    };

    match f.read_to_string(&mut s) {
        Ok(_) => Ok(s),
        Err(e) => Err(e),
    }
}
{{< / highlight >}}
With 1.13 and later,
{{< highlight rust >}}
use std::fs::File;
use std::io::{self, Read};

fn read_from_file() -> Result<String, io::Error> {
    let mut s = String::new();
    let mut f = File::open("sample.txt")?;
    f.read_to_string(&mut s)?;
    Ok(s)
}
{{< / highlight >}}
Nice and clean ! Before `?`, _Rust_ relied on a macro named `try!`, used in
the same way, but chaining calls led to unreadable and ugly code :
{{< highlight rust >}}
try!(try!(try!(foo()).bar()).baz())
{{< / highlight >}}
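
The `?` operator is not reserved to `io::Error`. As a sketch (both function
names are made up), it also works on `Option`, and on any `Result` whose error
type can be converted into the function's error type, here through
`Box<dyn Error>` :

```rust
use std::error::Error;

// `?` on Option: returns None early if there is no first character
fn first_char_upper(text: &str) -> Option<char> {
    let first = text.chars().next()?;
    Some(first.to_ascii_uppercase())
}

// `?` on Result: the ParseIntError is converted into Box<dyn Error>
fn double(raw: &str) -> Result<i32, Box<dyn Error>> {
    let value: i32 = raw.trim().parse()?;
    Ok(value * 2)
}

fn main() {
    println!("{:?}", first_char_upper("rust"));
    println!("{:?}", double("21"));
    println!("{:?}", double("twenty-one"));
}
```
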

To conclude, there is a lot of stuff here to make code easy to understand
and maintain. Flow control using `match` and function combinators may seem
odd at the beginning, but after some practice and experiments I find it quite
pleasant to use. But there is (again) more. Fasten your seatbelt, the next
section will blow your mind.
# Ownership, borrowing, lifetimes 🤯
To be clear, my first 10 hours of _Rust_ coding just smashed my brain because
of these three concepts. They are quite hard to understand at first, because
they change the way we write and understand programs. With time, practice and
help from the compiler, the mist is replaced by beautiful sunlight. There are
plenty of other blog posts, tutorials and lessons about lifetimes, ownership and
borrowing. I will add my brick to this wall, with my own understanding of it.

Let's start with **ownership**. _Rust_ memory management is based on this
concept. Every resource (variables, objects...) is **owned** by a block of
code. At the end of this block, its resources are destroyed. This is the
standard, predictable, reproducible behavior of _Rust_. For small stuff,
that's easy to understand :
{{< highlight rust "linenos=table, hl_lines=11">}}
fn main() {
    // create a block, or scope
    {
        // resource creation
        let i = 42;
        println!("{}", i);
        // i is destroyed by the compiler, and you have nothing else to do
    }

    // fail, because i does not exist anymore
    println!("{}", i);
}
{{< / highlight >}}
Compiling this piece of code will throw an error :
{{< highlight txt >}}
error[E0425]: cannot find value `i` in this scope
  --> src/main.rs:11:20
   |
11 |     println!("{}", i);
   |                    ^ not found in this scope
{{< / highlight >}}
To remove the error, just delete line 11.
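
To actually see the destruction happen at the end of a block, a type can
implement the standard `Drop` trait. The sketch below uses a made-up `Noisy`
type that writes into a shared log when it is destroyed. The `Rc<RefCell<...>>`
log is only there so the order of events can be inspected afterwards.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that records when it is destroyed.
struct Noisy {
    name: String,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    // Called automatically when the owner goes out of scope
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn run() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Noisy { name: String::from("outer"), log: Rc::clone(&log) };
        {
            let _inner = Noisy { name: String::from("inner"), log: Rc::clone(&log) };
            log.borrow_mut().push(String::from("end of inner block"));
        } // `_inner` is dropped here
        log.borrow_mut().push(String::from("end of outer block"));
    } // `_outer` is dropped here
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in run() {
        println!("{}", event);
    }
}
```

Each value is dropped exactly where its block closes, without any call to a
`free`-like function.
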

Ok, cool ! But what if I want to pass a resource to another block or even to a
function ?
{{< highlight rust "linenos=table" >}}
fn priprint(val: i32) {
    println!("{}", val);
}

fn main() {
    let i = 42;
    priprint(i);
    println!("{}", i);
}
{{< / highlight >}}
Here, this piece of code works because _Rust_ copies the value of `i` into
`val` when calling the `priprint` function. Primitive types such as integers
work this way (they implement the `Copy` trait), but if you want to pass, for
example, a struct, _Rust_ will **move** the resource to the function. By
**moving** a resource, you **transfer** ownership to the receiver. So in the
example below, `priprint` will be responsible for the destruction of the
struct passed to it.
{{< highlight rust "linenos=table, hl_lines=12" >}}
struct Number {
    value: i32
}

fn priprint(n: Number) {
    println!("{}", n.value);
}

fn main() {
    let n = Number{ value: 42 };

    priprint(n);

    println!("{}", n.value);
}
{{< / highlight >}}
When compiling, _Rust_ will not be happy :
{{< highlight txt >}}
error[E0382]: borrow of moved value: `n`
  --> src/main.rs:14:20
   |
10 |     let n = Number{ value: 42 };
   |         - move occurs because `n` has type `Number`, which does not implement the `Copy` trait
11 |
12 |     priprint(n);
   |              - value moved here
13 |
14 |     println!("{}", n.value);
   |                    ^^^^^^^ value borrowed here after move
{{< / highlight >}}
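
One way out, hinted at by the compiler message, is to make `Number` implement
`Copy`. A minimal sketch, where a made-up `double` function returns a value
instead of printing it so the result is easy to check :

```rust
// Deriving Copy (which requires Clone) makes Number behave like an integer:
// passing it to a function copies it instead of moving it.
#[derive(Debug, Clone, Copy)]
struct Number {
    value: i32,
}

fn double(n: Number) -> i32 {
    n.value * 2
}

fn main() {
    let n = Number { value: 21 };

    println!("{}", double(n));
    // Still valid: `n` was copied, not moved
    println!("{:?}", n);
}
```

This only works when every field is itself `Copy`. A struct owning heap data,
like a `String`, cannot derive `Copy` and has to be cloned explicitly or
borrowed.
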

After **ownership** comes **borrowing**. With **borrowing**, our _Rust_
program is able to have multiple references, or _pointers_, to the same
resource. Passing a reference to another block tells this block: here is a
**borrow** (mutable or immutable), do what you want with it but do not destroy
it at the end of your scope. To pass references, or **borrows**, add the `&`
operator to the `priprint` parameter type and to the argument at the call site.
{{< highlight rust "linenos=table, hl_lines=5 12" >}}
struct Number {
    value: i32
}

fn priprint(n: &Number) {
    println!("{}", n.value);
}

fn main() {
    let n = Number{ value: 42 };

    priprint(&n);

    println!("{}", n.value);
}
{{< / highlight >}}
Seems cool, no ? If a friend of mine borrows my car, I hope he will not
return it in pieces.
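
The mutable flavor uses `&mut`. In this sketch, the made-up `increment`
function changes the struct through a mutable borrow while `main` keeps
ownership :

```rust
struct Number {
    value: i32,
}

// Takes a mutable borrow: can modify the value, cannot destroy it
fn increment(n: &mut Number) {
    n.value += 1;
}

fn main() {
    // The binding itself must be declared `mut`
    let mut n = Number { value: 41 };

    increment(&mut n);

    // `n` is still owned by main and reflects the change
    println!("{}", n.value);
}
```

At any given time, a value can have either one mutable borrow or any number of
immutable ones, which is how the compiler rules out data races.
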

Now, **lifetimes** ! A _Rust_ resource always has a **lifetime** associated
with it. This means that resources are accessible or "live" from the moment you
declare them until the moment they are dropped. If you're familiar with other
programming languages, think about **extensible scopes**. To me, **extensible
scopes** means that **scopes** can be moved from one block of code to another. Simple, huh ? But
things get complicated if you add references to the mix. Why ? Because
references also have a **lifetime**, and this **lifetime**, called the **associated
lifetime**, can be shorter than the **lifetime** of the value the reference points to. Can
this **associated lifetime** be longer ? No ! Because we want to access valid
data ! In most cases, the _Rust_ compiler is able to guess how **lifetimes** are
related. If not, it will explicitly ask you to annotate your code with
**lifetime specifiers**. Digging into this subject would take a whole article, and
I don't feel confident enough with **lifetimes** yet to explain them
in detail. This is clearly the hardest part of learning _Rust_. If
you don't understand what you're doing at the beginning, that's not a real problem.
Don't give up, read, try and experiment, the reward is worth it.
![No idea](https://i.kym-cdn.com/entries/icons/original/000/008/342/ihave.jpg)
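
For the curious, here is roughly what a lifetime specifier looks like. This is
only a sketch built around the classic "longest string" example, not a full
explanation: the `'a` annotation tells the compiler that the returned
reference is valid as long as both inputs are.

```rust
// Without `'a`, the compiler cannot tell whether the result borrows
// from `left` or from `right`, and asks for an annotation.
fn longest<'a>(left: &'a str, right: &'a str) -> &'a str {
    if left.len() >= right.len() {
        left
    } else {
        right
    }
}

fn main() {
    let first = String::from("fediverse");
    let result;
    {
        let second = String::from("rust");
        // Both borrows are alive here, so the call is accepted.
        // `to_string` makes an owned copy that can outlive `second`.
        result = longest(&first, &second).to_string();
    }
    println!("{}", result);
}
```
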
# What's next ? 🔭
Thanks to _Rust_ and my little project, I learned a bunch of new concepts
related to programming.

_Rust_ is a beautiful language. The first time I used it, many years ago, it
was a bit odd to understand. Today, with more programming experience, I
really understand why it matters. To me, 2019 will be the _Rust_ year. A lot
of _Rust_ projects pop up on GitHub, and that's a good sign of how the
language is starting to gain popularity. Backed by Mozilla and the
community, I really believe it's the go-to language for the next 10
years. Of course, _Golang_ is also in this spectrum of new generation
languages, but they complement each other with various ways of thinking and
programming. That's clear to me, I will continue to make _Go_ **AND** _Rust_
programs.

Now, I need to go deeper. On one hand, by adding new features to
**Fediwatcher**, I want to experiment with concurrency and see how it
compares to _Golang_.

On the other hand, I'm really, really interested in **WebAssembly** and I
think _Rust_ is a good bet to start exploring this new open field. Last but not
least, all these skills will allow me to continue my contributions to
[Plume](https://github.com/Plume-org/Plume), a _Rust_ federated blogging
application based on ActivityPub.

Let's go^Wrust !

[^1]: I am not a hardcore C programmer anymore, because of _Golang_ and _Rust_, of course.
[^2]: Taken and adapted from [Rust documentation](https://doc.rust-lang.org/rust-by-example/flow_control/match.html)
[^3]: Taken from [Rust documentation](https://doc.rust-lang.org/rust-by-example/flow_control/match/destructuring/destructure_structures.html)