(A Few) Advanced Variable Types in Rust

Get a firm grasp of each of these smart pointers and other advanced variables in Rust: Box, Cell, RefCell, Rc, Arc, RwLock, Mutex, OnceCell (and there are others!)

Programmer on laptop, presumably trying to figure out how to get access to his variable in several threads of his Rust program.
Keep one eye on your code at all times!

“I haven’t seen Evil Dead II yet”. Much is made of this simple statement in the movie adaptation of High Fidelity. Does “yet” mean the person does, indeed, intend to see the film? Jack Black’s character is having real trouble with the concept: not only does he know that the speaker, John Cusack’s character, has seen Evil Dead II, but what idiot wouldn’t see it, “because it’s a brilliant film. It’s so funny, and violent, and the soundtrack kicks so much ass.” I love this exchange, but I’m a fan of the film anyway. It is not always clear to me how to handle advanced variable types in Rust, yet.

I think of these as wrappers that add abilities (and restrictions) to a variable. They give a variable super powers since the Rust compiler is so strict about what you can and can’t do with variables.


Box<T>

PROVIDES:
Smart pointer that forces your variable’s value to be stored on the heap instead of the stack. The Box<> variable itself is just a pointer so its size is obvious and can, itself, be stored on the stack.

RESTRICTIONS:

USEFUL WHEN:
If the size of an item cannot be determined at compile time, the compiler will complain when the default is to store it on the stack (where a calculable size is necessary). Using Box<> will force the storage on the heap where a fixed size is not needed. For example, a recursive data structure, including an enum, will not work on the stack because a concrete size cannot be calculated. Turning the recursive field into a Box<> means it stores a pointer which CAN be sized. The example in the docs being:

enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

Also useful if you have a very large T and want to transfer ownership of it without the whole value being copied each time: moving a Box<> copies only the pointer.
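To make the recursive case concrete, here is a minimal sketch that builds the List from the docs and walks it. The helper names sum() and sample_list() are mine, made up for illustration; they are not from the docs.

```rust
// The Box gives the recursive field a known (pointer) size.
enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

// Hypothetical helper: walk the list and add up the values.
fn sum(list: &List<i32>) -> i32 {
    match list {
        // `rest` is a &Box<List<i32>>; deref coercion turns it into &List<i32>.
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

// Hypothetical helper: the list 1 -> 2 -> 3 -> Nil.
fn sample_list() -> List<i32> {
    List::Cons(
        1,
        Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))),
    )
}

fn main() {
    println!("sum = {}", sum(&sample_list()));
    // A Box of a sized type is pointer-sized, no matter how big the value is.
    println!(
        "Box<[u8; 4096]> takes {} bytes on the stack",
        std::mem::size_of::<Box<[u8; 4096]>>()
    );
}
```

The second println! is the "large T" point in miniature: the 4096 bytes live on the heap, and only the pointer moves around.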

NOTABLY PROVIDES:
just see the rust-lang.org docs

EXAMPLES/DISCUSSION:
https://doc.rust-lang.org/stable/rust-by-example/std/box.html
https://www.koderhq.com/tutorial/rust/smart-pointer/
https://manishearth.github.io/blog/2017/01/10/rust-tidbits-box-is-special/

Setting the value of a simple Box<> variable is easy enough and getting the value back looks very normal:

fn main() {
    let answer = Box::new(42);
    println!("The answer is : {}", answer);
}



Cell<T>

PROVIDES:
You can have multiple, shared references to the Cell<> (and thus, access to the value inside with .get(), which requires T to be Copy) and yet still mutate the value inside (with .set()). This is called interior mutability because the value inside can be changed but mut on the Cell<> itself is not needed. The inner value can only be set by calling a method on the Cell<>.

RESTRICTIONS:
Through a shared reference it is not possible to get a reference to what is inside the Cell<>, only a copy of the value (or, with replace() or take(), the value itself). Also, Cell<> does not implement Sync, so references to it cannot be shared with a different thread, which ensures safety.

USEFUL WHEN:
Usually used for small values, such as counters or flags, where you need multiple shared references to the value AND be allowed to mutate it at the same time, in a guaranteed safe way.

NOTABLY PROVIDES:
.set() to set the value inside
.get() to get a copy of the value inside
.take() to move the value out AND reset the value inside to its Default.
see the rust-lang.org docs

EXAMPLES/DISCUSSION:
https://hub.packtpub.com/shared-pointers-in-rust-challenges-solutions/
https://ricardomartins.cc/2016/06/08/interior-mutability

Setting the inner value of a Cell<> is only possible with a method call which is how it maintains safety:

use std::cell::Cell;
fn main() {
    let answer = Cell::new(0);
    answer.set(42);
    println!("The answer is : {}", answer.get());
}
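The "counters or flags" use case might look something like the sketch below. The Visits struct and its record() method are hypothetical names I made up; the point is only that record() takes &self, not &mut self.

```rust
use std::cell::Cell;

// Hypothetical struct: a counter that can be bumped through a shared reference.
struct Visits {
    count: Cell<u32>,
}

impl Visits {
    // Note: &self, not &mut self.
    fn record(&self) {
        self.count.set(self.count.get() + 1);
    }
}

// Returns (value taken out, value left behind).
fn count_visits() -> (u32, u32) {
    let visits = Visits { count: Cell::new(0) };
    // Two shared references alive at the same time, both able to mutate.
    let first = &visits;
    let second = &visits;
    first.record();
    second.record();
    second.record();
    // take() moves the value out and leaves u32::default() behind.
    let taken = visits.count.take();
    (taken, visits.count.get())
}

fn main() {
    let (taken, left) = count_visits();
    println!("taken = {}, left behind = {}", taken, left);
}
```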

RefCell<T>

PROVIDES:
RefCell<> is very similar to Cell<> except it adds borrow checking, but at run-time instead of compile time! This means, unlike Cell<>, it is possible to write RefCell<> code which will panic!(). You borrow() a ref to the inner value for read-only or borrow_mut() in order to change it.

RESTRICTIONS:
borrow() will panic if a borrow_mut() is in place, and borrow_mut() will panic if either type is in place.

USEFUL WHEN:

NOTABLY PROVIDES:
.borrow() to get a shared, read-only reference (a Ref<> guard) to the value inside
.borrow_mut() to get a mutable reference (a RefMut<> guard) through which you can set the value
.try_borrow() and .try_borrow_mut() return a Result<>, giving an Err instead of a panic!().
see the rust-lang.org docs

EXAMPLES/DISCUSSION:
https://ricardomartins.cc/2016/06/08/interior-mutability (again)

You must successfully borrow_mut() the RefCell<> in order to set the value (by dereferencing) and then simply borrow() it to retrieve the value:

use std::cell::RefCell;
fn main() {
    let answer = RefCell::new(0);
    *answer.borrow_mut() = 42;
    println!("The answer is : {}", answer.borrow());
}

Whereas something as simple as this compiles but panics at run-time, because break_things still holds a mutable borrow when the second borrow_mut() is called. Imagine how much more obscure this code could be. Remember: any number of read-only references or exactly 1 read-write reference and nothing else, although for RefCell<> this is enforced at run-time:

use std::cell::RefCell;
fn main() {
    let answer = RefCell::new(0);
    let break_things = answer.borrow_mut();
    println!("The initial value is : {}", *break_things);
    *answer.borrow_mut() = 42;
    println!("The answer is : {}", answer.borrow());
}
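If a panic is not acceptable, try_borrow_mut() lets the code decide what to do instead. This is a minimal sketch; set_if_free() and demo() are names I invented for the example.

```rust
use std::cell::RefCell;

// Hypothetical helper: set the value only if nothing else has it borrowed.
fn set_if_free(cell: &RefCell<i32>, value: i32) -> bool {
    match cell.try_borrow_mut() {
        Ok(mut slot) => {
            *slot = value;
            true
        }
        // Already borrowed somewhere: report it instead of panicking.
        Err(_) => false,
    }
}

// Returns (first attempt succeeded, second attempt succeeded, final value).
fn demo() -> (bool, bool, i32) {
    let answer = RefCell::new(0);
    let blocked = {
        let _reader = answer.borrow(); // a live shared borrow
        set_if_free(&answer, 41) // refused while _reader is alive
    };
    // The shared borrow ended with the block above, so this one goes through.
    let allowed = set_if_free(&answer, 42);
    let value = *answer.borrow();
    (blocked, allowed, value)
}

fn main() {
    let (blocked, allowed, value) = demo();
    println!("blocked: {}, allowed: {}, value: {}", blocked, allowed, value);
}
```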

Rc<T>

PROVIDES:
Adds the feature of run-time reference counting to your variable, but this is the simple, lower-cost version – it is not thread safe.

RESTRICTIONS:
Right from the docs “you cannot generally obtain a mutable reference to something inside an Rc. If you need mutability, put a Cell or RefCell inside the Rc“. So while there is a get_mut() method, it only returns Some when no other Rc<> or Weak<> points at the same value; it’s easier to just use a Cell<> inside.

USEFUL WHEN:
You need run-time reference counting of a variable so it hangs around until the last reference of it is gone.

NOTABLY PROVIDES:
.clone() – get a new copy of the pointer to the same value, upping the reference count by 1.
see the rust-lang.org docs

EXAMPLES/DISCUSSION:
https://blog.sentry.io/2018/04/05/you-cant-rust-that#refcounts-are-not-dirty

Note that in the example below, my_answer is still pointing to valid memory even when correct_answer is dropped, because the Rc<> had an internal count of “2” and drops it to “1”, leaving the storage of “42” still valid.

use std::rc::Rc;
fn main() {
    let correct_answer = Rc::new(42);
    let my_answer = Rc::clone(&correct_answer);

    println!("The correct answer is : {}", correct_answer);
    drop(correct_answer);

    println!("And you got : {}", my_answer);
}
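Here is a small sketch of both points from the RESTRICTIONS above: putting a Cell<> inside the Rc<> for mutability, and get_mut() only working while the Rc<> is unique. The function names are made up for the example.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Returns (count while shared, count after drop, final value).
fn shared_counter() -> (usize, usize, i32) {
    let answer = Rc::new(Cell::new(0));
    let other = Rc::clone(&answer);
    let while_shared = Rc::strong_count(&answer);

    // Mutate through one handle; the other handle sees the change.
    other.set(42);
    drop(other);
    let after_drop = Rc::strong_count(&answer);
    (while_shared, after_drop, answer.get())
}

// Returns whether get_mut() worked (while unique, while shared).
fn get_mut_demo() -> (bool, bool) {
    let mut first = Rc::new(1);
    let unique = Rc::get_mut(&mut first).is_some();
    let second = Rc::clone(&first);
    let shared = Rc::get_mut(&mut first).is_some();
    drop(second);
    (unique, shared)
}

fn main() {
    println!("{:?}", shared_counter());
    println!("{:?}", get_mut_demo());
}
```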

Arc<T>

PROVIDES:
Arc<> is an atomic reference counter, very similar to Rc<> above but thread-safe.

RESTRICTIONS:
More expensive than Rc<>. Also note, the <T> you store must have the Send and Sync traits before the Arc<> can be handed to another thread. So an Arc<RefCell<T>> will not work across threads because RefCell<> is not Sync.

USEFUL WHEN:
Same as Rc<>: you need run-time reference counting of a variable so it hangs around until the last reference of it is gone, but safe across threads as long as the inner <T> is.

NOTABLY PROVIDES:
see the rust-lang.org docs

EXAMPLES/DISCUSSION:
https://medium.com/@DylanKerler1/how-arc-works-in-rust-b06192acd0a6

Same idea as with Rc<>, but here it works across multiple threads. The example sleeps for 10ms to give those threads time to finish, which is a crude stand-in for joining them and is not guaranteed to be long enough.

use std::sync::Arc;
use std::thread;
use std::time::Duration;
fn main() {
    let answer = Arc::new(42);

    for threadno in 0..5 {
        let answer = Arc::clone(&answer);
        thread::spawn(move || {
            println!("Thread {}, answer is: {}", threadno + 1, answer);
        });
    }
    let ten_ms = Duration::from_millis(10);
    thread::sleep(ten_ms);
}
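A variation on the same idea, sketched with JoinHandles instead of a sleep: thread::spawn() returns a handle, and join() blocks until that thread is done and hands back its return value. The function name read_from_threads() is just something I picked for this example.

```rust
use std::sync::Arc;
use std::thread;

// Spawn `count` threads that each read the shared value and return it.
fn read_from_threads(count: usize) -> Vec<i32> {
    let answer = Arc::new(42);
    let handles: Vec<_> = (0..count)
        .map(|_| {
            let answer = Arc::clone(&answer);
            thread::spawn(move || *answer)
        })
        .collect();

    // join() waits for each thread, so no timing guesswork is needed.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("thread panicked"))
        .collect()
}

fn main() {
    for (threadno, value) in read_from_threads(5).into_iter().enumerate() {
        println!("Thread {}, answer is: {}", threadno + 1, value);
    }
}
```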



Mutex<T>

PROVIDES:
Mutual exclusion lock protecting shared data, even across threads.

RESTRICTIONS:
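A thread that panics while holding the lock will “poison” the Mutex<>: every later lock() returns an Err (a PoisonError), though the data can still be recovered from that error with into_inner(). The T stored must allow Send but Sync is not necessary.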

USEFUL WHEN:
working on it!

NOTABLY PROVIDES:
see the rust-lang.org docs

EXAMPLES/DISCUSSION:
https://doc.rust-lang.org/book/ch16-03-shared-state.html

use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
fn main() {
    let answer = Arc::new(Mutex::new(42));

    for thread_no in 0..5 {
        let changer = Arc::clone(&answer);
        thread::spawn(move || {
            let mut changer = changer.lock().unwrap();
            println!("Setting answer to thread_no: {}", thread_no + 1,);
            *changer = thread_no + 1;
        });
    }
    let ten_ms = Duration::from_millis(10);
    thread::sleep(ten_ms);

    if answer.is_poisoned() {
        println!("Mutex was poisoned :(");
    } else {
        println!("Mutex survived :)");
        let final_answer = answer.lock().unwrap();
        println!("Ended with answer: {}", final_answer);
    }
}
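Poisoning is easier to believe once you see it happen. This is a minimal sketch, with a made-up function name, where a worker thread panics on purpose while holding the lock, and the main thread then recovers the value from the PoisonError. Expect the panic message on stderr when you run it.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Returns (was the mutex poisoned, value recovered from it).
fn poison_and_recover() -> (bool, i32) {
    let answer = Arc::new(Mutex::new(42));

    let worker = Arc::clone(&answer);
    let result = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        *guard = 7;
        // Panicking while the guard is alive poisons the mutex.
        panic!("worker failed while holding the lock");
    })
    .join();
    assert!(result.is_err()); // join() reports the panic as an Err

    let poisoned = answer.is_poisoned();
    let value = match answer.lock() {
        Ok(guard) => *guard,
        // The data is still there; into_inner() hands back the guard.
        Err(poison) => *poison.into_inner(),
    };
    (poisoned, value)
}

fn main() {
    let (poisoned, value) = poison_and_recover();
    println!("poisoned: {}, recovered value: {}", poisoned, value);
}
```

Whether recovering like this is a good idea depends on whether the data could have been left half-updated by the thread that panicked.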

RwLock<T>

PROVIDES:
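Similar to RefCell<>, but thread-safe: borrow() becomes read() and borrow_mut() becomes write(). Instead of panicking when the lock is already taken, they block until they get it, and they return a LockResult<> (not an Option<>) that is only an Err if the lock was poisoned.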

RESTRICTIONS:
A thread that panics while holding a write lock will “poison” the RwLock<>, after which read() and write() return an Err in every thread. A panic! during a read lock does not poison the RwLock<>. The T stored must allow both Send and Sync.

USEFUL WHEN:
working on it!

NOTABLY PROVIDES:
see the rust-lang.org docs

EXAMPLES/DISCUSSION:

Slightly fancier example, that shows getting both read() and write() locks on the value. If nothing panics, we should see the answer at the end.

use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;
fn main() {
    let answer = Arc::new(RwLock::new(42));

    for thread_no in 0..5 {
        if thread_no % 2 == 1 {
            let changer = Arc::clone(&answer);
            thread::spawn(move || {
                let mut changer = changer.write().unwrap();
                println!("Setting answer to thread_no: {}", thread_no + 1,);
                *changer = thread_no + 1;
            });
        } else {
            let reader = Arc::clone(&answer);
            thread::spawn(move || {
                let reader = reader.read().unwrap();
                println!(
                    "Checking answer in thread_no: {}, value is {}",
                    thread_no + 1,
                    *reader
                );
            });
        }
    }
    let ten_ms = Duration::from_millis(10);
    thread::sleep(ten_ms);

    if answer.is_poisoned() {
        println!("RwLock was poisoned :(");
    } else {
        println!("RwLock survived :)");
        let final_answer = answer.read().unwrap();
        println!("Ended with answer: {}", final_answer);
    }
}

Sample output (the order varies with thread scheduling):

Checking answer in thread_no: 1, value is 42
Checking answer in thread_no: 3, value is 42
Setting answer to thread_no: 2
Checking answer in thread_no: 5, value is 2
Setting answer to thread_no: 4
RwLock survived :)
Ended with answer: 4
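To see the "many readers or one writer" rule without depending on thread timing, here is a small deterministic sketch. The function name is made up, and I use try_write(), which returns an Err instead of blocking, to show that a writer is kept out while a reader is active.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Returns (value seen by the second reader, was the writer refused, final value).
fn readers_then_writer() -> (i32, bool, i32) {
    let answer = Arc::new(RwLock::new(42));

    // The main thread holds a read lock...
    let guard = answer.read().unwrap();

    // ...and a second reader on another thread still gets in.
    let reader = Arc::clone(&answer);
    let seen = thread::spawn(move || {
        let value = *reader.read().unwrap();
        value
    })
    .join()
    .unwrap();

    // A writer cannot get in while the read lock is held.
    let write_refused = answer.try_write().is_err();
    drop(guard);

    // With all readers gone, the write lock is available.
    *answer.write().unwrap() = 4;
    let final_value = *answer.read().unwrap();
    (seen, write_refused, final_value)
}

fn main() {
    println!("{:?}", readers_then_writer());
}
```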

Summary

There are more, plus many custom types; I’ve even used some, like the crate once_cell. I started using that for the web app I was (am?) working on and wrote a little about it. Also, as you saw in the last two examples, you can combine types when you need multiple functionalities. I have included these examples in a GitHub repo, pointers.

I’ll probably hear about or (much more slowly) learn about mistakes I’ve made in wording here or come up with much better examples and excuses for using these various types, so I’ll try to update this post as I do. I see using this myself as a reference until I am really familiar with each of these types. Obviously, any mistakes here are mine alone as I learn Rust and not from any of the links or sources I listed!

Also, lots of help from 3 YouTubers I’ve been watching – the best examples can be seen as they write code and explain why they need something inside an Rc<> or in a Mutex<>. Check out their streams and watch over their shoulder as they code!!

Piecing Together a Rust Web Application

Part I: I work on a global static Config and on logging

Part of a Series: Designing a Full-Featured WebApp with Rust
Part 1: Piecing Together a Rust Web Application
Part 2: My Next Step in Rust Web Application Dev
Part 3: It’s Not a Web Application Without a Database
Part 4: Better Logging for the Web Application
Part 5: Rust Web App Session Management with AWS
Part 6: OAuth Requests, APIs, Diesel, and Sessions
Part 7: Scraping off the Dust: Redeploy of my Rust web app
Part 8: Giving My App Secrets to the AWS SecretManager

Hrm… how to find the right piece on Crates.io ???

For over a decade, I’ve worked on web apps with Perl, the last several years with Catalyst/Moose/DBIC and a slew of internal abstractions. There are a bunch of features I expect to need in any web app: config files (at different platform levels); structured logging; database ORM; templating; cookie, authentication, and session controls; en/decryption for access secrets; etc. I spent most of the overnight hours playing with piecing together a Rust web application, though I still have much more to do. After struggling for hours for what amounts to 279 lines of Rust code, I decided it was well worth it. I’ll try to explore some of my problems and what I worked out. This might take more than one post, so I don’t put you to sleep.

Global (Static) Application Config

This probably isn’t idiomatic Rust, and is likely frowned upon for even more reasons. I’m sure I’ll adapt to Rustic thinking as I learn, but for now, I’m liking this. I like to have a Config struct that is set up at init time and is immutable (music to the Rust compiler’s ears, I’m sure). I battled this with const and lazy_static and numerous other things. Eventually, I settled on a Crate that the author seems to have stumbled into writing/publishing (unless I’m missing some context): OnceCell. Where I was having trouble getting lazy_static to work, once_cell::sync::OnceCell seemed to work for me rather quickly.

Coupled with that, I like the notion of having Config settings initialized by YAML or JSON or TOML files and also able to be overridden in some way – usually environment variables. This path (and an earlier post) took me to the aptly named Config Crate. It does just what I need for pulling settings into a config from various places. I ended up adding the dotenv Crate as well, because something else used it in an example. I’m not sure I’ll keep ALL of these options forever, but it’s in the mix for now. There are, obviously, many ways to allow overrides vs protect the settings on disk, and many ways to decide which platform you are running on besides an ENV variable – and I’m flexible.

EDIT: I just realized that wrapping my type in a Mutex<> means I have to lock() it to read it – and that prevents other functions (and other threads) from reading it at the same time. Since I’m ok with CONFIG being immutable, I really don’t need the Mutex, so I dropped it.

My settings.rs module looks like this for now (it uses once_cell’s Lazy wrapper, which runs the initializer on first access):

use config::{Config, Environment, File};
use dotenv::dotenv;
use once_cell::sync::Lazy;
use serde_derive::Deserialize;
use std::env;

#[derive(Debug, Deserialize)]
pub struct Server {
    pub run_level: String,
}

#[derive(Debug, Deserialize)]
pub struct WebService {
    pub bind_address: String,
    pub bind_port: u16,
}

#[derive(Debug, Deserialize)]
pub struct Settings {
    pub server: Server,
    pub webservice: WebService,
    pub database_url: String,
}

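// Built once, on first access, and immutable from then on.
pub static CONFIG: Lazy<Settings> = Lazy::new(|| {
    // Load a .env file if one exists; a missing file is not an error.
    dotenv().ok();
    let mut config = Config::default();
    // Pick the per-environment file, falling back to "development".
    let env = env::var("PPS_RUN_MODE").unwrap_or_else(|_| "development".into());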

    config
        .merge(File::with_name("conf/default"))
        .unwrap()
        .merge(File::with_name(&format!("conf/{}", env)).required(false))
        .unwrap()
        .merge(File::with_name("conf/local").required(false))
        .unwrap()
        .merge(Environment::with_prefix("PPS"))
        .unwrap();
    match config.try_into() {
        Ok(c) => c,
        Err(e) => panic!("error parsing config files: {}", e),
    }
});
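For comparison, the same "initialize once, read-only afterwards" idea can be sketched with only the standard library, assuming Rust 1.70 or newer where std::sync::OnceLock is available. This is not what my app does: the Settings struct here is cut down, there are no config files, and PPS_BIND_PORT is a variable name I made up for the example.

```rust
use std::env;
use std::sync::OnceLock;

// Cut-down, hypothetical stand-in for the real Settings struct.
#[derive(Debug)]
pub struct Settings {
    pub run_level: String,
    pub bind_port: u16,
}

static CONFIG: OnceLock<Settings> = OnceLock::new();

// Environment variables only; PPS_BIND_PORT is an assumed name.
fn load_settings() -> Settings {
    let run_level =
        env::var("PPS_RUN_MODE").unwrap_or_else(|_| "development".to_string());
    let bind_port = env::var("PPS_BIND_PORT")
        .ok()
        .and_then(|port| port.parse().ok())
        .unwrap_or(3000);
    Settings { run_level, bind_port }
}

// The first caller runs load_settings(); everyone after gets the same reference.
pub fn config() -> &'static Settings {
    CONFIG.get_or_init(load_settings)
}

fn main() {
    println!("{:?}", config());
    // Both calls hand back the very same Settings value.
    assert!(std::ptr::eq(config(), config()));
}
```

Because config() returns a plain shared reference, any number of threads can read it at once with no lock() call, which is the property I wanted after dropping the Mutex.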

And config files like these examples:

conf/default.toml:

[server]
run_level = "default"

conf/development.toml:

[server]
run_level = "development"

[webservice]
bind_address = "0.0.0.0"
bind_port = 3000

conf/staging.toml:

[server]
run_level = "staging"

[webservice]
bind_address = "0.0.0.0"
bind_port = 3000

and, conf/production.toml:

[server]
run_level = "production"

[webservice]
bind_address = "0.0.0.0"
bind_port = 80
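The layering idea, where later sources override earlier ones, can be sketched with plain HashMaps. This is only an illustration of the merge order and not how the Config Crate works internally; the flattened "section.key" names, the helper functions, and the 8080 override value are all assumptions for the example.

```rust
use std::collections::HashMap;

type Layer = HashMap<String, String>;

// Hypothetical helper: build a layer from (key, value) pairs.
fn layer(pairs: &[(&str, &str)]) -> Layer {
    pairs
        .iter()
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}

// Later layers win: each insert overwrites whatever an earlier layer set.
fn merge_layers(layers: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    for current in layers {
        for (key, value) in current {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

// default -> production -> an (invented) environment override of the port.
fn example() -> Layer {
    let default = layer(&[("server.run_level", "default")]);
    let production = layer(&[
        ("server.run_level", "production"),
        ("webservice.bind_address", "0.0.0.0"),
        ("webservice.bind_port", "80"),
    ]);
    let environment = layer(&[("webservice.bind_port", "8080")]);
    merge_layers(&[default, production, environment])
}

fn main() {
    let merged = example();
    let mut keys: Vec<_> = merged.keys().collect();
    keys.sort();
    for key in keys {
        println!("{} = {}", key, merged[key]);
    }
}
```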



Application Logging

Piecing together a Rust web application includes another area of big concern – logging! Logging can easily become a tremendous burden of bandwidth and storage space, but a single log record might explain a production incident and lead you to a quick fix! Structured logging is great for logging platforms because storing and especially searching can be greatly improved when the message is static and data fields attached to the log record fill in the variable gaps.

To get things going, I started with the Crates log and simple_logger, but I probably will move to slog for the structured logging. The very first line in my main() is to call setup_logging() so if anything breaks on app initialization, we should get a log for it. With CONFIG a global static, this simple function looks like this for now, but soon I will work out specifying the logging level in the settings so it can be verbose for devs, but tamer on production:

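use log::warn;
// Assumes settings.rs is declared as a module of this crate.
use crate::settings::CONFIG;

pub fn setup_logging() {
    // Start the logger first so anything that breaks later gets logged.
    simple_logger::init_with_level(log::Level::Info)
        .expect("Trouble starting simple_logger");
    // Record which run level the config resolved to.
    let run_level = &CONFIG.server.run_level;
    warn!("Running as run_level {}", run_level);
}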
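One possible shape for the "verbose for devs, tamer on production" idea is a small mapping from run_level to a verbosity. This is a sketch under my own assumptions: Verbosity is a local stand-in enum rather than the log Crate's Level type, and the choice of which run level gets which verbosity is mine.

```rust
// Local stand-in for a logging level; not the log crate's type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verbosity {
    Debug,
    Info,
    Warn,
}

// Hypothetical mapping from the run_level setting to a verbosity.
fn verbosity_for(run_level: &str) -> Verbosity {
    match run_level {
        "development" => Verbosity::Debug,
        "production" => Verbosity::Warn,
        // staging, default, and anything unrecognized
        _ => Verbosity::Info,
    }
}

fn main() {
    for run_level in ["development", "staging", "production"] {
        println!("{} -> {:?}", run_level, verbosity_for(run_level));
    }
}
```

Falling back to Info for unknown values means a typo in the config makes the app chattier than production would be, not silent.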

We still have the web framework, the database, encryption and more to come. Next up, more single-word Crates: Iron, Rocket and Diesel. I’d love to hear what Rust developers think of this so far – suggestions are welcome. Here’s the repository on Github – you can skip ahead and see what other messes I’ve made.