
Rust - Compile-Time Memory Safety


In this post, I will explain what makes Rust interesting by drawing an analogy between dynamic vs. static typing and the approaches to memory safety in C++ and Rust, without going into too much detail.

Static typing prevents type errors at compile time. For example:

  • Python

    def square(x):
        return x * x

    square("5")
    # Runtime error: TypeError: can't multiply sequence by non-int of type 'str'
  • C++

    int square(int x) {
        return x * x;
    }

    square("5");
    // Compile error: Invalid conversion from ‘const char*’ to ‘int’

Static typing has the following benefits (taken from Guido van Rossum's Stanford seminar):

  • Catches (certain) bugs earlier
  • Refactor with confidence
  • Helps human readers navigate large codebases
  • Better than (certain) comments: the compiler keeps you honest

In fact, most popular dynamic languages now have static typing projects, often backed by big corporations (for example, TypeScript for JavaScript, mypy for Python, and Sorbet for Ruby), because the benefits of static typing grow with the size of the project.

Preventing Memory Errors at Compile Time

Since memory safety is a major practical issue in C++, it would be great if we could check for memory errors statically, in the same way that static typing checks for type errors.

Indeed, this was one of the main motivations behind Rust's creation. Just as a C++ compiler tracks type information for each variable, the Rust compiler also tracks ownership, lifetime, and aliasing for each variable.

Here is a small list of memory issues that can be statically verified with Rust.

Using an Uninitialized Variable

  • C++

    int x;
    int y = square(x);
    // Undefined behavior: an indeterminate (garbage) value is read and passed at runtime.
  • Rust

    let x: i32;
    let y = square(x);
    // Compile error
    // error[E0381]: use of possibly uninitialized variable: `x`
    //   |
    //   | let y = square(x);
    //   |                ^ use of possibly uninitialized `x`
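
For contrast, here is a minimal sketch of deferred initialization that the compiler does accept, because it can prove `x` is assigned on every path before it is read. The `pick_and_square` helper and its values are hypothetical, chosen only for illustration.

```rust
fn square(x: i32) -> i32 {
    x * x
}

// Hypothetical helper: `x` is declared first and assigned later.
fn pick_and_square(use_large: bool) -> i32 {
    let x: i32; // declared, but not yet initialized
    if use_large {
        x = 10;
    } else {
        x = 3;
    }
    // Accepted: every branch above assigns `x` exactly once before this read.
    square(x)
}

fn main() {
    println!("{}", pick_and_square(true));
    println!("{}", pick_and_square(false));
}
```

Removing the `else` branch should bring back the E0381 error, since `x` would then be unassigned on one path.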

Invalid Memory Access

  • C++

    int* x = (int*)1234;
    *x = 5;
    // Invalid memory access at runtime.
    // Segmentation fault (core dumped)
  • Rust

    let x = 1234 as *mut i32;
    *x = 5;
    // Compile error
    // error[E0133]: dereference of raw pointer is unsafe and requires unsafe function or block
    // |
    // | *x = 5;
    // | ^^^^^^ dereference of raw pointer
    // |
    // = note: raw pointers may be NULL, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior
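
Raw pointers are still usable, but the dereference has to be wrapped in an `unsafe` block, which marks the spot where the programmer takes over responsibility from the compiler. The sketch below assumes a pointer derived from a live reference rather than an arbitrary address; `write_through_raw` is a hypothetical name used only for this example.

```rust
// Hypothetical helper: writes through a raw pointer derived from a valid reference.
fn write_through_raw(value: &mut i32, new_value: i32) {
    // Creating a raw pointer is safe; only dereferencing it is not.
    let p: *mut i32 = value;

    // SAFETY: `p` comes from a live, exclusive reference, so it is
    // non-null, aligned, and not aliased for the duration of this write.
    unsafe {
        *p = new_value;
    }
}

fn main() {
    let mut x = 0;
    write_through_raw(&mut x, 5);
    println!("{}", x);
}
```

Wrapping `*(1234 as *mut i32) = 5` in `unsafe` would compile as well, but it would still be undefined behavior at runtime. The keyword documents the risk; it does not remove it.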

Dangling Pointer/Variable

  • C++

    #include <string>
    #include <string_view>

    std::string_view get_extension(std::string filename) {
        return filename.substr(filename.find_last_of('.') + 1);
        // Returning a dangling std::string_view at runtime.
    }
  • Rust

    fn get_extension(filename: String) -> &'static str {
        return &filename[filename.rfind('.').unwrap() + 1..];
        // Compile error
        // error[E0515]: cannot return value referencing function parameter `filename`
        //   |
        //   |     return &filename[filename.rfind('.').unwrap()+1..];
        //   |            ^--------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        //   |            ||
        //   |            |`filename` is borrowed here
        //   |            returns a value referencing data owned by the current function
    }
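
One way to make the Rust version compile is to borrow the file name instead of taking ownership of it, so the returned slice is tied to the caller's string. This is a sketch of that approach, not the only possible fix; returning `Option` for names without a dot is my own assumption rather than something the example above specifies.

```rust
// Borrows `filename`; the returned slice lives as long as the caller's string.
fn get_extension(filename: &str) -> Option<&str> {
    // `rfind` returns the byte index of the last '.', or None if there is none.
    let dot = filename.rfind('.')?;
    // '.' is one byte long, so `dot + 1` is always a valid slice boundary.
    Some(&filename[dot + 1..])
}

fn main() {
    let name = String::from("archive.tar.gz");
    println!("{:?}", get_extension(&name));
    println!("{:?}", get_extension("README"));
}
```

The compiler infers that the result borrows from `filename`, so using the extension after the original string has been dropped is rejected at compile time.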

Incorrectly Using a Moved Object

  • C++

    #include <vector>

    void process(std::vector<int> v);

    // ...
    std::vector<int> x = {1, 2, 3};
    process(std::move(x));
    x.push_back(4);
    // Using an object in an unspecified state at runtime.
  • Rust

    fn process(v: Vec<i32>) { /* takes ownership of `v` */ }

    // ...
    let mut x = vec![1, 2, 3];
    process(x);
    x.push(4);
    // Compile error
    // error[E0382]: borrow of moved value: `x`
    //   |
    //   | let mut x = vec![1, 2, 3];
    //   |     ----- move occurs because `x` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait
    //   | process(x);
    //   |         - value moved here
    //   | x.push(4);
    //   | ^ value borrowed here after move
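
If the caller needs to keep using the vector, the usual fix is to lend it instead of giving it away. The sketch below assumes a `process` that only needs to read the data; summing the elements is a hypothetical body, since the original declaration does not say what the function does.

```rust
// Hypothetical body: borrows the elements as a slice and only reads them.
fn process(v: &[i32]) -> i32 {
    v.iter().sum()
}

fn main() {
    let mut x = vec![1, 2, 3];

    // `&x` lends the vector for the duration of the call; ownership stays here.
    let total = process(&x);

    // The borrow has ended, so mutating `x` is allowed again.
    x.push(4);

    println!("{} {:?}", total, x);
}
```

When `process` really does need ownership, passing `x.clone()` also compiles, at the cost of copying the data.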

Data Race in Multithreading

  • C++

    #include <iostream>
    #include <thread>
    #include <vector>

    static int MONEY = 0;

    void deposit_money(int amount) {
        for (int i = 0; i < amount; ++i)
            ++MONEY;
        // A data race occurs at runtime. Some increments can be lost.
    }

    int main() {
        std::vector<std::thread> threads;

        for (int i = 0; i < 100; ++i)
            threads.emplace_back(deposit_money, 10000);

        for (int i = 0; i < 100; ++i)
            threads[i].join();

        // The result might not be 1,000,000 due to the data race.
        std::cout << MONEY;
    }
  • Rust

    static mut MONEY: i32 = 0;

    fn deposit_money(amount: i32) {
        for _ in 0..amount {
            MONEY += 1;
            // Compile error
            // error[E0133]: use of mutable static is unsafe and requires unsafe function or block
            // |
            // | MONEY += 1;
            // | ^^^^^^^^^^ use of mutable static
            // |
            // = note: mutable statics can be mutated by multiple threads: aliasing violations or data races will cause undefined behavior
        }
    }

    fn main() {
        let mut threads = vec![];

        for _ in 0..100 {
            let thread = std::thread::spawn(|| deposit_money(10000));
            threads.push(thread);
        }

        for thread in threads {
            let _ = thread.join();
        }

        unsafe {
            println!("{}", MONEY);
        }
    }
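
A version that compiles without `unsafe` has to make the shared counter thread-safe. The sketch below uses an atomic integer shared through `Arc`, which is one of several options (a `Mutex<i32>` would work too). The `run_deposits` helper and the changed `deposit_money` signature are my own assumptions for illustration, not part of the example above.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;
use std::thread;

// Each increment is a single atomic read-modify-write, so none can be lost.
fn deposit_money(money: &AtomicI32, amount: i32) {
    for _ in 0..amount {
        money.fetch_add(1, Ordering::Relaxed);
    }
}

// Hypothetical helper: spawns the workers and returns the final balance.
fn run_deposits(thread_count: usize, amount: i32) -> i32 {
    let money = Arc::new(AtomicI32::new(0));
    let mut threads = Vec::new();

    for _ in 0..thread_count {
        // Every thread gets its own handle to the same counter.
        let money = Arc::clone(&money);
        threads.push(thread::spawn(move || deposit_money(&money, amount)));
    }

    for t in threads {
        t.join().expect("worker thread panicked");
    }

    // All threads have been joined, so every increment is visible here.
    money.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", run_deposits(100, 10_000)); // prints 1000000
}
```

I pass the counter explicitly instead of keeping a global so that the sharing is visible in the function signatures. A `static MONEY: AtomicI32` would also compile without `unsafe`.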

To make these static memory checks possible, Rust enforces that you can have either a single mutable reference or multiple read-only references at a time, but not both. In fact, these are very good idioms for structuring large codebases anyway, and they normally do not get in the way of writing ordinary applications. For libraries that require fine-grained memory control, like data containers (e.g., vectors, lists, and hash maps), the unsafe keyword is available to bypass these restrictions.
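
The "one writer or many readers" rule can be sketched in a few lines. The function name and values below are hypothetical; the point is only that shared references may coexist, and that a mutable reference becomes available once they are no longer used.

```rust
// Hypothetical demo of the borrowing rule described above.
fn shared_then_exclusive() -> (i32, Vec<i32>) {
    let mut scores = vec![1, 2, 3];

    // Any number of shared (read-only) references may exist at the same time.
    let first = &scores[0];
    let last = &scores[2];
    let sum = *first + *last;

    // `first` and `last` are not used past this point, so a single
    // mutable reference is now allowed.
    let scores_mut = &mut scores;
    scores_mut.push(sum);

    // Uncommenting the next line keeps `first` alive across the mutable
    // borrow, and the compiler rejects the function with error[E0502].
    // scores_mut.push(*first);

    (sum, scores)
}

fn main() {
    let (sum, scores) = shared_then_exclusive();
    println!("{} {:?}", sum, scores);
}
```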

To be fair, there are compiler options and external tools that can detect C++ memory issues, but they are nowhere near as complete as Rust's built-in checks due to implementation complexity and inherent limitations in the C++ language specification.

  • -Wall -Wextra compiler options: Even for the trivial examples above, GCC 8.3 and Clang 8.0 could only detect one of the five cases: the use of an uninitialized variable.
  • External tools (e.g., Valgrind, Address/Memory/Thread Sanitizers): These are great tools. However, in practice, there is a big difference between compile-time and run-time detection. Run-time checks are limited to the specific code paths your tests execute. If that were sufficient, one could argue there would be no need for static typing, as tests could also be used to catch type errors.

How Rust Is Received

Rust has ranked #1 in the "most loved" programming languages category of the Stack Overflow Developer Survey for four years in a row. In 2019, it was followed by Python (#2), TypeScript (#3), and Kotlin (#4).

It has also received favorable comments from some of the most highly regarded C/C++ programmers:

Rust in Production

Conclusion

This is just one example of why Rust is compelling, and there are many other things that Rust gets right. Hopefully, this post was interesting enough to encourage you to read more about Rust!
