Rust - Compile-Time Memory Safety
In this post, I will explain what makes Rust interesting by drawing an analogy between dynamic vs. static typing and the approaches to memory safety in C++ and Rust, without going into too much detail.
Static typing prevents type errors at compile time. For example:
- Python
    def square(x):
        return x * x
    square("5")
    # Runtime error: Can't multiply sequence by non-int of type 'str'
- C++
    int square(int x) {
        return x * x;
    }
    square("5");
    // Compile error: Invalid conversion from ‘const char*’ to ‘int’
Static typing has the following benefits (taken from Guido van Rossum's Stanford seminar):
- Catches (certain) bugs earlier
- Refactor with confidence
- Helps human readers navigate large codebases
- Better than (certain) comments: the compiler keeps you honest
In fact, all popular dynamic languages have static typing projects, often backed by big corporations, as the benefits of static typing become more significant for larger projects.
- Python: PEP 484 Type Hints, Dropbox Mypy
- JavaScript: Microsoft TypeScript, Google Closure, Facebook Flow
- Ruby: Stripe Sorbet
- PHP: Facebook Hack
- Lua: Ravi
Preventing Memory Errors at Compile Time
Since memory safety is a major practical issue in C++, it would be great if we could check for memory errors statically, in the same way that static typing checks for type errors.
Indeed, this was one of the main motivations behind Rust's creation. Just as a C++ compiler tracks type information for each variable, the Rust compiler also tracks ownership, lifetime, and aliasing for each variable.
Here is a small list of memory issues that can be statically verified with Rust.
Using an Uninitialized Variable
- C++
    int x;
    int y = square(x);
    // Passing a garbage value at runtime.
- Rust
    let x: i32;
    let y = square(x);
    // Compile error
    // error[E0381]: use of possibly uninitialized variable: `x`
    //   |
    //   | let y = square(x);
    //   |                ^ use of possibly uninitialized `x`
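For contrast, here is a minimal sketch of a declaration without an initial value that the compiler does accept, because every path assigns x before it is read. The helper name square_of_choice and the values 3 and 12 are made up for illustration.

```rust
fn square(x: i32) -> i32 {
    x * x
}

// Hypothetical helper: `x` is declared without a value, but every branch
// assigns it before the first read, so the compiler accepts the code.
fn square_of_choice(use_small: bool) -> i32 {
    let x: i32;
    if use_small {
        x = 3;
    } else {
        x = 12;
    }
    square(x)
}

fn main() {
    println!("{}", square_of_choice(true)); // prints 9
    println!("{}", square_of_choice(false)); // prints 144
}
```

Removing either assignment should bring back the E0381 error shown above.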
Invalid Memory Access
- C++
    int* x = (int*)1234;
    *x = 5;
    // Invalid memory access at runtime.
    // Segmentation fault (core dumped)
- Rust
    let x = 1234 as *mut i32;
    *x = 5;
    // Compile error
    // error[E0133]: dereference of raw pointer is unsafe and requires unsafe function or block
    //   |
    //   | *x = 5;
    //   | ^^^^^^ dereference of raw pointer
    //   |
    //   = note: raw pointers may be NULL, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior
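As the error message says, the dereference has to be inside an unsafe block. The sketch below shows what that opt-in can look like when the pointer is known to be valid. The helper write_through_raw is a hypothetical name, and the pointer is derived from a live reference rather than from an arbitrary address like 1234.

```rust
// Hypothetical helper: writes through a raw pointer derived from a
// valid, exclusive reference.
fn write_through_raw(target: &mut i32, new_value: i32) {
    let p: *mut i32 = target; // creating a raw pointer is safe
    // Dereferencing it is not checked by the compiler, so it must be wrapped
    // in `unsafe`. It is sound here because `p` comes from a live `&mut`.
    unsafe {
        *p = new_value;
    }
}

fn main() {
    let mut x = 0;
    write_through_raw(&mut x, 5);
    println!("{}", x); // prints 5
}
```

The unsafe block does not make the access safe. It only marks the spot where the programmer, not the compiler, is responsible for validity.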
Dangling Pointer/Variable
- C++
    #include <string>
    #include <string_view>
    std::string_view get_extension(std::string filename) {
        return filename.substr(filename.find_last_of('.') + 1);
        // Returning a dangling std::string_view at runtime.
    }
- Rust
    fn get_extension(filename: String) -> &'static str {
        return &filename[filename.rfind('.').unwrap() + 1..];
        // Compile error
        // error[E0515]: cannot return value referencing function parameter `filename`
        //   |
        //   |     return &filename[filename.rfind('.').unwrap()+1..];
        //   |            ^--------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        //   |            ||
        //   |            |`filename` is borrowed here
        //   |            returns a value referencing data owned by the current function
    }
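One way to satisfy the compiler, sketched here under the assumption that the caller keeps owning the string, is to borrow the file name instead of taking it by value. The returned slice is then tied to the caller's data rather than to a local that is about to be dropped. Returning an Option instead of calling unwrap is an extra choice made for this sketch, not part of the original example.

```rust
// Hypothetical fix: borrow the input so the returned slice points into
// data owned by the caller, not by this function.
fn get_extension(filename: &str) -> Option<&str> {
    // `rfind` yields the byte index of the last '.', if there is one.
    filename.rfind('.').map(|dot| &filename[dot + 1..])
}

fn main() {
    let name = String::from("archive.tar.gz");
    // The slice borrows from `name`, which outlives it.
    let ext = get_extension(&name);
    println!("{:?}", ext); // prints Some("gz")
}
```

If name were dropped while ext was still in use, the compiler would reject that as well.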
Incorrectly Using a Moved Object
- C++
    #include <vector>
    void process(std::vector<int> v);
    // ...
    std::vector<int> x = {1, 2, 3};
    process(std::move(x));
    x.push_back(4);
    // Using an object in an unspecified state at runtime.
- Rust
    fn process(v: Vec<i32>) { /* ... */ }
    // ...
    let mut x = vec![1, 2, 3];
    process(x);
    x.push(4);
    // Compile error
    // error[E0382]: borrow of moved value: `x`
    //   |
    //   | let mut x = vec![1, 2, 3];
    //   |     ----- move occurs because `x` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait
    //   | process(x);
    //   |         - value moved here
    //   | x.push(4);
    //   | ^ value borrowed here after move
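If the caller needs the vector afterwards, one option is to lend it instead of moving it. This is a minimal sketch. The body of process (doubling each element) is invented for illustration, since the original leaves it unspecified.

```rust
// Hypothetical variant of `process` that borrows the vector mutably
// instead of taking ownership of it.
fn process(v: &mut Vec<i32>) {
    for n in v.iter_mut() {
        *n *= 2;
    }
}

fn main() {
    let mut x = vec![1, 2, 3];
    process(&mut x); // lend `x`; ownership stays with the caller
    x.push(4); // fine: the borrow ended when `process` returned
    println!("{:?}", x); // prints [2, 4, 6, 4]
}
```

Passing x.clone() would also compile, at the cost of copying the data.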
Data Race in Multithreading
- C++
    #include <iostream>
    #include <thread>
    #include <vector>
    static int MONEY = 0;
    void deposit_money(int amount) {
        for (int i = 0; i < amount; ++i)
            ++MONEY;
            // A data race occurs at runtime. Some increments can be lost.
    }
    int main() {
        std::vector<std::thread> threads;
        for (int i = 0; i < 100; ++i)
            threads.emplace_back(deposit_money, 10000);
        for (int i = 0; i < 100; ++i)
            threads[i].join();
        // The result might not be 1,000,000 due to the data race.
        std::cout << MONEY;
    }
- Rust
    static mut MONEY: i32 = 0;
    fn deposit_money(amount: i32) {
        for _ in 0..amount {
            MONEY += 1;
            // Compile error
            // error[E0133]: use of mutable static is unsafe and requires unsafe function or block
            //   |
            //   |         MONEY += 1;
            //   |         ^^^^^^^^^^ use of mutable static
            //   |
            //   = note: mutable statics can be mutated by multiple threads: aliasing violations or data races will cause undefined behavior
        }
    }
    fn main() {
        let mut threads = vec![];
        for _ in 0..100 {
            let thread = std::thread::spawn(|| deposit_money(10000));
            threads.push(thread);
        }
        for thread in threads {
            let _ = thread.join();
        }
        unsafe {
            println!("{}", MONEY);
        }
    }
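For comparison, here is a sketch of one way to write the same program so that it compiles without unsafe: replace the mutable static with an atomic counter from the standard library. Using AtomicI32 with relaxed ordering is an assumption made for this sketch. A Mutex around the counter would be another reasonable choice.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

// An atomic counter can be shared between threads without `unsafe`.
static MONEY: AtomicI32 = AtomicI32::new(0);

fn deposit_money(amount: i32) {
    for _ in 0..amount {
        // Atomic read-modify-write: no increment can be lost.
        MONEY.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let mut threads = vec![];
    for _ in 0..100 {
        threads.push(thread::spawn(|| deposit_money(10000)));
    }
    for t in threads {
        // Joining makes every increment visible to the load below.
        t.join().unwrap();
    }
    println!("{}", MONEY.load(Ordering::Relaxed)); // prints 1000000
}
```

Here the type system forces the choice of a synchronization primitive up front instead of leaving the race to be discovered at runtime.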
To make these static memory checks possible, Rust enforces that, at any given time, a value can have either a single mutable reference or any number of read-only references, but not both. These rules are good idioms for structuring large codebases anyway, and they normally do not get in the way of writing ordinary applications. For libraries that require fine-grained memory control, like data containers (e.g., vectors, lists, and hash maps), the unsafe keyword is available to opt out of some of these checks.
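The reference rule can be sketched in a few lines. The function name borrow_demo and the values are made up for illustration.

```rust
// Hypothetical demo of "many readers or one writer, never both".
fn borrow_demo() -> Vec<i32> {
    let mut scores = vec![10, 20, 30];

    // Any number of read-only references may coexist.
    let first = &scores[0];
    let last = &scores[2];
    let sum = first + last; // last use of both shared borrows

    // With the shared borrows no longer in use, one mutable reference is allowed.
    let writer = &mut scores;
    writer.push(sum);

    // Reading `first` at this point would be rejected with error E0502,
    // because it would overlap with the mutable borrow above.
    scores
}

fn main() {
    println!("{:?}", borrow_demo()); // prints [10, 20, 30, 40]
}
```

The compiler decides where each borrow ends from its last use, which is why the read-only references and the mutable one can share a function as long as they do not overlap.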
To be fair, there are compiler options and external tools that can detect C++ memory issues, but they are nowhere near as complete as Rust's built-in checks due to implementation complexity and inherent limitations in the C++ language specification.
- -Wall -Wextra compiler options: Even for the trivial examples above, GCC 8.3 and Clang 8.0 could only detect one of the five cases: the use of an uninitialized variable.
- External tools (e.g., Valgrind, Address/Memory/Thread Sanitizers): These are great tools. However, in practice, there is a big difference between compile-time and run-time detection. Run-time checks are limited to the specific code paths your tests execute. If that were sufficient, one could argue there would be no need for static typing, as tests could also be used to catch type errors.
How Rust Is Received
Rust has ranked #1 in the "most loved" programming language category of the Stack Overflow Developer Survey for four years in a row. In 2019, it was followed by Python (#2), TypeScript (#3), and Kotlin (#4).
It has also received favorable comments from some of the most highly regarded C/C++ programmers:
- John Carmack: "...writing Rust code feels very wholesome."
- Linus Torvalds: "...We've had the system people who used Modula-2 or Ada, and I have to say Rust looks a lot better than either of those two disasters."
- Miguel de Icaza: "...I have been following an OS written entirely in Rust, and it has great idioms."
Rust in Production
- Google's Crosvm (ChromeOS Virtual Machine Manager)
- Facebook's Mercurial server
- Amazon's AWS Firecracker
- Microsoft's Azure IoT Edge
- Red Hat's Stratis storage
- Dropbox's storage optimization engine
- Mozilla's Servo browser engine
- Cloudflare's QUIC protocol implementation
- NPM's authorization service
- Unity's data engineering team
- Twitter's build team
- Reddit's comment processing
Conclusion
This is just one example of why Rust is compelling, and there are many other things that Rust gets right. Hopefully, this post was interesting enough to encourage you to read more about Rust!
