A few weeks ago, I was asked four questions, all on the same day.
The discussion had so many factors that I made it into a post, which very quickly exploded into a whole series. So here we are!
I love this topic because it's so nuanced: every language has its strengths and weaknesses, and there is no "one true language" that's best in every situation.
We'll mostly be comparing languages' approaches to memory safety, which is the prevention of common memory access bugs such as use-after-free.
Even if you're familiar with memory management, you'll likely learn some interesting things along the way.
There are generally four approaches to memory safety:

- Manual memory management (MMM), as in C, where the programmer is responsible for freeing memory at the right time.
- Garbage collection (GC), as in Javascript and Lua, where a runtime traces which objects are still reachable and frees the rest.
- Reference counting (RC), as in Swift, where each object tracks how many references point to it and is freed when the count hits zero.
- Borrow checking, as in Rust, where the compiler enforces ownership and aliasing rules at compile time.
There's also a fifth approach, generational references, but we'll talk more about that elsewhere; this series compares the more traditional approaches.
Note that this is only Part 1. Subscribe to the RSS feed, twitter, or subreddit to watch for the rest!
Memory safety approaches generally influence six aspects of a language.
Different situations will prioritize these aspects differently, and will call for different languages.
Let's dive into the first one!
To what extent does each approach help with memory safety?
This is a surprisingly nuanced topic. It's not black-and-white; approaches can land anywhere on the memory safety spectrum.
Let's talk about MMM first!
Manual memory management by default has no memory safety protections.
If a programmer allocates every object with malloc, 3 and gives it to free when it's last used, 4 the program will be memory safe... in theory.
In practice, it's quite difficult to make a memory-safe program that way.
On top of that, if someone later updates the program, they'll likely violate some implicit assumptions that the original programmer was relying on, and then memory problems ensue.
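For example, here's a minimal sketch, in C-style C++ with hypothetical names, of how easily a later change can violate an implicit "this is the last use" assumption:

```cpp
#include <cstdio>
#include <cstdlib>

struct User { int id; };

int main() {
  User* user = (User*)malloc(sizeof(User));
  user->id = 42;

  // The original programmer "knew" this was the last use, so they freed here.
  printf("id: %d\n", user->id);
  free(user);

  // A later update adds more logic, unaware of that implicit assumption:
  printf("id again: %d\n", user->id);  // use-after-free: undefined behavior
  return 0;
}
```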
To make matters a bit worse, programs made this way will be quite slow: malloc itself is an expensive call, and heap-allocating every single object scatters data across memory, causing cache misses.
As you can imagine, many successful MMM projects avoid malloc for these reasons.
There is, of course, a much better and safer way to use MMM languages.
But before that, let's be a little more specific: what is memory safety, really?
Memory safety prevents common memory access bugs, including use-after-free, out-of-bounds reads and writes, double-free, and use-after-return.
These all have one thing in common: they risk accessing the wrong data, triggering undefined behavior ("UB"), which can result in cantankerous shenanigans like security vulnerabilities, segmentation faults, or random nearby data changing. 5
But sometimes, accessing the wrong data won't trigger undefined behavior, if the data there is still the type that we expect. 6
So really, the goal of memory safety is to only ever access data of the type we expect.
This more accurate definition opens the door to a lot more useful, efficient, and safe approaches, as we'll see below.
Note that sometimes, we can trade a memory safety bug for another kind of bug.
For example, if we free'd a ResponseHandler but never unregistered it, the NetworkManager might still have a pointer to it when the response comes, triggering a use-after-free.
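A minimal sketch of that scenario; the NetworkManager and ResponseHandler here are hypothetical stand-ins:

```cpp
#include <cstdio>
#include <string>
#include <vector>

struct ResponseHandler {
  virtual void onResponse(const std::string& body) = 0;
  virtual ~ResponseHandler() = default;
};

struct NetworkManager {
  std::vector<ResponseHandler*> handlers;  // non-owning pointers
  void deliver(const std::string& body) {
    for (ResponseHandler* h : handlers) h->onResponse(body);
  }
};

struct PrintHandler : ResponseHandler {
  void onResponse(const std::string& body) override { std::puts(body.c_str()); }
};

int main() {
  NetworkManager manager;
  ResponseHandler* handler = new PrintHandler();
  manager.handlers.push_back(handler);

  delete handler;               // freed, but never unregistered...

  manager.deliver("response");  // ...so the manager calls into freed memory
  return 0;
}
```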
The outcome might be different in another paradigm: in a GC'd language, the registered reference would keep the handler alive, and it would quietly handle a response it should have ignored; with indices into a collection, we might instead reach whatever same-typed object now occupies that slot.
These are technically logic bugs, better than undefined behavior, but our work is not done. We still need good engineering discipline, testing, and proper data handling practices no matter what approach we use.
There is a way to drastically reduce the risk of memory safety problems, even when the language doesn't give you any protections itself. It has no official name, so I refer to it as Architected MMM, or sometimes MMM++.
There are some basic guidelines to follow:

- Allocate objects of the same type in pools or arrays, so that a freed slot is only ever reused for another object of the same type.
- Use arena allocators for short-lived allocations, and free the whole arena at once.
- Keep objects on the stack where possible, and don't let pointers to them escape.
- Make every union a tagged union, and always check the tag before accessing the data inside.
- Don't take pointers to unions; copy them around instead, or copy the data out before accessing it.
This is how a lot of embedded, safety-critical, and real-time software works, including many servers, databases, and games. 13
This system mostly solves the aforementioned use-after-type-change bugs. To illustrate: if we free an object back into its pool and then mistakenly use a stale pointer or index, we'll read some other object of the same type, rather than reinterpreting unrelated bytes.
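Here's a rough sketch of that per-type reuse; all the names are hypothetical:

```cpp
#include <cstddef>
#include <vector>

struct Ship { int hp = 100; };

// A pool whose slots only ever hold Ships. Freed slots are recycled for
// future Ships, so a stale index still refers to *some* Ship: a logic
// bug, but not undefined behavior.
class ShipPool {
  std::vector<Ship> slots_;
  std::vector<std::size_t> freeList_;

 public:
  std::size_t allocate() {
    if (!freeList_.empty()) {
      std::size_t i = freeList_.back();
      freeList_.pop_back();
      slots_[i] = Ship{};  // reuse the slot for a fresh Ship
      return i;
    }
    slots_.emplace_back();
    return slots_.size() - 1;
  }
  void release(std::size_t i) { freeList_.push_back(i); }
  Ship& get(std::size_t i) { return slots_[i]; }  // stale index: wrong Ship, not UB
};
```

Note that we hand out indices rather than pointers, since the backing array can move as it grows.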
These are still logic problems, but are no longer memory safety problems, and no longer risk undefined behavior.
Looking at modern MMM languages like Zig and Odin, this seems to be the direction they're emphasizing and heading toward: both encourage passing around explicit allocators, including arenas, rather than reaching for a global malloc.
Both languages also have bounds checking by default, and all unions are tagged. 15
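To make that concrete, here's a minimal tagged union sketch in C++; the Shape types are hypothetical, and in practice C++'s std::variant gives you the same thing with the tag check built in:

```cpp
struct Circle { double radius; };
struct Rect   { double width, height; };

struct Shape {
  enum class Tag { Circle, Rect } tag;
  union {
    Circle circle;
    Rect rect;
  };
};

double area(const Shape& s) {
  // Always check the tag before touching the union's contents.
  switch (s.tag) {
    case Shape::Tag::Circle: return 3.14159 * s.circle.radius * s.circle.radius;
    case Shape::Tag::Rect:   return s.rect.width * s.rect.height;
  }
  return 0.0;
}
```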
The benefit of this approach is that it gets us much closer to memory safety without the particular drawbacks of GC, RC, or borrow checking.
Practices like these have been formalized, and even integrated into static analysis tools like Ada's SPARK. One could even say the borrow checker is such a system, built into the language and enabled everywhere by default.
There are a lot of misconceptions about the safety of programs written in MMM languages.
But with the right tooling, practices, and discipline, one can reduce the risk of memory safety bugs to an acceptable level for their situation.
This is also why we use languages like Rust, even though its unsafe blocks can undermine the guarantees of the surrounding safe code.
If one needs absolute safety, there are languages like Pony, which has zero memory unsafety and fewer run-time errors than any other language, and tools like Coq.
But in the real world we often don't need absolute guarantees, and we can use something with sufficient memory safety, whether it uses constructs like unsafe blocks or tools like ASan or memory tagging or CHERI. 16
This is particularly nice, because we can choose just how much safety overhead makes sense for our situation.
So how do we know if we don't need absolute memory safety?
By "garbage collection" I'm specifically referring to tracing garbage collection.
This isn't an actual term in the industry, but I think it captures the spirit nicely.
GC'd languages like Javascript and Lua are safe, and need no escape hatches.
This includes objects that would have been inline in the stack or in other objects.
This might not be the place where C++'s unique_ptr frees the object, because that might accidentally not be the last use of the object.
Undefined behavior has also been known to cause computers to grow AIs and become sentient and hostile, probably.
Even this understanding isn't quite accurate. Memory unsafety theoretically can't occur if the memory is reused for a different struct type with the same layout, though in practice today's optimizers do interpret that as UB. If we want to go even further, we'd say that memory unsafety can only occur if we interpret a non-pointer as a pointer.
This is why Higher RAII is so nice, as it helps us remember to unregister handlers like this.
The proper solution to this is to use something like a SlotMap or HashMap that trades some performance for more intelligent reuse of space. In that case, we'd get an Option, and we can either panic, ignore it, or bubble it upward.
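Here's a rough sketch of the generational-index idea behind a SlotMap; the names are hypothetical, and in C++ the Option becomes a nullable pointer:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Handle { std::size_t index; std::uint32_t generation; };

template <typename T>
class SlotMap {
  struct Slot { T value{}; std::uint32_t generation = 0; bool live = false; };
  std::vector<Slot> slots_;
  std::vector<std::size_t> freeList_;

 public:
  Handle insert(T value) {
    std::size_t i;
    if (!freeList_.empty()) { i = freeList_.back(); freeList_.pop_back(); }
    else { slots_.emplace_back(); i = slots_.size() - 1; }
    slots_[i].value = std::move(value);
    slots_[i].live = true;
    return {i, slots_[i].generation};
  }

  void erase(Handle h) {
    if (get(h) == nullptr) return;  // already stale; nothing to do
    slots_[h.index].live = false;
    slots_[h.index].generation++;   // old handles now fail the check below
    freeList_.push_back(h.index);
  }

  // Returns nullptr (the "None" case) if the handle is stale.
  T* get(Handle h) {
    if (h.index >= slots_.size()) return nullptr;
    Slot& s = slots_[h.index];
    return (s.live && s.generation == h.generation) ? &s.value : nullptr;
  }
};

// Usage: SlotMap<int> m; Handle h = m.insert(42); m.erase(h);
// m.get(h) now returns nullptr, so we can panic, ignore it, or bubble it up.
```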
One must still make sure that a pointer to an arena-allocated object does not outlive the arena allocator.
A pointer "escapes" if it lives past the end of the object's stack frame.
A "tagged" union is a union that has an integer or an enum traveling alongside it, which keeps track of what the actual type is inside the union. One must always check the tag before accessing the data inside the union.
This means that we never take a pointer to a union; we instead copy it around. Alternatively, we might copy the data out of the union before accessing it.
For example, TigerBeetleDB has a similar set of rules.
Also check out Hard Mode Rust to see someone try to do this with completely pre-allocated data.
The creator of Zig is also looking into adding escape analysis, which is pretty exciting.
This also probably sounds odd coming from me, since Vale is completely memory safe. It would be very easy (and convenient) for me to claim that everyone should use my preferred level of memory safety.
However, a real software engineer puts their bias aside, and strives to know when an approach's benefits are worth the costs.
RAII is about automatically affecting the world outside our object. To affect the outside world, the borrow checker often requires us to take a &mut parameter or return a value, but we can't change drop's signature. To see this in action, try to make a handle that automatically removes something from a central collection. Under the hood we usually use unsafe mechanisms, including FFI.
Sometimes, we need memory safety to protect against very real risks: security vulnerabilities from untrusted input, one user's private data leaking to another, and physical harm from bugs in safety-critical code.
But for other situations, like many games and apps, the costs and burdens of certain memory safety approaches might not be worth it.
Let's talk more about these risks and when they occur.
Some programs handle untrusted input, such as web servers, certain drivers, etc. An attacker can carefully craft input that takes advantage of UB to gain access to sensitive data or take control of the system. Memory safety helps guard against that.
For example, if you're working on a server or a multiplayer game, you're handling a lot of untrusted input, and you'll want memory safety to help with that.
Another example would be writing a bluetooth driver: its radio input could be coming from anywhere, and an attacker could craft exactly the right pattern of waves to cause mischief and mayhem for the user.
In cases like these, we need to be careful and use more memory safe approaches.
However, not all programs handle untrusted input. 18
For example, the Google Earth app is written in a non-memory-safe language but it only takes input from the user and from a trusted first-party server, which reduces the security risk. 19
In cases like those, security doesn't need to be as much of a factor in language choice.
"Untrusted input" can also be in the form of files. But if those files came with the program, such as assets for a game, then they are trusted input and not as much of a problem.
Its sandboxing also helps, whether from webassembly, iOS, or Android.
Some programs reuse memory for multiple users, and a use-after-free could mean that your web server exposes one user's private data to another user.
For example, let's say a server receives Bob's SSN from the database, but needs to wait for a second request before sending it all to Bob's phone.
While Bob's SSN is hanging out in RAM, some buggy code handling Jim's request might do a use-after-free and read Bob's SSN, exposing it to Jim.
Memory safety helps by preventing use-after-frees like that.
Note that memory safety does not necessarily solve the problem. Borrow checking can turn memory safety problems into privacy problems, and the same can be true of MMM approaches. 20 No approach is perfect, but GC and RC seem to be the most resilient here.
However, not all programs handle data for multiple users.
For example, Shattered Pixel Dungeon 21 is a mobile roguelike RPG game that just stores high scores and save files for a single user.
In cases like these, privacy doesn't need to be as much of a factor in language choice.
Generational indices, memory tagging, and CHERI can help with this drawback.
This game is amazing, it's open source, and I'm a proud sponsor!
Some programs have safety critical code, where a bug can physically harm a user. The Therac-25 had a bug that dosed six patients with too much radiation. One should definitely use a memory safe language for these cases.
However, most programmers aren't writing safety-critical code. My entire career has been on servers, apps, and games, and I generally don't connect them to anything explosive, incendiary, or toxic to humans.
Sometimes, memory unsafety bugs aren't as bad as all that.
For example, a use-after-free in a single-player game might corrupt a save file or crash the program, which is annoying for the user but doesn't put anyone at risk.
Bugs like these are generally as severe as logic problems, and we can use less burdensome techniques to detect and resolve them: tooling like ASan, Valgrind, ReleaseSafe mode, memory tagging, CHERI, etc. They aren't perfect, but they're very effective. We'll talk about these more below.
So what are these tools, and how might they help us easily improve our memory safety?
The easiest way to detect most memory safety bugs is to use tools like ASan, memory tagging, Valgrind, etc. These are usually turned off in production, but we turn them on in our development builds and automated tests.
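For instance, here's the kind of bug AddressSanitizer pinpoints at the exact faulty access. The -fsanitize=address flag works in both Clang and GCC; the file name below is just a placeholder:

```cpp
// Build and run: clang++ -g -fsanitize=address uaf_demo.cpp && ./a.out
#include <iostream>

int main() {
  int* scores = new int[4]{90, 85, 77, 60};
  delete[] scores;
  std::cout << scores[2] << "\n";  // ASan aborts here with a heap-use-after-free report
  return 0;
}
```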
The Google Earth folks used these pretty religiously. It might be surprising to hear, but the vast majority of memory safety bugs were caught in development and automated tests by Address Sanitizer. 22
In an average Google Earth quarter, they would get perhaps 60-80 bug reports, and memory unsafety was the root cause of only 3-5% of them. That's how effective Address Sanitizer can be.
On more modern hardware, you can also compile MMM languages with CHERI.
CHERI works by bundling a 64-bit "capability" with every pointer, thus making every pointer effectively 128 bits. When we try to dereference the pointer, the CPU will check that the capability is correct, to help with memory safety.
If you want to call into a library written in an MMM language, then you might benefit from using wasm2c, for a modest performance cost (14% with all the platform-specific mechanisms enabled).
Note that there can still be memory corruption inside the sandbox, which may or may not be an acceptable risk for the situation.
Memory tagging is a technique that takes advantage of how pointers and addresses work on modern operating systems.
A pointer is 64 bits, which means we theoretically have 2^64 bytes of address space. In reality, operating systems only use 48 to 56 bits of that, and don't use the other bits for addressing.
Memory tagging will generate a random 4-bit number for every chunk of memory. Whenever we create a pointer to that memory, it will put that 4-bit number into the top unused bits of the pointer. Later, when we try to dereference the pointer, it will check that those 4 bits still match the original 4 bits of the object. If they're different, that means the object has been freed already, and it will halt the program.
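Here's a toy simulation of that mechanism. Real implementations, like ARM's Memory Tagging Extension, do the check in hardware on every load and store; everything below is just an illustrative sketch:

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uintptr_t;

constexpr int kTagShift = 56;  // the top bits are unused for addressing
constexpr Addr kAddrMask = (Addr(1) << kTagShift) - 1;

struct Chunk {
  std::uint8_t tag;  // 4-bit tag, re-randomized whenever this memory is reused
  int payload;
};

// Stash the chunk's current tag into the pointer's unused top bits.
Addr makeTaggedPointer(Chunk* c) {
  return reinterpret_cast<Addr>(c) | (Addr(c->tag) << kTagShift);
}

// On every dereference, compare the pointer's tag with the memory's tag.
int readPayload(Addr tagged) {
  std::uint8_t pointerTag = (tagged >> kTagShift) & 0xF;
  Chunk* c = reinterpret_cast<Chunk*>(tagged & kAddrMask);
  assert(pointerTag == c->tag && "stale pointer: memory has been reused");
  return c->payload;
}
```

If the chunk is freed and its tag re-randomized, an old tagged pointer only has a 1-in-16 chance of slipping past the check.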
This is particularly good for debugging and testing. If this is enabled for your integration tests, then any invalid access bug has a 94% chance of being caught; a stale pointer's old tag matches the memory's new random tag only 1 time in 16. 23
That pretty much covers the various approaches one can use with MMM, and to what extent they help with memory safety.
...said my friend when he saw how long this post was! It was already 45 pages and growing, so he had me cut it off here at 11. 24
In the next posts, we'll talk about the remaining aspects, and how each approach fares on them.
And at the very end, we'll have a comprehensive answer for when to use which approaches.
Thanks for reading! I hope this post has been intriguing and enlightening.
In the coming weeks I'll be continuing this series, so subscribe to the RSS feed, twitter, or the subreddit, and come hang out in the discord server!
They didn't even use shared_ptr; they mostly used unique_ptr and raw pointers.
And that chance increases to 99.6% if you run your integration tests twice, since a stale tag would have to match both times, and so on!
And I haven't even covered the more interesting tools like ReleaseSafe mode, UBSan, or the various temporal memory safety approaches! But we've covered the basics.