In my recent article, I showed how Vale's linear types can provide memory safety without a borrow checker, generations, reference counting, or tracing GC.

My friend Mike asked, "So linear types give zero-overhead memory safety, like borrow checking?"

"Perhaps," I said. "What do you mean by zero-overhead memory safety? Do you mean like zero-cost abstractions?" 0

"No, I mean no memory safety overhead, since it has no RC, GC, or generations." he said.

A really interesting conversation ensued, with a surprising conclusion: zero-overhead memory safety doesn't actually exist, and never has. In other words, any program in any language will have some run-time cost to maintain memory safety. 1

This will come as a surprise to a lot of people who believe various approaches have no memory safety overhead.

Let's embark on a little expedition, to find a language with zero-overhead memory safety!

Arrrlang, a pirate-themed language with zero-overhead memory safety

I once made a toy language 2 named Arrrlang that stored everything in type-specific global arrays.

The most important thing to know about this language was that, yes, its mascot was a parrot.

The parrot's name is a subject of scholarly debate to this day. 3

(the more mythical birds are below)

In Arrrlang, everything lived in a global array.

  • Every Ship was stored in a global ships array.
  • Every Engine was stored in a global engines array.

...and so on.

In Arrrlang, there were no pointers!

If a Ship wanted to point to its Engine, it would instead index into the global engines array.

struct Ship {
  engineIndex int; // index into the global engines array
}

Everything was indexing into everything else. It actually felt like using a relational database.

With these basic rules, Arrrlang was memory safe. 4
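
To make that concrete, here's a minimal sketch of the same machinery, written in Rust for familiarity. The names are invented for illustration, and the arrays are gathered into one struct rather than being true globals:

struct Engine { fuel: u32 }
struct Ship { engine_index: usize }

struct World {
    engines: Vec<Engine>,
    ships: Vec<Ship>,
    free_engine_slots: Vec<usize>, // free-list of released indices
}

impl World {
    // Every "dereference" is an index into the type's central array,
    // which the language bounds-checks.
    fn engine_of(&self, ship: &Ship) -> &Engine {
        &self.engines[ship.engine_index]
    }

    // Instead of free()ing, we remember the slot for reuse. A stale
    // index still lands on something Engine-shaped, so it's memory
    // safe, even though it can cause logic bugs.
    fn release_engine(&mut self, index: usize) {
        self.free_engine_slots.push(index);
    }
}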

Side Notes
(interesting tangential thoughts)
0

Memory safety being a zero-cost abstraction means that if you're not using a certain method of memory safety, you're not paying any costs for the option.

1

Though, one can theoretically contrive a simple program that just does a return 0; and has no cost. We're talking about real-world programs here.

2

"Toy language" just means a small language, usually made as a fun hobby. I once also made a toy language for generating an in-memory time-traveling database for my game Shattered Forest.

3

Some people think its name is just Perry, but there are dubious notes predating that which suggest its full name was Zeddicus Zu'l Zorander the Second.

4

Some details:

  • Instead of free()ing memory, we just add its index to a free-list.
  • Using an old index to a now-released element would be memory safe, as we'd still access something of the expected shape.

Using an old index has some risks: accessing an unexpected live element or old data can cause logic bugs. I wouldn't call Arrrlang "safe", but it was at least memory safe.

Arrrlang's Memory Safety Costs

Arrrlang had no reference counting, no tracing garbage collection, and no generations. So by Mike's definition, Arrrlang had zero-overhead memory safety.

But of course, it had plenty of costs.

  • Its memory usage kept growing over time.
  • It did a bounds check on every "dereference".
  • It stored released elements in a free-list, which has some costly cache misses.

It might be tempting to think that these are inevitable costs, fundamental to any form of managing memory.

However, tracing garbage collection doesn't have these costs. 5 It has other costs, but the point still stands.

So it seems this zero-overhead memory-safe language actually still has some run-time overhead in practice.

If we're looking for a language with zero-overhead memory safety, our expedition apparently isn't over.

It seems zero-overhead memory safety is as elusive as the
ancient Roc, which was known to carry off entire elephants!

5

Specifically:

  • GC can move objects around, to release more memory to the OS.
  • A GC's references are just references, and references need no bounds checking.
  • GCs can allocate from a bump allocator, which is very cache-friendly.

Rust, Bounds Checking, Hashing, Cloning

Rust is in the top three favorites for both me and Mike, so it came up pretty quickly in our conversation.

The first thought is, of course, that safe Rust has bounds checking too, which is an obvious run-time cost (anywhere between 1% and 15%, usually on the lower side of that).

But that's unavoidable; all safe languages have bounds checking! 6 Let's steelman the question and instead ask, "Does safe Rust have more bounds checking costs than other languages?" 7

Sometimes, it's less. Eliding bounds checks in loops really just moves the check into the i < len loop condition, but combined with loop unrolling, the compiler can skip quite a few checks, and that's a big point in Rust's favor.
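
For example, here's a hedged sketch of that effect: the indexed loop leaves per-element checks for the optimizer to prove away, while the iterator form has no index at all:

fn sum_squares_indexed(v: &[i64]) -> i64 {
    // Manual indexing: each v[i] carries a bounds check, which the
    // optimizer must prove redundant via the i < v.len() condition.
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i] * v[i];
    }
    total
}

fn sum_squares_iter(v: &[i64]) -> i64 {
    // Iterator form: no index, so no per-element bounds check,
    // and the loop unrolls and vectorizes more readily.
    v.iter().map(|&x| x * x).sum()
}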

But there are cases where it has more bounds checking.

When the borrow checker denies two simultaneous &mut references to an object, we often turn one into an index into a central Vec or HashMap, which means extra bounds checking or hashing.

To see this in action, try implementing a doubly linked list, observers, delegates, intrusive data structures, subscriptions, back-references, graphs, or any other kind of interconnected data. 8
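
Here's a minimal sketch of that index-based workaround, with illustrative names: each "pointer" becomes an index into one central Vec, and every hop through a link is a bounds-checked lookup.

struct Node {
    value: i32,
    prev: Option<usize>, // index into `nodes`, not a reference
    next: Option<usize>,
}

struct DoublyLinkedList {
    nodes: Vec<Node>,
    head: Option<usize>,
}

impl DoublyLinkedList {
    fn push_front(&mut self, value: i32) {
        let idx = self.nodes.len();
        self.nodes.push(Node { value, prev: None, next: self.head });
        if let Some(old_head) = self.head {
            self.nodes[old_head].prev = Some(idx); // bounds-checked
        }
        self.head = Some(idx);
    }
}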

Or, when we want to read from an array of structs and write to a different field in that same array, we often "flatten" it into two arrays (with twice the bounds checking), similar to a column-oriented database or an ECS approach. 9 10
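
A hedged sketch of that flattening, with illustrative names: splitting the Vec of structs into two parallel Vecs lets us borrow the fields disjointly, at the price of a checked index into each array.

// Instead of one Vec of Particle { position, velocity } structs:
struct Particles {
    positions: Vec<f32>,
    velocities: Vec<f32>,
}

fn step(p: &mut Particles) {
    // The two Vecs can be borrowed disjointly; with one array of
    // structs, reading a neighbor's position while mutating this
    // element's velocity would fight the borrow checker.
    let (positions, velocities) = (&p.positions, &mut p.velocities);
    for i in 0..velocities.len() {
        let neighbor = positions[(i + 1) % positions.len()]; // checked
        velocities[i] += neighbor * 0.001; // checked again
    }
}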

Another common pattern is to call .clone() to get around the borrow checker. This is cheap for primitives, but can be expensive for types like String and Vec that involve heap allocations. 11 With practice this lessens, but it never goes away completely.
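
For instance, a small hedged example with hypothetical names: cloning a String out of a map so we can mutate the map while still using the value.

use std::collections::HashMap;

fn rename(users: &mut HashMap<u32, String>, id: u32) {
    // Cloning ends the shared borrow of `users`, freeing us to take
    // a &mut borrow next, at the cost of a heap allocation and copy.
    let old_name = users.get(&id).cloned().unwrap_or_default();
    users.insert(id, format!("{old_name} II"));
}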

These additional clones, hashes, and bounds checks are all consequences of having a borrow checker.

This isn't a surprise to anyone who knows Rust well. Rust never promised zero-overhead memory safety. Rather, it lets you explicitly choose which memory-safety overhead you incur. This is known as a "zero-cost abstraction", a phrase that I suspect my friend Mike may have misunderstood.

But alas, we still need to answer his original question. It seems that Rust is another "zero-overhead" language that actually has some overhead. Our expedition isn't over!

Chasing zero-overhead memory safety is as confounding as the
elusive Eagle Snipe, which has evaded even the finest hunters!

6

Some techniques such as dependent types can help a compiler avoid bounds checks, though they really just move the overhead elsewhere, for example by requiring a lot of if-statements to maintain a certain range for an integer that we later use for indexing into an array.
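
A small sketch of how that overhead moves, assuming a reasonably smart optimizer: our own if-statement maintains the in-bounds fact, so the compiler can elide its check, but the branch is still executed.

fn lookup(v: &[i32], i: usize) -> i32 {
    // This if-statement establishes that i is in bounds, letting the
    // compiler elide its own bounds check on v[i]. The cost didn't
    // vanish; it moved into this branch.
    if i < v.len() {
        v[i]
    } else {
        0
    }
}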

7

We can also skip bounds checks in Rust with get_unchecked etc., which can be a great tool when you can prove some other way that the index is in bounds. For this article, I'll mainly be talking about safe Rust.

8

As Jonathan Corbet found, intrusive data structures are nigh impossible in Rust.

9

This is one reason why the borrow checker likes ECS so much.

10

Another good solution for the same array-of-structs problem is to generate an "effect struct" containing data describing some changes that should happen later when we have an exclusive &mut again. However, this effect struct can often require clones or heap allocations.

11

In the words of LAC-Tech:

"I've improved but I still have a lot of "wtf" moments. End up doing a lot of deep copying and heap allocations just to shut the compiler up, which makes me question just how "zero cost" the safety is."

Vale's Linear Types

In my linear types post, I talk about how Vale's overhead comes in the form of generation checks, but we can reduce them to zero with linear types and regions.

After using this "linear style" for a while, I realized that it feels really similar to using a borrow checker. That makes sense, as they're both a kind of substructural type system.

And indeed, it had a lot of the same consequences as borrow checking. Instead of aliasing, we would use more indexes and IDs into central arrays and hash maps. I also ended up with more cloning, just to get around this linear style's restrictions.

So even though this linear-style Vale meets Mike's definition of a zero-overhead memory-safe language, it still has overhead, just like Rust.

It seems like there's a lower bound to memory safety overhead. Even the most zero-overhead approaches have some overhead. Why is that?

It seems that zero-overhead memory safety is as elusive as
the Articuno, which was known to freeze entire towns!

The lower bound is above zero

Someone once challenged me to make a zero-overhead observer in Vale or Rust. It turned out that it still had overhead.

For example, we tried having a central "subscriptions" collection that tracked which objects were interested in which events from which sources.

There were a few different flavors:

  • An array. Alas, we had to do a whole linear-time loop to find all subscriptions for a certain source. Making it an array of arrays didn't help; it just introduced an extra cache miss. 12
  • A hash map from source+event to an array of receivers (sketched below). This had some extra hashing and an extra cache miss compared to an observers approach.
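
Here's a rough Rust sketch of that second flavor, with illustrative names (the real code was Vale):

use std::collections::HashMap;

type SourceId = u64;
type EventKind = u32;
type ReceiverId = u64;

struct Subscriptions {
    // Keyed by (source, event): every publish pays to hash the key,
    // plus an extra cache miss to reach the Vec of receivers.
    by_source_event: HashMap<(SourceId, EventKind), Vec<ReceiverId>>,
}

impl Subscriptions {
    fn subscribe(&mut self, src: SourceId, ev: EventKind, rcv: ReceiverId) {
        self.by_source_event.entry((src, ev)).or_default().push(rcv);
    }

    fn receivers(&self, src: SourceId, ev: EventKind) -> &[ReceiverId] {
        self.by_source_event
            .get(&(src, ev))
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }
}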

The fastest approach in Vale was a regular observer: just using some generational references was faster than the "zero-overhead" linear style. 13

I think this is indicative of some underlying rule. If I had to guess, it's that non-temporary many-to-one and many-to-many relationships have inherent run-time memory-safety overhead.

These relationships are inherent to most programs, so most programs will have some memory-safety overhead.

12

A "cache miss" is when the CPU requests memory from a 64B chunk that's not already in the CPU cache, and it has to wait idle for hundreds of cycles waiting for the data to come in from RAM.

13

I once saw a consensus on the Rust discord server to always avoid Rc'd observers and instead use Yew.

As much as I like Yew, it's worrying when people advise bringing in an entire front-end framework with hundreds of dependencies just to avoid Rc.

Some methods are better than others

In this little journey, we've come across six kinds of overhead:

  • Reference counting
  • Tracing
  • Bounds checking
  • Cloning
  • Hashing
  • Generation checks

These costs are not all equal, though. The first two are much more expensive:

  • For reference counting, when we make a new reference to an object, we need to reach all the way into that object to increment its reference count. This is often a cache miss (illustrated below).
  • For tracing garbage collection, we need to crawl through all the "object roots", follow their member references to other objects, follow those objects' references in turn, and so on, until we've found all the live objects to keep alive past the next collection. This involves a lot of cache misses.
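
A tiny illustration of the first point, using Rust's Rc:

use std::rc::Rc;

fn main() {
    let data = Rc::new(vec![1, 2, 3]);
    // Creating a new reference reaches into the heap allocation to
    // bump the strong count stored next to the data, which is often
    // a cache miss if that allocation isn't already hot.
    let alias = Rc::clone(&data);
    assert_eq!(Rc::strong_count(&alias), 2);
}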

The latter four don't cause as many cache misses; they're generally much faster than reference counting and tracing garbage collection.

Though, they still have their costs:

  • Bounds checking is likely on the lower side of 1-15%, though my basic cellular automata program showed 18%, and an interpreter, a rather extreme case, showed 74% overhead.
  • Cloning can be expensive if we're cloning a string or a vector, which involves some heap allocation and its cache misses.
  • Hash maps can be expensive depending on the hashing algorithm, for example ones that use modulus.
  • Generational references check the object's generation in parallel with accessing its field, which is likely on the same cache line, but they can be expensive in tight loops unless one uses linear style.

To get good performance, one has to pick their poison and decide what overhead they're willing to suffer. There's no avoiding overhead completely.

Often, a particular situation will cause one method to be faster or slower than usual. For example:

  • Cloning's malloc is slower than how most GCs allocate memory. In cases with a lot of cloning, or where collections can reasonably be avoided, 14 garbage collection can come out ahead.
  • Bounds checking and generational references are both perfectly branch-predicted, 15 but in rare cases (like matrix multiplication or interpreters) that aren't bottlenecked on cache misses, these extra instructions can be costly.
  • Same story with hash maps: one can easily spend most of one's time hashing. In sufficiently random-access situations, reference counting can be faster.

So even though some overhead is inevitable, we can still think about which method is best for a particular situation. 16

14

Easier said than done, but it is possible. Pony's ORCA system and Verona's regions are both separately-collected realms of memory. If you destroy them before the first collection, you get the best of all worlds. I suspect this will be a very nice technique for temporary calculations.

15

This means the compiler can instruct the CPU to tentatively proceed executing as if the check will pass.

16

This is why I like designs like the Cone language. It gives you access to whatever memory management method you might like.

What does it all mean?

There's no way to get to zero-overhead memory safety.

However, we can get close.

For language designers, it means that the search isn't over. Since there is still overhead, there is still room to improve or rearrange.

Perhaps it means that we should keep chasing the myth of zero-overhead memory safety, because we aren't there yet.

That's what I'm trying to do with Vale; that chase led to discovering its linear-aliasing model, generational references, and regions.

A lot of other languages are exploring these areas too, like Austral, Val, and Verona.

Or perhaps it means that we can stop chasing the myth.

Back when I coded in Rust a lot more, I made the mistake of caring too much about avoiding heap allocations, Rc, RefCell, virtual calls, etc. It was idiomatic to avoid these particular sources of overhead. I took a sense of pride in going the extra mile and spending the extra time reducing my overhead down to "zero".

In retrospect, I never actually reached zero, because instead I used more cloning, more bounds checking, and more hashing. It got a bit faster, but I was chasing a definition of perfection that was arbitrary and unreachable. I wasted a lot of time past the point of diminishing returns, chasing that myth.

It wasn't a problem with Rust. It was a problem with me. Perhaps instead of taking a community's idioms as gospel, we should always ask when idioms should be followed, and trust our own profiling and experience.

Or perhaps it means that we can relax a bit about reducing our overhead to zero, since we can never get there. Perhaps we should instead spend our time making our code simple and flexible, optimize where necessary, and actually solve people's problems instead of playing arbitrary overhead golf. It's a principle I knew, but didn't follow as much as I should have.

Solve people's problems and let your pragmatism be reborn,
like the Phoenix which has burned entire towns alive!

Or perhaps it means, since memory safety often has an unavoidable cost, that it can be okay to use unsafe, C, C++, or Zig in situations where memory safety isn't as important as optimal performance or flexibility. Blasphemy, I know. 17

What does it all mean? I'm not sure! There are a lot of lessons we could take away from the unreachability of zero-overhead memory safety.

17

Even though I made a memory safe language, I stand by this blasphemy. There are cases where memory safety is merely a priority, not the priority.

I talked about this at Handmade Seattle, and this article talks about cases like AAA games, sandboxed apps, and client-side programs that always talk to a trusted first-party server.

I worked on Google Earth (written in C++) and memory safety wasn't nearly as big of a problem there as absolutists would have us believe.

Conclusion

My conversation with Mike had a lot more:

  • How compile-time complexity has an optimization opportunity cost
  • How some kinds of memory management lend themselves better to concurrency
  • How memory safety can be a zero-cost abstraction, even if it isn't zero-overhead.
  • And of course, more scholarly debate on the Arrrlang mascot's real name. 18

...but these are topics worthy of their own entire posts! So I'll stop here.

That's all for now!

Next week I'll be writing about how we can use these techniques (and a few others) to design some memory safety for C++, so keep an eye on the RSS feed, twitter, discord server, or subreddit!

If you enjoyed this article, please consider donating via patreon or GitHub!

See you next time,

- Evan Ovadia

Bonus: The Quetzalcoatl! Not quite a bird,
but very much a badass winged serpent!

18

It's Zeddicus Zu'l Zorander the Second, and I will die on this hill.