A good question!
Recall that whenever we dereference a generational reference, it will do a check to make sure the "remembered generation" matches the "actual generation". 1
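In rough C++ terms (a minimal sketch with hypothetical names, not Vale's actual implementation or memory layout), that check looks something like this:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

struct GenSlot {
  uint64_t generation;  // the "actual generation", stored next to the object
  int value;            // the object's data
};

struct GenRef {
  GenSlot* slot;
  uint64_t rememberedGeneration;  // copied when the reference was created
};

int* deref(GenRef ref) {
  // The generation check: if the object was destroyed (and its slot possibly
  // reused), the actual generation no longer matches the remembered one.
  // (The sketch assumes the slot's memory stays readable after destruction.)
  if (ref.slot->generation != ref.rememberedGeneration) {
    std::fprintf(stderr, "error: use-after-free detected\n");
    std::abort();  // halt loudly rather than touch stale memory
  }
  return &ref.slot->value;
}

int main() {
  GenSlot slot{42, 7};
  GenRef ref{&slot, 42};
  std::printf("%d\n", *deref(ref));  // generations match, so this succeeds
}
```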
Our first approach had "perfect memory safety"; it could be mathematically proven to be safe. When a generation hit INT_MAX, we would retire that chunk of memory forever, instead of overflowing to zero and allowing repeats.
We later realized that if we randomized the generation instead, and allowed repeats, we could unlock some really powerful optimizations. Suddenly, we could put objects directly on the stack and inside other structs and arrays.
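As a sketch of why that matters (again C++, hypothetical names, only an approximation of the idea): with a random generation there is nothing to retire, so the generation can be embedded directly in the object and re-scrambled when the object dies.

```cpp
#include <cstdint>
#include <random>

// Sketch only, hypothetical names. With randomized generations there is no
// counter that can "run out", so nothing ever needs to be retired; the
// generation can be embedded directly in the object, whether it lives on the
// stack, inside another struct, or in an array.
static uint64_t randomGeneration() {
  static thread_local std::mt19937_64 rng{std::random_device{}()};
  return rng();
}

struct Ship {
  uint64_t generation = randomGeneration();   // picked at construction
  int fuel = 0;
  // Re-scramble on destruction, so any stale reference that remembered the
  // old generation will (almost certainly) fail its check.
  ~Ship() { generation = randomGeneration(); }
};

struct Fleet {
  Ship ships[4];      // embedded directly, no separate heap chunks to manage
};

int main() {
  Ship onTheStack;    // lives on the stack, generation and all
  Fleet fleet;
  (void)onTheStack;
  (void)fleet;
}
```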
Conceptually, it means that generations work like passwords, where every object has a password and every reference remembers it. And like any password-based system, we can be confident in the mechanism's security by making these passwords (generations) unobservable, unpredictable, and prohibitively costly to guess.
Before we talk about the security aspect, let's talk about the theory.
This change means that a "generation check false negative" is theoretically possible. A false negative is when we destroy an object and later create a new object in its place which randomly gets the same generation, so a use-after-free through a stale reference to the old object goes undetected because the generations match.
Before we talk about how we address this, let's acknowledge that this only affects erroneous code. It doesn't mean a working program will randomly fail; it only affects the error detection mechanism.
This isn't like a screwdriver that breaks after 1000 uses, this is more like a Brita filter that warns you long before there's an actual problem.
Three aspects help resolve any potential problems from false negatives: the odds are astronomically small, every detection is loud, and the generations themselves are randomized and unobservable.
In theory, there is currently a 1 in 2^64 chance (1 in 18 billion billion, or 0.000000000000000005%) that any given use-after-free goes undetected.
Assuming 64-bit generations, we'd safely detect around 12,786,300,000,000,000,000 use-after-frees before there are even odds of a single false negative. And each detection is very loud: it halts the program and blasts it away completely. 2 3
If restarting a program takes a second, and we trigger detectable use-after-frees as fast as we possibly can, we'll probably see our first false negative after roughly 405 billion years.
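To double-check that arithmetic (a throwaway C++ calculation; it uses the even-odds point rather than the mean, which is where the figures above come from):

```cpp
#include <cmath>
#include <cstdio>

int main() {
  const double p = std::ldexp(1.0, -64);        // 2^-64, ~5.4e-20 per dereference
  const double evenOdds = std::log(2.0) / p;    // ~1.28e19 detections for 50% odds
  const double secondsPerYear = 365.25 * 24 * 3600;
  std::printf("detections for even odds: %.4g\n", evenOdds);
  std::printf("at 1 per second: %.3g years\n", evenOdds / secondsPerYear);
  std::printf("at 5 billion per second: %.3g years\n",
              evenOdds / 5e9 / secondsPerYear);  // the brute-force case discussed below
}
```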
That's pretty good. I've never seen a problem in production that didn't get resolved within 405 billion years.
In fact, one might even decide to go from 64-bit generations down to 48-bit or even 32-bit ones, if memory usage is more important than memory safety in a certain situation. Arm's Memory Tagging Extension is basically a 4-bit generation system, which is often enough for development and testing.
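For a rough sense of that tradeoff (just arithmetic): with b-bit generations, a single reuse collides with odds of 1 in 2^b, so about 1 in 18 billion billion at 64 bits, 1 in 280 trillion at 48 bits, 1 in 4.3 billion at 32 bits, and 1 in 16 at 4 bits.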
Generations are also randomized and not observable via code. This makes the mechanism much more secure: even if an attacker could somehow attempt a brute-force dereference 5 billion times a second, it would still take around 81 years to hit that first false negative.
Of course, no known security mechanism can protect against side-channel exploits like Spectre, or against un-sandboxed untrusted code running in the same process, and the same is true for generational references.
If it helps, one can think of a generation as like a password into a vault, and after one failed attempt, the entire vault shuts down and explodes very loudly. Our entire civilization is built upon such mechanisms, quite successfully.
If one needs theoretical perfection, then one should probably not use Vale, C, Java, Go, Swift, Rust, or any language that has any unsafe or unsound corners. One also shouldn't deploy code on consumer computers or phones, which have probabilistic RAM failure even for working code. 6 One should use something like Ada/SPARK on specialized hardware.
I hope this helped, and feel free to come by the discord server or subreddit with any questions!
To see why we randomize the generations rather than just increment them, let's say that instead we used a monotonically increasing thread-local nextGen integer.
An attacker could then set up the following sequence of events by making requests to a server API which intentionally cause each step to happen.
(It's arguable whether making 2^65+5 requests is possible before the heat death of the universe, but let's roll with it.)
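Here's a toy, self-contained C++ simulation of that hypothetical monotonic scheme. It uses 8-bit generations so the wraparound finishes instantly; fooRef and nextGen follow the text's names, while Slot, Ref, and the rest are illustrative, and none of it is Vale's actual implementation.

```cpp
#include <cstdint>
#include <cstdio>

// Toy model of the hypothetical monotonic scheme (not what Vale does).
// 8-bit generations make the wraparound finish instantly in a demo; the
// text's scenario is the same loop with 64-bit generations.
using Gen = uint8_t;
static Gen nextGen = 1;

struct Slot {          // one reusable chunk of memory
  Gen generation;
  uint64_t payload;    // imagine Foo's pointer field, later Zork's attacker-chosen integer
};

struct Ref { Slot* slot; Gen remembered; };

Ref allocate(Slot* s) { s->generation = nextGen++; return Ref{s, s->generation}; }
void destroy(Slot* s) { s->generation = nextGen++; }  // bump so stale refs normally fail

bool checkPasses(Ref r) { return r.slot->generation == r.remembered; }

int main() {
  Slot slot{};                     // the memory the allocator keeps reusing
  Ref fooRef = allocate(&slot);    // the server allocates a Foo; the attacker keeps fooRef
  destroy(&slot);                  // the server frees the Foo

  // The attacker's "inner loop": allocate and free junk objects until the
  // monotonic counter wraps all the way back around to fooRef's generation.
  while (nextGen != fooRef.remembered) { allocate(&slot); destroy(&slot); }

  allocate(&slot);                 // the server allocates a Zork in the same memory
  std::printf("stale fooRef passes its check: %s\n", checkPasses(fooRef) ? "yes" : "no");
}
```

With 64-bit generations the same loop needs on the order of 2^64 allocate/free pairs, which is presumably where a figure like 2^65+5 requests comes from.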
In most allocators, when you destroy object A then create object B, object B will reuse the now-available memory from object A. That means the Zork is where the Foo once was.
Now the final step: they write through fooRef, even though the Foo has already been freed. This is bad, because the generations match but the types are different; they're now writing to the Zork's memory as if it were a Foo.
Writing to the wrong type like that means the attacker can overwrite a pointer with an integer, which would indirectly allow them to manipulate any memory in the process.
This is a particularly sneaky attack because there are no loud generation failures to alert the server's team that there is a use-after-free bug here.
Luckily, randomization helps with this. If each new object's generation is random, the attacker has no way to know how many times to repeat that inner loop, and blindly guessing runs into the odds described above.
As for why a failed generation check halts the program entirely: the main reason is to make the failure as loud as possible, so the team is aware of the use-after-free bug that someone found.
Another reason is to make it more difficult to brute-force past a generation check. Let's say an attacker tried to brute-force one by attempting a dereference 5 billion times a second: it would still take around 81 years to hit that first false negative. And since every failed check halts the process, if restarting it takes about a second, it will take them more like 405 billion years.
Thank you Blake Anderson for the question! (paraphrased)
See Generational References for more details.
A language that allowed the program to keep running after a check failure would be much more likely to end up with actual memory unsafety. Instead, Vale halts the program, which means the failure can't be ignored.
In the future, we could add a way to "catch" these halts and unwind, but only if we can uphold the same level of protection. A forced sleep for the thread, a loud error message, and extra region-based protections are being explored.
Vale might one day have unsafe blocks, but that will only give us unsafe access to a manually managed region which doesn't have generations. We can't even use FFI to get an object's generation in Vale. There is of course the caveat that if we run malicious code in the same process, it has free rein to do anything, but that's the case with all systems, and why we would use sandboxing.
They are currently exposed via FFI, but that will soon change to an index into an internal reference table.
Your average (2012) Dell computer with 4 GB of DRAM will have 3 to 240 errors per month. DRAM error rates are on the order of 25,000-70,000 errors per billion device hours per Mbit, and around 8% of DIMMs are affected by errors each year. ECC could help with this, but it's generally not offered in consumer devices. This can be mitigated in safety-critical use cases like avionics.