Generational references are a memory management technique, an alternative to reference counting, tracing garbage collection, or borrow checking. 0
In this technique, every reference remembers the ID ("generation") of the object it's pointing at.
If the user dereferences an object and Vale isn't certain it's still alive, it will insert a run-time check that the reference's generation matches the object's generation.
We can safely skip these checks with static analysis, "linear style", or regions. 1
This article explains:
This is an updated version of the 2021 article, which you can still find here
In a way, it's like we're applying the generational indices technique to an entire programming language.
Here's an explanation of the most basic kind of generational references. We'll add the more complex bits further below, but this should be a useful stepping stone.
In C, whenever we allocate a struct on the heap, malloc remembers that that location is "in use". Until we free it, nobody else can allocate something there.
You could imagine that it internally has a bunch of Allocation structs, like this. 2
struct Allocation {
bool currentlyInUse;
char allocationBytes[];
};
By making a slight adjustment here, we can create a system that helps our programs' memory safety.
Let's add a
uint64_t currentGeneration; to it. 3
It starts at zero, and we increment it every time we allocate something to this spot in memory.
struct Allocation {
bool currentlyInUse;
uint64_t currentGeneration;
char allocationBytes[];
};
Now, instead of malloc returning a void*, let's make a gmalloc that returns a "generational reference".
A generational reference is simply a pointer with a "remembered generation" number next to it.
Our codebase will use gmalloc instead of malloc, and use generational references instead of raw pointers.
gmalloc increments currentGeneration every time we allocate anything, and we would also have a gfree that increments it whenever we free something.
struct GenerationalReference {
void* alloc;
uint64_t rememberedGeneration;
};
GenerationalReference gmalloc(int size) {
// Find an Allocation not currentlyInUse.
// Increment its currentGeneration.
// Return it.
}
Now, to make our program memory-safe, we always follow the rule: always call __check before dereferencing a generational reference.
Before:
struct Spaceship {
int numWings;
};
int main() {
Spaceship* ship =
(Spaceship*)malloc(sizeof(Spaceship));
// Set ship's numWings to 2
ship->numWings = 2;
}
After:
struct Spaceship {
int health;
int numWings;
};
int main() {
GenerationalReference shipRef =
gmalloc(sizeof(Spaceship));
// Set ship's numWings to 2
__check(shipRef);
((Ship*)shipRef.alloc)->numWings = 2;
}
That __check function looks at the 8 bytes above the object (at Allocation's currentGeneration), and make sure that it matches the generational reference's rememberedGeneration. It could look something like this:
void __check(GenerationalReference genRef) {
uint64_t currentGeneration = *(uint64_t*)((char*)genRef.alloc - 8);
assert(genRef.rememberedGeneration == currentGeneration);
}
It's as if we're saying:
and the person who opens the door says:
or instead:
Here's the magical part: if we always call __check before dereferencing, then our code will be memory-safe.
Some extra details:
Now, let's talk about speed!
The actual malloc uses a much faster and more interesting system than the one described here.
This is similar to the "generational indices" technique from C++ and Rust, but applied to the entire world instead of just a specific vector.
Vale does this by dereferencing a pointer that it knows points to a protected page.
The very first versions of Vale used the above approach for its memory safety. We benchmarked it, and it outperformed reference counting pretty nicely, even with regions turned off.
For this benchmark, we measured 5 6 three flavors of Vale:
Mode | Speed (seconds) | Overhead Compared to Unsafe (seconds) | Overhead Compared to Unsafe (%) |
---|---|---|---|
Unsafe | 43.82 seconds | n/a | n/a |
RC | 54.90 seconds | +11.08 seconds | +25.29% |
GR | 48.57 seconds | +4.75 seconds | +10.84% |
We used the BenchmarkRL terrain generator to gather these numbers, with different values for the --region-override flag: unsafe-fast, naive-rc, and resilient-v3 respectively.
Here, we benchmarked against other flavors of Vale, to isolate the differences between unsafe, reference-counting, and generational references.
Once we implement regions and inline data, we'll be benchmarking against C++ and Rust, stay tuned!
Generational references have only 10.84% overhead, less than half the cost of reference counting!
Note these caveats! To isolate the difference between generational references and the other approaches:
Of course, beating reference counting is a pretty low bar. 7 Vale aims to be on par with languages like C++ and Rust. How might we get there?
With three tools:
Unless you're a language like Lobster or Nim, which do some really ingenious things with reference counting.
Single ownership is a C++ concept which roughly means that "Every object is owned by exactly one other object (or local variable). When that owner is destroyed, the owned object is automatically freed."
Vale uses single ownership; every Vale object is owned by a single struct, array, or stack frame. 8
In Vale, any object on the stack is owned by the containing stack frame.
When a function accesses data owned by one of its variables, it doesn't need a __check.
struct Spaceship { fuel int; }
exported func main() {
ship Spaceship = Spaceship(7);
// Doesn't need a __check!
println(ship_ref.fuel);
}
Any struct on the heap is owned by an "owning pointer", such as the one in this local variable ship.
Same as before, if a function accesses data owned by one of its variables, it doesn't need a __check.
struct Ship { fuel int; }
exported func main() {
ship Ship = Ship(7);
}
If you want to access something you don't own, you need a reference to it, written &Ship. Vale's references are all generational references.
Vale automatically inserts __checks when you access anything you don't own.
(...at least, until the next section when we talk about regions.)
func PrintFuel(ship_ref &Ship) {
// implicit __check(ship_ref)
println(ship_ref.fuel);
}
Even primitives like int have a single owner, though it doesn't feel like it because they're automatically copied.
We can specifically move ownership to avoid __checks.
Here's the above PrintFuel function.
func PrintFuel(ship_ref &Ship) {
// implicit __check(ship_ref)
println(ship_ref.fuel);
}
exported func main() {
ship = Ship(7);
PrintFuel(&ship);
PrintFuel(&ship);
}
...and here we instead temporarily move ownership to the PrintFuel function.
func PrintFuel(ship Ship) Ship {
// No __check needed!
println(ship.fuel);
return ship;
}
exported func main() {
ship = Ship(7);
set ship = PrintFuel(ship); 9
set ship = PrintFuel(ship);
}
We sometimes call this "linear style", and it allows us to skip __checks whenever we want.
If you want to learn more, check out any of these articles:
We can write an entire program like this and reduce our __checks down to zero. However, if zero __checks is the goal, it's probably easier to blend linear style and regions, which we'll talk about next.
In future versions of Vale, we might even have a borrow statement for this.
We can also use regions' to safely skip __checks for a certain scope.
We can add pure to the above PrintFuel function to inform the compiler that we won't change any data that existed before the function call.
In exchange for that, the compiler is able to skip __checks for any arguments, or any data those arguments might own.
pure func PrintFuel(ship_ref &Ship) {
// No __check needed!
println(ship_ref.fuel);
}
This is called immutable region borrowing. It's a pretty in-depth topic, check out this series which talks about other ways that regions help eliminate __checks.
We recently finished the first regions prototype, and it showed that our benchmark program had no observable overhead when using linear style and regions to eliminate all __checks. Check out Vale's First Prototype for Immutable Region Borrowing for more!
Using this blend of linear style and regions will feel familiar to those who have used languages like Rust. It has the same benefit of avoiding run-time costs (from garbage collection, reference counting, or __checks), though it can often add other run-time costs in the form of extra cloning, bounds checking, and hashing. 10 11
In both paradigms, we often work around the lack of mutable aliasing by cloning or by making all aliases into indices/IDs into a central vectors or hash maps.
See also Chasing the Myth of Zero-Overhead Memory Safety, which talks about how there's no way to eliminate all run-time memory-safety-related overhead.
Vale programs will likely want to start with generational references, as they're the easiest and most flexible way to program. Generational references enable mutable aliasing which enables much better decoupling and healthier abstractions. 12
Even if the programmer does nothing, the compiler will still skip a vast amount of __checks, since libraries (especially the standard library) make heavy use of regions and linear style.
As the programmer gets a better sense of where they need to optimize (by profiling), they'll then want to add regions and linear style to the specific areas that call for it. Sometimes, simply adding pure can eliminate a large amount of __checks in a function.
It can sometimes be a fun challenge to reduce a program's __checks to zero. However, like we often see in Rust, when we eliminate certain kinds of run-time overhead we often end up adding other kinds of overhead such as cloning, bounds checking, or hashing. There's a certain point of diminishing returns, an optimal balance between generational references, linear style, and regions.
This is good, because it's fast, yet still gets the flexibility offered by generational references.
This is Vale's strength: it's fast by default, and we can gradually optimize where it makes sense to.
More concretely, without mutable aliasing, we can't use patterns like observers, back-references, dependency references, callbacks, and delegates, all which enable better decoupling between parts of our programs.
The main benefit of single ownership is that it allows us to embed a struct inside another struct, or a struct inside an array. 13 How do we do that with generational references?
Recall our __check function from before, and notice the hardcoded 8:
void __check(GenerationalReference genRef) {
uint64_t currentGeneration = *(uint64_t*)((char*)genRef.allocation - 8);
assert(genRef.rememberedGeneration == currentGeneration);
}
Instead of that 8, we're going to subtract a tiny offset-to-generation number that we put in the top byte of the genRef.allocation pointer. 14
This will point us to the generation of the containing struct.
With this, our __check function becomes this:
void __check(GenerationalReference genRef) {
uint8_t offsetToGen = genRef.allocation >> 56;
uint64_t currentGeneration = *(uint64_t*)((char*)genRef.allocation - offsetToGen);
assert(genRef.rememberedGeneration == currentGeneration);
}
Our __check function just went from 4 instructions to 5 (or 6 for certain CPUs 15). This isn't that much, especially considering we eliminate most generation checks with linear style and regions. For that low price, we can now embed structs in other structs.
This gives us more control over the layout of our data, which helps make our programs more cache-friendly.
Technically, we don't need single ownership for this. Languages like C# allow us to put value types (such as structs) directly inside classes and arrays. The real trick is whether we can take a reference to that embedded data.
Note that the offset-to-generation number can only fit in a single byte, which means that if a struct is more than 256 bytes wide, it will need another generation number inside it.
It's 5 instructions assuming we have top-byte ignore or the upcoming linear address masking. If we don't it'll be 6 instructions as we add a bitwise-and to mask off those top 8 bits before we subtract.
When the compiler is certain data won't change (such as when we're using immutable region borrowing), we can "pre-check" our generational references once beforehand to skip a lot of checks later.
Recall the pure PrintFuel function:
pure func PrintFuel(ship_ref &Ship) {
// No __checks needed!
foreach _ in 0..100 {
println(ship_ref.fuel);
println(ship_ref.fuel);
}
}
And now imagine we pass a generational reference into PrintFuel, like here:
func OtherFunc(ship_gen_ref &Ship) {
PrintFuel(&ship_gen_ref);
}
OtherFunc is actually doing a __precheck on that ship_gen_ref. A __precheck won't assert, but will instead produce a specific address if the check fails, one that will predictably produce a fault when accessed.
Here's a sample implementation of __precheck, which returns address 0 if the pre-check fails: 16
void* __precheck(GenerationalReference genRef) {
uint64_t currentGeneration = *(uint64_t*)((char*)genRef.allocation - 8);
return (genRef.rememberedGeneration == currentGeneration) * genRef.allocation;
}
Later, if the user tries to access a reference that failed its __precheck, the program will safely raise a fault, similar to when a normal __check fails.
In the example above, we turned 200 __checks into just 1 __precheck.
That's the power of region borrowing! 17
Note that this doesn't mean Vale has a concept of null; there's no such thing as null in Vale. This is a backend implementation detail.
On webassembly and specialized processors where 0 is a valid address, we would use e.g. INTPTR_MAX instead. On even more specialized processors that don't fault when accessing invalid addresses, we wouldn't be able to use pre-checking.
Fun fact: region borrowing can also be applied to a language that uses reference counting, to eliminate the vast majority of increments and decrements. As far as I know, nobody has attempted this yet, but I think this could single-handedly make RC competitive with GC again.
This is where the scheme gets interesting, involving security, statistics, sorcery, and pragmatism. (Special thanks to hc for this next key insight!)
Instead of remembering and incrementing an allocation's generation, we can just use a pseudo-random number instead.
Whenever we put a struct on the stack, we just conjure up a pseudo-random number and put it above the struct. 18 19
And whenever the struct disappears, we overwrite the generation with a zero, so that later __checks on it will fail.
This also has other benefits:
This is a very strong stochastic approach, similar to how passwords work.
It does have a theoretical downside. For example, if we have an invalid access in our server code that's causing (worst case) six million loud errors a second and we decide to ignore it, then after 73,250 years 23 on average it could reuse the same generation as something that was there before, in which case the invalid access bug could cause some unsafety.
Those well-versed in statistics will recognize that this isn't really a problem, but let's explore it a little.
The odds of an invalid access happening undetected is always 1/2^64 for a 64-bit generation.
It also helps to keep in mind that these probabilities only apply to the error detection mechanism, not to the program itself.
One of my beta readers asked:
That would be true, except that most Rust programs use unsafe or have it in their dependencies, even when you don't count the standard library.
When someone uses unsafe to get around the borrow checker's restrictions, even if they think really hard about unsafe's more arcane interactions, bugs can remain hidden for a long time, stealthily causing undefined behavior in the unsafe block and in the safe code around it.
We'd similarly use random generational references to get around the restrictions of linear style, and fortunately, any invalid accesses are detected very loudly as a check failure brings down the entire program. Bugs are discovered very quickly, instead of causing mysterious behavior for years.
Vale's design also goes one step further and replaces unsafe with a "skip-check dereference" operator to skip __checks in release mode. The major benefit is that the compiler ignores these in dependencies by default, so we no longer have to trust that our dependencies used unsafety well.
So it's better in some ways, worse in others. It's just a different approach.
If you've ever heard of Arm's memory tagging, this is like that but with a wider tag.
This pseudo-random number generator doesn't have to be that sophisticated. The most basic implementation could be a simple global integer that we increment on every use, or we could do more interesting things like periodically randomize it, shuffle it, or add prime numbers to it.
When destroying an object, overwriting a generation with zero is faster than incrementing what was there before. Also, we don't have to check for overflow when a generation number hits 2^64.
This relies on the OS detecting accesses to released memory and raising segmentation faults.
We can still release memory to the OS, but we remap the virtual address space to a single shared page, preferably read-only and filled with zeroes. This lets us not trigger a fault on __prechecks.
Given a 64 bit generation, it will take an average of 13 quintillion tries to trigger a false negative.
If there's only one failure per second, it's 439 billion years on average to cause unsafety.
If there's only one failure per week, it would take 266 quadrillion years on average to cause unsafety.
If there's six million check failures per second (the largest DDoS in history), it's 73,250 years on average to cause any unsafety.
Comfortable odds, I'd say!
Generational references and regions also enable some pretty amazing new features:
There are also a few more features that regions will enable, involving concurrency, transactions, and unique security benefits. They're a little too early in the design phases to explain here, but stay tuned!
Generational references and regions are a pretty promising combination with a lot of benefits:
There are some downsides, as with any memory safety paradigm:
But overall, the technique looks very promising, and feels pretty good to use.
I hope you enjoyed this article! As always, if you have any questions, feel free to ask on twitter, our discord server, or the subreddit. Cheers!
- Evan Ovadia
At any given time in a program, the vast majority of pointers are owning pointers, which require no generation.
The random generational references approach above is Vale's default, but we'll also have this "generation tables" approach as another option for users, in case random generational references don't work for them for some reason. 25
In this approach, an object doesn't have a 64 bit generation, it instead has a 32-bit entryIndex into the "generation table".
A generational reference contains three things:
Every region will have a generation table, a single struct which contains:
An entry has two things:
To do a generation check for a given generational reference, we look up the indexth entry in the table, and see if its currentGeneration matches the reference's rememberedGeneration.
When the user creates a Vale object, we:
When the user destroys a Vale object, we:
One reason we're keeping the generation tables approach around is as an opt-in Spectre mitigation, similar to Go's -spectre flag.
Spectre attacks aren't any more likely in Vale and still require very specific conditions (see Golang's explanation), but if there's a Spectre attack plus an extant use-after-free plus attacker-controlled input, they could possibly follow up their Spectre attack with an RCE.
It's unknown if this is actually possible, but until we can rule it out, we'll work towards having this alternate option.