Generational references are a new memory management technique that's easy, deterministic, and very fast.
Vale combines this technique with the upcoming regions approach, which could make Vale as fast as C++ and Rust while achieving the flexibility and developer velocity of Go and Java.
This article explains how generational references work, how they compare to reference counting, and what makes it all so fast.
Recall that in Vale, an object is freed when its owning reference goes out of scope. An object always has exactly one owning reference pointing to it.
We can have as many non-owning references as we want. 0
In other languages, when a programmer frees an object and then accidentally dereferences a non-owning reference to it, it can cause memory unsafety and vulnerabilities.
Our goal is to detect this situation and react to it safely, 1 in a way that doesn't incur extra complexity for the programmer.
At the top of every allocation is the generation number, which changes whenever a new object is at this memory location.
One could think of it as describing "I am the nth inhabitant of this memory location". 2
Destroying an object will change the allocation's generation number.
Later on, we use this number to check if a particular object is still alive.
This distinction is similar to C++'s unique_ptr<T> and T*.
Such as by halting or stack unwinding.
This isn't exactly true, but close enough. In reality, we randomize it every time.
Vale's references are generational references. A generational reference has three things:
To create a reference to an object, we get its allocation's generation number, and include it in the reference.
Soon, we'll add a 16-bit "offset to generation" integer to support inline data.
To dereference a generational reference, the compiler does a "liveness check" to see whether the allocation's generation number still matches our reference's target generation. 4
This prevents use-after-free problems, and helps make Vale memory safe.
It's as if the reference is saying:
and the person who opens the door says:
or instead:
Concretely, the compiler will:
If it's not a valid access, it will halt the program or unwind the stack. 6
This is similar to the "generational indices" technique from C++ and Rust, but applied to the entire world instead of just a specific vector.
This will safely halt the program, or unwind the stack.
To "unwind the stack" means to destroy the current function call and the caller and the caller's caller until we get to a point where the program can recover.
Generational references are only the first steps towards hybrid-generational memory, but we decided to run some early experiments to see how it compares to existing memory models.
For this experiment, we benchmarked 7 8 three flavors of Vale:
Mode | Speed (seconds) | Overhead Compared to Unsafe (seconds) | Overhead Compared to Unsafe (%) |
---|---|---|---|
Unsafe | 43.82 seconds | n/a | n/a |
RC | 54.90 seconds | +11.08 seconds | +25.29% |
GM | 48.57 seconds | +4.75 seconds | +10.84% |
We used the BenchmarkRL terrain generator to gather these numbers, with different values for the --region-override flag: unsafe-fast, naive-rc, and resilient-v3 respectively.
Here, we benchmarked against other flavors of Vale, to isolate the differences between unsafe, reference-counting, and generational references.
Once we fully implement regions, we'll be benchmarking against C++ and Rust, stay tuned!
Generational references have only 10.84% overhead, less than half the cost of reference counting!
These are very promising results, and it also suggests that the next version of this be even better, once we've combined generational references with regions to automatically eliminate generation checks on data that's "immutably borrowed".
Try it out! In the Vale release, you can find our benchmarks here). You can find the source code for the various approaches here (feel free to swing by the discord server and we can point you to the right files).
Note this caveat! To isolate the difference between generational references and the other approaches, we allocate all non-primitive data on the heap. Future versions will add stack allocations. Once we address this limitations, we can get more precise benchmarks against the other approaches.
Generational references are faster than reference-counted references, because:
We explain these differences more below.
Reference counting is costly:
For example:
As you can see, reference counting incurs a cost whenever we alias or dealias. Generational references don't have that cost. The above snippet would have zero overhead if it used generational references.
Instead, generational references incur a cost whenever we dereference an object:
This is cheaper because programs tend dereference less than they alias and dealias: our sample program had 4.7 million counter adjustments, but only 1.3 million liveness checks. 9 10
Reference counting is not very "cache friendly". Adding and subtracting integers is basically free on modern CPUs compared to the real bottleneck, which is how far those integers are: if it's been recently accessed, it's in the nearby cache, and only takes a few CPU cycles to fetch. Otherwise the CPU will "cache miss" and have to bring it in all the way from RAM, which could take hundreds of cycles. 11
In our reference-counted launchShip example, the ship.__ref_count++ could take a few cycles if ship is already in the cache, or hundreds of cycles if it's not.
Generational references are more cache friendly:
Half of these are aliasings and half are dealiasings. Aliasing happens whenever we access a member (e.g. person.name) or make a new reference (e.g. &person). We think there are so many aliasings because many functions and collections will move data around without dereferencing it, for example when we insert and remove in hash maps.
Many languages are able to skip a lot of the adjustments, using static analysis. For example, Lobster can remove up to 95%. Our experiment doesn't have those optimizations; it compares naive RC to naive generational references.
For a given if-statement, CPUs will predict whether we'll go down the "then" branch or the "else" branch. This is called branch prediction. It guesses based on various factors.
In Vale, when we do a liveness check, we hint to the CPU that it should assume it will succeed; it doesn't have to guess.
However, in RC, when we check if a counter is zero (to know whether to free the object), we don't know what to tell the CPU to expect. It has to guess. If it's wrong, then it has to back up, throw away any effects it's done in the meantime, and go down the correct branch.
We made generational references must faster than the original implementation by using a counter-intuitive adjustment: we allow the generation to wrap.
This means we can put objects on the stack directly, which is a major speed boost.
However, it also means a programmer's use-after-free is only caught 99.9999999999996% of the time. In a world with cosmic bit-flips and RAM failure, this isn't actually a problem, and Vale is still safer than other low-level languages that have unsafe blocks or FFI. 12
Vale has a special technique to do FFI safely, see Fearless FFI for more.
No approach gets memory safety for free 13, and the same is true here; generational references use some memory.
This technique uses 8 additional bytes for the generation numbers in allocations and in generational references.
However, we don't incur this cost as much as one might think. 14
In the end, we only use generations when necessary. This is similar to other languages, such as how in C++ we occasionally use shared_ptr and weak_ptr, and in Rust we occasionally use generational_index or Rc. 16
Memory safety is never free, except for the most trivial of programs. Cyclone, Rust, ATS, Fortran, and every other language incurs some overhead to ensure safety. This comes in the form of branching and cache miss costs, for example in:
Or large memory costs, for example if objects are stored in vectors.
Note that static analysis, regions, and inlining are not implemented yet; this list is talking about the final design.
The user can still put an immutable object onto the heap if desired, in which case it's reference counted.
We suspect that this approach will use slightly more memory than the average C++ program, and less than the average Rust program.
The generational reference is only the first step towards hybrid-generational memory, and it already beats reference counting.
Hybrid-generational memory adds two layers of optimization:
When hybrid-generational memory is fully realized, we expect it could be as fast as Rust, and almost as fast as C++. 17
We're excited about this, because it gives us raw speed with zero unsafety, and keeps the language easy to learn and use.
See Hybrid-Generational Memory to learn more, and feel free to swing by the discord server with any questions or ideas!
See Hybrid-Generational Memory for some comparison with Rust!