LanguagesArchitecture

Generational references are a new memory management technique that's easy, deterministic, and very fast.

Vale combines this technique with the upcoming regions approach, which could make Vale as fast as C++ and Rust while achieving the flexibility and developer velocity of Go and Java.

This article explains how generational references work, how they compare to reference counting, and what makes it all so fast.

Built on Single Ownership

Recall that in Vale, an object is freed when its owning reference goes out of scope. An object always has exactly one owning reference pointing to it.

We can have as many non-owning references as we want. 0

In other languages, when a programmer frees an object and then accidentally dereferences a non-owning reference to it, it can cause memory unsafety and vulnerabilities.

Our goal is to detect this situation and react to it safely, 1 in a way that doesn't incur extra complexity for the programmer.

The Generation Number

At the top of every allocation is the generation number, which changes whenever a new object is at this memory location.

One could think of it as describing "I am the nth inhabitant of this memory location". 2

Destroying an object will change the allocation's generation number.

Later on, we use this number to check if a particular object is still alive.

Side Notes
(interesting tangential thoughts)
0

This distinction is similar to C++'s unique_ptr<T> and T*.

1

Such as by halting or stack unwinding.

2

This isn't exactly true, but close enough. In reality, we randomize it every time.

Generational Reference: More than just a pointer!

Vale's references are generational references. A generational reference has three things:

  • A pointer to the object.
  • A "remembered generation" integer. 3

To create a reference to an object, we get its allocation's generation number, and include it in the reference.

3

Soon, we'll add a 16-bit "offset to generation" integer to support inline data.

Dereferencing

To dereference a generational reference, the compiler does a "liveness check" to see whether the allocation's generation number still matches our reference's target generation. 4

This prevents use-after-free problems, and helps make Vale memory safe.

It's as if the reference is saying:

"Hello! I'm looking for the 11th inhabitant of this house, are they still around?"

and the person who opens the door says:

"No, sorry, I'm the 12th inhabitant of this house, the 11th inhabitant is no more." 5

or instead:

"Yes! That is me. Which of my fields would you like to access?"

Concretely, the compiler will:

  • Add the 16-bit offset to the object pointer, and loads the object's generation from there.
  • Compares the generation to the "remembered generation" to check if it's a valid access.

If it's not a valid access, it will halt the program or unwind the stack. 6

4

This is similar to the "generational indices" technique from C++ and Rust, but applied to the entire world instead of just a specific vector.

5

This will safely halt the program, or unwind the stack.

6

To "unwind the stack" means to destroy the current function call and the caller and the caller's caller until we get to a point where the program can recover.

Speed

Generational references are only the first steps towards hybrid-generational memory, but we decided to run some early experiments to see how it compares to existing memory models.

For this experiment, we benchmarked 7 8 three flavors of Vale:

  • Unsafe, with no memory safety, the equivalent of C++ (minus caveats, see below!)
  • RC, where we use naive reference counting for all our objects.
  • GM, which uses generational references.

Mode Speed (seconds) Overhead Compared to Unsafe (seconds) Overhead Compared to Unsafe (%)
Unsafe 43.82 seconds n/a n/a
RC 54.90 seconds +11.08 seconds +25.29%
GM 48.57 seconds +4.75 seconds +10.84%

7

We used the BenchmarkRL terrain generator to gather these numbers, with different values for the --region-override flag: unsafe-fast, naive-rc, and resilient-v3 respectively.

8

Here, we benchmarked against other flavors of Vale, to isolate the differences between unsafe, reference-counting, and generational references.

Once we fully implement regions, we'll be benchmarking against C++ and Rust, stay tuned!

Generational references have only 10.84% overhead, less than half the cost of reference counting!

These are very promising results, and it also suggests that the next version of this be even better, once we've combined generational references with regions to automatically eliminate generation checks on data that's "immutably borrowed".

Try it out! In the Vale release, you can find our benchmarks here). You can find the source code for the various approaches here (feel free to swing by the discord server and we can point you to the right files).

Note this caveat! To isolate the difference between generational references and the other approaches, we allocate all non-primitive data on the heap. Future versions will add stack allocations. Once we address this limitations, we can get more precise benchmarks against the other approaches.

Why is this so fast?

Generational references are faster than reference-counted references, because:

  • Generational references have no aliasing/dealiasing overhead, just on dereference.
  • Generational references cause less cache misses.
  • Liveness checks' branching is easier to predict than RC decrements' branching.

We explain these differences more below.

No Aliasing Costs

Reference counting is costly:

  • Whenever we "alias" (make a new reference to an object), we have to dereference the object to increment its counter.
  • Whenever we "dealias" (throw away a reference), we have to:
    • Dereference the object to decrement its counter,
    • If the counter is zero, deallocate it.

For example:

vale
func launchShip(ships &Map<int, &Spaceship>, id int, armada &List<&Spaceship>) {
ship = ships.get(id);
// Increment ship's counter:
// ship.__ref_count++;

armada.add(ship);


// Decrement ship's counter:
// ship.__ref_count--;
// Deallocate ship if counter is zero:
// if (ship.__ref_count == 0) `
// ship.__deallocate();
// }
}

As you can see, reference counting incurs a cost whenever we alias or dealias. Generational references don't have that cost. The above snippet would have zero overhead if it used generational references.

Instead, generational references incur a cost whenever we dereference an object:

vale
func getShipName(ships &Map<int, &Spaceship>, id int) str {
ship = ships.get(id);

// Check if ship is still alive:
// assert(shipRef.targetGen == shipRef.obj.actualGen);
return ship.name;

}

This is cheaper because programs tend dereference less than they alias and dealias: our sample program had 4.7 million counter adjustments, but only 1.3 million liveness checks. 9 10

More Cache Friendly

Reference counting is not very "cache friendly". Adding and subtracting integers is basically free on modern CPUs compared to the real bottleneck, which is how far those integers are: if it's been recently accessed, it's in the nearby cache, and only takes a few CPU cycles to fetch. Otherwise the CPU will "cache miss" and have to bring it in all the way from RAM, which could take hundreds of cycles. 11

In our reference-counted launchShip example, the ship.__ref_count++ could take a few cycles if ship is already in the cache, or hundreds of cycles if it's not.

Generational references are more cache friendly:

  • When a generational reference goes away, we don't need to reach into memory (unlike RC, where we have to decrement a counter).
  • We don't need to increment when aliasing (see previous section); we don't need to reach into memory to increment.
9

Half of these are aliasings and half are dealiasings. Aliasing happens whenever we access a member (e.g. person.name) or make a new reference (e.g. &person). We think there are so many aliasings because many functions and collections will move data around without dereferencing it, for example when we insert and remove in hash maps.

10

Many languages are able to skip a lot of the adjustments, using static analysis. For example, Lobster can remove up to 95%. Our experiment doesn't have those optimizations; it compares naive RC to naive generational references.

Better Branch Prediction

For a given if-statement, CPUs will predict whether we'll go down the "then" branch or the "else" branch. This is called branch prediction. It guesses based on various factors.

In Vale, when we do a liveness check, we hint to the CPU that it should assume it will succeed; it doesn't have to guess.

However, in RC, when we check if a counter is zero (to know whether to free the object), we don't know what to tell the CPU to expect. It has to guess. If it's wrong, then it has to back up, throw away any effects it's done in the meantime, and go down the correct branch.

Statistical Safety

We made generational references must faster than the original implementation by using a counter-intuitive adjustment: we allow the generation to wrap.

This means we can put objects on the stack directly, which is a major speed boost.

However, it also means a programmer's use-after-free is only caught 99.9999999999996% of the time. In a world with cosmic bit-flips and RAM failure, this isn't actually a problem, and Vale is still safer than other low-level languages that have unsafe blocks or FFI. 12

12

Vale has a special technique to do FFI safely, see Fearless FFI for more.

Memory Usage

No approach gets memory safety for free 13, and the same is true here; generational references use some memory.

This technique uses 8 additional bytes for the generation numbers in allocations and in generational references.

However, we don't incur this cost as much as one might think. 14

  • Vale only uses generational references for non-owning references. The vast majority of references in a program's heap are owning references, which don't need a target generation number, and so don't require the extra 8 bytes.
  • Inline objects (ones that are embedded into the containing struct's memory) don't need a generation number, they reuse the generation of the struct that contains them.
  • Static analysis can eliminate generational references, and sometimes even the allocation's generation.
    • If it sees that a particular reference doesn't escape the lifetime of the object, it makes that reference a regular pointer; no generation number.
    • For an object, if no references escape the lifetime of the object, it also takes out the allocation's generation number.
  • Immutable data is copied by default and don't need a generation. 15

In the end, we only use generations when necessary. This is similar to other languages, such as how in C++ we occasionally use shared_ptr and weak_ptr, and in Rust we occasionally use generational_index or Rc. 16

13

Memory safety is never free, except for the most trivial of programs. Cyclone, Rust, ATS, Fortran, and every other language incurs some overhead to ensure safety. This comes in the form of branching and cache miss costs, for example in:

  • Array bounds checks.
  • Check if something's still alive, e.g. with booleans or generational indices.
  • Reference count increments/decrements.

Or large memory costs, for example if objects are stored in vectors.

14

Note that static analysis, regions, and inlining are not implemented yet; this list is talking about the final design.

15

The user can still put an immutable object onto the heap if desired, in which case it's reference counted.

16

We suspect that this approach will use slightly more memory than the average C++ program, and less than the average Rust program.

Making it Even Faster with Regions

The generational reference is only the first step towards hybrid-generational memory, and it already beats reference counting.

Hybrid-generational memory adds two layers of optimization:

  • Static analysis, to skip liveness checks.
  • Scope tethering, to keep an object alive longer.

When hybrid-generational memory is fully realized, we expect it could be as fast as Rust, and almost as fast as C++. 17

We're excited about this, because it gives us raw speed with zero unsafety, and keeps the language easy to learn and use.

See Hybrid-Generational Memory to learn more, and feel free to swing by the discord server with any questions or ideas!

17

See Hybrid-Generational Memory for some comparison with Rust!