LanguagesArchitecture

Generational references are a memory management technique, an alternative to reference counting, tracing garbage collection, or borrow checking. 0

In this technique, every reference remembers the ID ("generation") of the object it's pointing at.

If the user dereferences an object and Vale isn't certain it's still alive, it will insert a run-time check that the reference's generation matches the object's generation.

We can safely skip these checks with static analysis, "linear style", or regions. 1

This article explains:

  • What a generation is.
  • How generational references work.
  • How linear style, static analysis, and regions make them faster.
  • How they all work together to form Vale's memory safety approach.
Side Notes
(interesting tangential thoughts)
0

This is an updated version of the 2021 article, which you can still find here

1

In a way, it's like we're applying the generational indices technique to an entire programming language.

Generational References

Here's an explanation of the most basic kind of generational references. We'll add the more complex bits further below, but this should be a useful stepping stone.

In C, whenever we allocate a struct on the heap, malloc remembers that that location is "in use". Until we free it, nobody else can allocate something there.

You could imagine that it internally has a bunch of Allocation structs, like this. 2

c
struct Allocation {
  bool currentlyInUse;
  char allocationBytes[];
};

By making a slight adjustment here, we can create a system that helps our programs' memory safety.

Let's add a
uint64_t currentGeneration; to it. 3

It starts at zero, and we increment it every time we allocate something to this spot in memory.

c
struct Allocation {
  bool currentlyInUse;
  uint64_t currentGeneration;
  char allocationBytes[];
};

Now, instead of malloc returning a void*, let's make a gmalloc that returns a "generational reference".

A generational reference is simply a pointer with a "remembered generation" number next to it.

Our codebase will use gmalloc instead of malloc, and use generational references instead of raw pointers.

gmalloc increments currentGeneration every time we allocate anything, and we would also have a gfree that increments it whenever we free something.

c
struct GenerationalReference {
  void* alloc;
  uint64_t rememberedGeneration;
};

GenerationalReference gmalloc(int size) {
  // Find an Allocation not currentlyInUse.
  // Increment its currentGeneration.
  // Return it.
}

Now, to make our program memory-safe, we always follow the rule: always call __check before dereferencing a generational reference.

Before:

c
struct Spaceship {
  int numWings;
};

int main() {
  Spaceship* ship =
    (Spaceship*)malloc(sizeof(Spaceship));

  // Set ship's numWings to 2
  ship->numWings = 2;
}

After:

c
struct Spaceship {
  int health;
  int numWings;
};

int main() {
  GenerationalReference shipRef =
    gmalloc(sizeof(Spaceship));

  // Set ship's numWings to 2
  __check(shipRef);
  ((Ship*)shipRef.alloc)->numWings = 2;
}

That __check function looks at the 8 bytes above the object (at Allocation's currentGeneration), and make sure that it matches the generational reference's rememberedGeneration. It could look something like this:

c
void __check(GenerationalReference genRef) {
  uint64_t currentGeneration = *(uint64_t*)((char*)genRef.alloc - 8);
  assert(genRef.rememberedGeneration == currentGeneration);
}

It's as if we're saying:

"Hello! I'm looking for the 11th inhabitant of this house, are they still around?"

and the person who opens the door says:

"No, sorry, I'm the 12th inhabitant of this house, the 11th inhabitant is no more."

or instead:

"Yes! That is me. Feel free to dereference!"

Here's the magical part: if we always call __check before dereferencing, then our code will be memory-safe.

Some extra details:

  • All these structs are on the heap. Further below we'll talk about how we can put data on the stack.
  • When a generation hits INT_MAX, we retire that spot and never allocate to it again.
  • The above __check uses assert, but we actually just manually raise a segmentation fault because it's faster. 4

Now, let's talk about speed!

2

The actual malloc uses a much faster and more interesting system than the one described here.

3

This is similar to the "generational indices" technique from C++ and Rust, but applied to the entire world instead of just a specific vector.

4

Vale does this by dereferencing a pointer that it knows points to a protected page.

Generational References are Fast

The very first versions of Vale used the above approach for its memory safety. We benchmarked it, and it outperformed reference counting pretty nicely, even with regions turned off.

For this benchmark, we measured 5 6 three flavors of Vale:

  • Unsafe, with no memory safety, the equivalent of C++ (minus caveats, see below!)
  • RC, where we use naive reference counting for all our objects.
  • GR, which uses generational references and no regions.

Mode Speed (seconds) Overhead Compared to Unsafe (seconds) Overhead Compared to Unsafe (%)
Unsafe 43.82 seconds n/a n/a
RC 54.90 seconds +11.08 seconds +25.29%
GR 48.57 seconds +4.75 seconds +10.84%

5

We used the BenchmarkRL terrain generator to gather these numbers, with different values for the --region-override flag: unsafe-fast, naive-rc, and resilient-v3 respectively.

6

Here, we benchmarked against other flavors of Vale, to isolate the differences between unsafe, reference-counting, and generational references.

Once we implement regions and inline data, we'll be benchmarking against C++ and Rust, stay tuned!

Generational references have only 10.84% overhead, less than half the cost of reference counting!

Note these caveats! To isolate the difference between generational references and the other approaches:

  • This is only the basic generational references approach, Vale has since improved on the design, explained further below.
  • This is run without regions or linear style, just basic generational references.
  • In all flavors, we only allocate objects on the heap, except for primitives. Future versions will add stack allocations.
  • We used a gmalloc implementation for all versions, though only GR ever touches the generation number, the other versions ignore it.

Of course, beating reference counting is a pretty low bar. 7 Vale aims to be on par with languages like C++ and Rust. How might we get there?

With three tools:

  • Single ownership: the compiler eliminates __checks for all owned values and owning references, and we can take advantage of that with a "linear style" of coding.
  • Regions, specifically immutable region borrowing can temporarily consider data immutable to skip __checks.
  • Static analysis, where the compiler automatically skips redundant __checks.
7

Unless you're a language like Lobster or Nim, which do some really ingenious things with reference counting.

Skipping checks with single ownership

Single ownership is a C++ concept which roughly means that "Every object is owned by exactly one other object (or local variable). When that owner is destroyed, the owned object is automatically freed."

Vale uses single ownership; every Vale object is owned by a single struct, array, or stack frame. 8

In Vale, any object on the stack is owned by the containing stack frame.

When a function accesses data owned by one of its variables, it doesn't need a __check.

vale
struct Spaceship { fuel int; }
exported func main() {
ship Spaceship = Spaceship(7);

// Doesn't need a __check!
println(ship_ref.fuel);

}

Any struct on the heap is owned by an "owning pointer", such as the one in this local variable ship.

Same as before, if a function accesses data owned by one of its variables, it doesn't need a __check.

vale
struct Ship { fuel int; }
exported func main() {
ship Ship = Ship(7);
}

If you want to access something you don't own, you need a reference to it, written &Ship. Vale's references are all generational references.

Vale automatically inserts __checks when you access anything you don't own.

(...at least, until the next section when we talk about regions.)

vale
func PrintFuel(ship_ref &Ship) {
// implicit __check(ship_ref)
println(ship_ref.fuel);
}

8

Even primitives like int have a single owner, though it doesn't feel like it because they're automatically copied.

Linear Style: Moving data to skip checks

We can specifically move ownership to avoid __checks.

Here's the above PrintFuel function.

vale
func PrintFuel(ship_ref &Ship) {
// implicit __check(ship_ref)
println(ship_ref.fuel);
}


exported func main() {
ship = Ship(7);
PrintFuel(&ship);
PrintFuel(&ship);

}

...and here we instead temporarily move ownership to the PrintFuel function.

vale
func PrintFuel(ship Ship) Ship {
// No __check needed!
println(ship.fuel);
return ship;

}


exported func main() {
ship = Ship(7);
set ship = PrintFuel(ship); 9
set ship = PrintFuel(ship);

}

We sometimes call this "linear style", and it allows us to skip __checks whenever we want.

If you want to learn more, check out any of these articles:

We can write an entire program like this and reduce our __checks down to zero. However, if zero __checks is the goal, it's probably easier to blend linear style and regions, which we'll talk about next.

9

In future versions of Vale, we might even have a borrow statement for this.

Skipping checks with regions

We can also use regions' to safely skip __checks for a certain scope.

We can add pure to the above PrintFuel function to inform the compiler that we won't change any data that existed before the function call.

In exchange for that, the compiler is able to skip __checks for any arguments, or any data those arguments might own.

vale
pure func PrintFuel(ship_ref &Ship) {
// No __check needed!
println(ship_ref.fuel);
}

This is called immutable region borrowing. It's a pretty in-depth topic, check out this series which talks about other ways that regions help eliminate __checks.

We recently finished the first regions prototype, and it showed that our benchmark program had no observable overhead when using linear style and regions to eliminate all __checks. Check out Vale's First Prototype for Immutable Region Borrowing for more!

Using this blend of linear style and regions will feel familiar to those who have used languages like Rust. It has the same benefit of avoiding run-time costs (from garbage collection, reference counting, or __checks), though it can often add other run-time costs in the form of extra cloning, bounds checking, and hashing. 10 11

10

In both paradigms, we often work around the lack of mutable aliasing by cloning or by making all aliases into indices/IDs into a central vectors or hash maps.

11

See also Chasing the Myth of Zero-Overhead Memory Safety, which talks about how there's no way to eliminate all run-time memory-safety-related overhead.

Blending linear style, regions, and generational references

Vale programs will likely want to start with generational references, as they're the easiest and most flexible way to program. Generational references enable mutable aliasing which enables much better decoupling and healthier abstractions. 12

Even if the programmer does nothing, the compiler will still skip a vast amount of __checks, since libraries (especially the standard library) make heavy use of regions and linear style.

As the programmer gets a better sense of where they need to optimize (by profiling), they'll then want to add regions and linear style to the specific areas that call for it. Sometimes, simply adding pure can eliminate a large amount of __checks in a function.

It can sometimes be a fun challenge to reduce a program's __checks to zero. However, like we often see in Rust, when we eliminate certain kinds of run-time overhead we often end up adding other kinds of overhead such as cloning, bounds checking, or hashing. There's a certain point of diminishing returns, an optimal balance between generational references, linear style, and regions.

This is good, because it's fast, yet still gets the flexibility offered by generational references.

This is Vale's strength: it's fast by default, and we can gradually optimize where it makes sense to.

12

More concretely, without mutable aliasing, we can't use patterns like observers, back-references, dependency references, callbacks, and delegates, all which enable better decoupling between parts of our programs.

How do we put data in other objects?

This is only hypothetical, as Vale doesn't yet support inline data (that's next after regions). This section describes how we plan on doing it.

The main benefit of single ownership is that it allows us to embed a struct inside another struct, or a struct inside an array. 13 How do we do that with generational references?

Recall our __check function from before, and notice the hardcoded 8:

c
void __check(GenerationalReference genRef) {
  uint64_t currentGeneration = *(uint64_t*)((char*)genRef.allocation - 8);
  assert(genRef.rememberedGeneration == currentGeneration);
}

Instead of that 8, we're going to subtract a tiny offset-to-generation number that we put in the top byte of the genRef.allocation pointer. 14

This will point us to the generation of the containing struct.

With this, our __check function becomes this:

c
void __check(GenerationalReference genRef) {
  uint8_t offsetToGen = genRef.allocation >> 56;
  uint64_t currentGeneration = *(uint64_t*)((char*)genRef.allocation - offsetToGen);
  assert(genRef.rememberedGeneration == currentGeneration);
}

Our __check function just went from 4 instructions to 5 (or 6 for certain CPUs 15). This isn't that much, especially considering we eliminate most generation checks with linear style and regions. For that low price, we can now embed structs in other structs.

This gives us more control over the layout of our data, which helps make our programs more cache-friendly.

13

Technically, we don't need single ownership for this. Languages like C# allow us to put value types (such as structs) directly inside classes and arrays. The real trick is whether we can take a reference to that embedded data.

14

Note that the offset-to-generation number can only fit in a single byte, which means that if a struct is more than 256 bytes wide, it will need another generation number inside it.

15

It's 5 instructions assuming we have top-byte ignore or the upcoming linear address masking. If we don't it'll be 6 instructions as we add a bitwise-and to mask off those top 8 bits before we subtract.

Pre-checking

When the compiler is certain data won't change (such as when we're using immutable region borrowing), we can "pre-check" our generational references once beforehand to skip a lot of checks later.

Recall the pure PrintFuel function:

vale
pure func PrintFuel(ship_ref &Ship) {
// No __checks needed!
foreach _ in 0..100 {
println(ship_ref.fuel);
println(ship_ref.fuel);

}

}

And now imagine we pass a generational reference into PrintFuel, like here:

vale
func OtherFunc(ship_gen_ref &Ship) {
PrintFuel(&ship_gen_ref);
}

OtherFunc is actually doing a __precheck on that ship_gen_ref. A __precheck won't assert, but will instead produce a specific address if the check fails, one that will predictably produce a fault when accessed.

Here's a sample implementation of __precheck, which returns address 0 if the pre-check fails: 16

c
void* __precheck(GenerationalReference genRef) {
  uint64_t currentGeneration = *(uint64_t*)((char*)genRef.allocation - 8);
  return (genRef.rememberedGeneration == currentGeneration) * genRef.allocation;
}

Later, if the user tries to access a reference that failed its __precheck, the program will safely raise a fault, similar to when a normal __check fails.

In the example above, we turned 200 __checks into just 1 __precheck.

That's the power of region borrowing! 17

16

Note that this doesn't mean Vale has a concept of null; there's no such thing as null in Vale. This is a backend implementation detail.

On webassembly and specialized processors where 0 is a valid address, we would use e.g. INTPTR_MAX instead. On even more specialized processors that don't fault when accessing invalid addresses, we wouldn't be able to use pre-checking.

17

Fun fact: region borrowing can also be applied to a language that uses reference counting, to eliminate the vast majority of increments and decrements. As far as I know, nobody has attempted this yet, but I think this could single-handedly make RC competitive with GC again.

Random Generational References

This is where the scheme gets interesting, involving security, statistics, sorcery, and pragmatism. (Special thanks to hc for this next key insight!)

Instead of remembering and incrementing an allocation's generation, we can just use a pseudo-random number instead.

Whenever we put a struct on the stack, we just conjure up a pseudo-random number and put it above the struct. 18 19

And whenever the struct disappears, we overwrite the generation with a zero, so that later __checks on it will fail.

This also has other benefits:

  • The language has to do less work! 20
  • The object can live on the stack, inside other objects, directly inside arrays, or even inside custom allocators.
  • We can use the regular malloc instead of the gmalloc we invented above.
  • We can reuse a specific spot in memory as many times as we want.
  • We're able to release memory back to the OS, in environments with virtual memory. 21 22

This is a very strong stochastic approach, similar to how passwords work.

It does have a theoretical downside. For example, if we have an invalid access in our server code that's causing (worst case) six million loud errors a second and we decide to ignore it, then after 73,250 years 23 on average it could reuse the same generation as something that was there before, in which case the invalid access bug could cause some unsafety.

Those well-versed in statistics will recognize that this isn't really a problem, but let's explore it a little.

The odds of an invalid access happening undetected is always 1/2^64 for a 64-bit generation.

  • The odds don't change with the number of live objects.
  • The odds don't change with the how long the program's been running.
  • The odds don't change with the the number of objects that have lived in that particular location.
  • The odds don't change with the the number of previous generation check failures, because the first one brings down the entire process.

It also helps to keep in mind that these probabilities only apply to the error detection mechanism, not to the program itself.

One of my beta readers asked:

"How does this compare to Rust? It seems like the probabilistic detection wouldn't be as good."

That would be true, except that most Rust programs use unsafe or have it in their dependencies, even when you don't count the standard library.

When someone uses unsafe to get around the borrow checker's restrictions, even if they think really hard about unsafe's more arcane interactions, bugs can remain hidden for a long time, stealthily causing undefined behavior in the unsafe block and in the safe code around it.

We'd similarly use random generational references to get around the restrictions of linear style, and fortunately, any invalid accesses are detected very loudly as a check failure brings down the entire program. Bugs are discovered very quickly, instead of causing mysterious behavior for years.

Vale's design also goes one step further and replaces unsafe with a "skip-check dereference" operator to skip __checks in release mode. The major benefit is that the compiler ignores these in dependencies by default, so we no longer have to trust that our dependencies used unsafety well.

So it's better in some ways, worse in others. It's just a different approach.

18

If you've ever heard of Arm's memory tagging, this is like that but with a wider tag.

19

This pseudo-random number generator doesn't have to be that sophisticated. The most basic implementation could be a simple global integer that we increment on every use, or we could do more interesting things like periodically randomize it, shuffle it, or add prime numbers to it.

20

When destroying an object, overwriting a generation with zero is faster than incrementing what was there before. Also, we don't have to check for overflow when a generation number hits 2^64.

21

This relies on the OS detecting accesses to released memory and raising segmentation faults.

22

We can still release memory to the OS, but we remap the virtual address space to a single shared page, preferably read-only and filled with zeroes. This lets us not trigger a fault on __prechecks.

23

Given a 64 bit generation, it will take an average of 13 quintillion tries to trigger a false negative.

If there's only one failure per second, it's 439 billion years on average to cause unsafety.

If there's only one failure per week, it would take 266 quadrillion years on average to cause unsafety.

If there's six million check failures per second (the largest DDoS in history), it's 73,250 years on average to cause any unsafety.

Comfortable odds, I'd say!

Hidden Benefits

Generational references and regions also enable some pretty amazing new features:

  • Seamless Concurrency: Since regions don't require any data coloring (like Sync/Send) or function coloring (like async or ?async) we can use structured concurrency without refactoring the surrounding program.
  • Perfect replayability: Since generational references don't require nondeterministic collection, Vale was designed to replay a past run of the program to perfectly reproduce any error.
  • Higher RAII: Vale's type system can use ownership to ensure that we don't forget to call a function at some point in the future, and its destructors can take in arguments and return values.

There are also a few more features that regions will enable, involving concurrency, transactions, and unique security benefits. They're a little too early in the design phases to explain here, but stay tuned!

Summary

Generational references and regions are a pretty promising combination with a lot of benefits:

  • Easy to use, especially for anyone coming from more complex languages like C++ or Rust.
  • They let us choose which parts of our code should be simple, and which should be optimized further.
  • They enable a lot of interesting new features, like Seamless Concurrency, Perfect replayability, and Higher RAII.

There are some downsides, as with any memory safety paradigm:

  • It occasionally adds an 8-byte generation number to the top of allocations.
  • Most pointers will only be 8 bytes, 24 but some will have an additional 8-byte generation with them.
  • There will be __checks at run-time, unless the programmer chooses to optimize them away.

But overall, the technique looks very promising, and feels pretty good to use.

I hope you enjoyed this article! As always, if you have any questions, feel free to ask on twitter, our discord server, or the subreddit. Cheers!

- Evan Ovadia

24

At any given time in a program, the vast majority of pointers are owning pointers, which require no generation.

Appendix: "Generation Tables" Option

Vale doesn't have this option yet, stay tuned!

The random generational references approach above is Vale's default, but we'll also have this "generation tables" approach as another option for users, in case random generational references don't work for them for some reason. 25

In this approach, an object doesn't have a 64 bit generation, it instead has a 32-bit entryIndex into the "generation table".

A generational reference contains three things:

  • object: Pointer to the object.
  • index: 32-bit index into the generation table.
  • rememberedGeneration: 32-bit generation number.

Every region will have a generation table, a single struct which contains:

  • entries: Pointer to an "entries" array.
  • size: How many elements in the entries array.
  • firstFree: Index of the first free entry (this is the head of the entry free-list).

An entry has two things:

  • currentGeneration: 32-bit generation for the object corresponding to this entry.
  • nextFree 32-bit index to the next free entry, if this entry is currently unused.

To do a generation check for a given generational reference, we look up the indexth entry in the table, and see if its currentGeneration matches the reference's rememberedGeneration.

When the user creates a Vale object, we:

  • Expand the generation table if there are no entries left.
  • Set the new object's entryIndex to the table's firstFree index.
  • Set the table's firstFree index to this entry's nextFree.

When the user destroys a Vale object, we:

  • Increment the entryIndexth entry's currentGeneration. If it hits INT_MAX, stop here.
  • Set the entry's nextFree to the table's firstFree.
  • Set the table's firstFree to the object's entryIndex.
25

One reason we're keeping the generation tables approach around is as an opt-in Spectre mitigation, similar to Go's -spectre flag.

Spectre attacks aren't any more likely in Vale and still require very specific conditions (see Golang's explanation), but if there's a Spectre attack plus an extant use-after-free plus attacker-controlled input, they could possibly follow up their Spectre attack with an RCE.

It's unknown if this is actually possible, but until we can rule it out, we'll work towards having this alternate option.