Vale's hybrid-generational memory is a new memory model that combines all the best parts of existing memory strategies: it's as easy as garbage collection, as deterministic as reference counting, and as fast as borrow checking. 0
Note that hybrid-generational-memory is not implemented yet, it's still just a design.
There are three ingredients to make hybrid-generational memory work:
Hybrid-generational memory is built upon generational references. Recall:
Vale has three release modes:
Resilient mode uses hybrid-generational memory.
u48 means a 48-bit unsigned integer.
We chose 48 bits, but we could push it as high as 60 bits if we adjusted the below inlining mechanisms. 48 bits is more than enough though.
And a u16 offset to know where the generation is relative to the object, see generational references.
Use static analysis to reduce the number of liveness checks as much as possible. For example:
This static analysis only works when a nearby local holds the owning reference. The scope tethering explained further below will make it work with non-owning locals too.
The above static analysis only worked when a nearby local holds the owning reference. Now we'll make it work when a nearby local holds a non-owning reference too.
We'll add a u1 "tethered" bit to every allocation, next to the u48 generation number. A local with a non-owning reference can set this bit to 1 to keep its allocation alive. 4 Inside the scope of the local, we can skip all generation checks.
Not every non-owning local will tether. Static analysis will make a non-owning tether when it's dereferenced several times. Otherwise, it will just allow the generation checks to happen.
Someone letting go of the object's owning reference will still call its destructor, regardless of the tethered bit. If the tethered bit is 1, the destructor will not free the object. Instead, the last tethering local will free the object.
Loading from null is a memory safe operation: it's guaranteed to correctly seg-fault if we load from it.
The old tethered bit will usually be 0, but if another local is tethering the object, it could be 1 already.
Specifically, every time we allocate, we check the front of the queue to see if something's tether has expired, and if so, reuse that object. If not, move it to the back of the queue and ask generational malloc instead. Similar to a free-list!
That's basically it! There are some more things we could do to speed it up even more, using virtual memory, regions, or more static analysis, but we'll stop the explanation here.
To address some frequently asked questions:
Similar to how Pony scans all incoming and outgoing objects.
Some potential weaknesses to explore:
Presumably, we would make every generational reference have a pointer to the object, and a target generation number, and a pointer to the current generation. The LLVM pass would eliminate the latter.
Every mainstream OS has virtual memory, but WASM does not.
One day, we could write a compactor for Vale which could also help this, though its probably unnecessary.
Vale is a high-level language with zero unsafe, so an apples-to-apples comparison would be with a Rust where the only unsafe is in the standard library.
Memory safety is never free, except for the most trivial of programs. Cyclone, Rust, ATS, Fortran, and every other language incurs some overhead to ensure safety. This usually comes in the form of branching, cache misses, and extra memory usage, see Beyond Rust: Innovations in Safety, Speed, and Flexibility for more.
Rust programs uses borrow references where they can. Where the borrow checker gets in their way, they incur costs:
This is often amortized, but often not, if a generational array is temporary.
Hybrid-generational memory also has some speed costs, depending on how it's using a particular reference:
Note that not all Rust code uses Rc or RefCell or generational indices. Sometimes, a piece of code's requirements are such that they can use Cell (which is free) or just Vecs (which still have bounds checking).
Setting the tethered bit is generally free because loading the generation brought the tethered bit into the cache.
The borrow checker tends to force Rust code to overuse Vecs (even when Vecs iteration benefits don't apply 14), which incur rather massive memory costs; a Vec will use up to 2x as much memory as the most elements the Vec has ever had.
Hybrid-generational memory doesn't rely on expanding arrays. However, the generational stack does keep ten stacks, each which contains the maximum amount of memory it's ever had.
RefCell, and generational indices use 8b per object, and Rc uses 16. Hybrid-generational memory uses 8b per object. Though once again, it should be noted that not all Rust code uses Rc or RefCell or generational indices.
Iterating over Vecs is very fast, if the Vec is large enough for the CPU to pre-fetch the next elements from the cache, which happens after ~50 elements.
Rust has its borrow checker and Vale has its region borrow checker. Both allow the language to know when something's immutable, to optimize away redundant loads. Additionally, Vale lets us specify structs and interfaces as deeply immutable, which does the same thing without any borrow checking.
Rust's borrow checker often forces us to use Vecs, and Vale's regions allow us to use pool or arena allocation. Both enable incredibly fast allocation. Vale's regions can apply it to any type, whereas in Rust we need to specifically add and keep track of every Vec, and some Rust allocations can't be put into Vecs. 15
Regions make hybrid-generational memory even faster, because inside a pool or arena region, an object doesn't need a generation at all.
Rust's borrow checker is more difficult 16. Rust also forces the user to use the borrow checker uniformly across the entire program, whereas Vale's region borrow checking is opt-in.
In the end, this could mean:
For example, Rcd objects are put on the heap.
Rust's borrow checker enforces mutability xor aliasability, while Vale's region borrow checker doesn't need that restriction.
Hybrid-generational memory looks very promising, and could have Rust-like speed, while being much easier to learn and use.
Of course, this is all just theories until we see some benchmarking. Unfortunately, it's impossible to get a meaningful benchmark yet, firstly because hybrid-generational memory isn't implemented yet, and because Rust has ten years of unrelated optimizations that would confound a meaningful comparison.
Once we implement hybrid-generational memory fully, we'll benchmark it against Rust, and then optimize until it catches up.
Time will tell how hybrid-generational memory compares with the borrow checker!