Vale has an ambitious goal: to be fast, safe, and easy. There are a lot of stellar languages that have two, and we suspect it's possible to really maximize all three using a new approach for zero-cost memory safety called regions.
If you're unfamiliar, check out our high-level overview of regions. This article specifically covers the first core concept, called immutable region borrowing.
Regions let us immutably borrow data to unleash some powerful optimizations, to make Vale extremely fast.
Immutable region borrowing is when we inform the compiler that, during this scope, 2 we know we won't be changing a certain region of data.
A "region" of data might be:
The compiler can then help enforce that promise, and use it to eliminate generation checks, the only source of overhead for the generational references aproach (more on this below).
It also has secondary optimization effects:
In this article, we'll show you how it's all possible!
If anything isn't clear, feel free to reach out via discord, twitter, or the subreddit! We love answering questions, and it helps us know how to improve our explanations.
We're aiming to complete regions by early 2024, check out the roadmap for more details.
A "scope" is the duration of a function or a block inside a function. It can also be the lifetime of the containing object, the time between its construction and destruction.
An iso object is an object where anything it indirectly owns will only ever have references to something else indirectly owned by that original object.
Regions can eliminate memory safety overhead for impure functions 4 and for groups of data marked iso, but let's start with the simplest example, a pure function.
For example, this snippet of code previously had 20 memory-safety-related operations:
But when regions are added to the language, that drops to zero.
Most functions are impure. A function is only pure if it doesn't modify any data that existed before the call.
Let's rewrite the above snippet to clarify where the 20 memory-safety-related operations used to happen:
These operations are called "generation checks", and they make our program memory-safe.
What are they, and how do regions eliminate them?
Vale doesn't have reference counting or garbage collection. It uses a more efficient method called generational references.
To summarize generational references: Each allocation 5 has a "current generation number". Each non-owning reference is made of a pointer to the object, plus the "remembered generation number" of the object at the time the reference was created.
Whenever we dereference a non-owning reference, we do a "generation check." It's as if the reference is saying:
and the person who opens the door says:
As you can imagine, we have to do generation checks whenever we suspect that the target object might have disappeared since the time we made the reference.
However, with immutable borrowing, the compiler can track which objects were alive before the call, and skip generation checks on them.
An allocation can contain multiple objects, so not every object has a generation.
This will halt the program. This is pretty rare in practice, and might become even rarer as we more eagerly detect problems before they happen, as part of Hybrid-Generational Memory.
Here's the above snippet again:
Vale knows that we don't have to do these generation checks. Before we dive in, here's a high-level overview of why:
The above is very high-level. Let's dive in and clarify!
This can either be determined by seeing the still-alive owning reference, or by doing a single generation check before the function.
The List<Ship> is a list of owning references. Nobody can destroy a Ship while the List<Ship> still has its owning reference.
In #1 above, we mention that the caller will determine whether the List
In the following example, the callerFuncA function adds some Ships to a list, and then passes it to printShipNames.
The Vale compiler sees these two facts:
It then creates a raw reference 9 (without a generation) pointing to ship, and passes it to printShipNames.
To reiterate, printShipNames takes a raw reference for its tenShips parameter, not a generational reference.
The user never has to know about the difference between raw references and generational references. This lets us keep the language simple, and decouple optimization from logic concerns.
You can almost think of a "raw reference" as just a regular pointer. Under the hood, it sometimes also contains an offset integer that can later be used to recover the generation. This raw reference is sometimes 128 bits wide, and sometimes 64 bits, depending on the object and the CPU (we can use top-byte-ignore on ARM CPUs, and might put a offset-to-generation integer in there, though thats only a tentative design).
The above was the easy case. We knew myShips was alive because we had an owning reference. But what if we have a non-owning reference?
Note that Vale often can know whether a non-owning reference is alive, via static analysis or hybrid-generational memory. But sometimes, it might not.
In the following example, the callerFuncB function has a non-owning reference, which is a generational reference under the hood.
However, a non-owning references into an immutable region is not a generational reference, it is actually just a raw reference 10 under the hood, which may or may not be null. 11 12
To turn a generational reference (like theirShips) into a raw pointer, we do a generation pre-check. We compare the generational reference to the object it points at, and:
We then hand this raw reference to printShipNames, in this example.
So why a raw reference, instead of a generational reference?
Recall that Vale will detect any use-after-free by doing a generation check, and halt the program. The same thing happens when dereferencing null raw references. 13 So, dereferencing a null raw reference is equivalent to dereferencing a generational reference.
However, it's faster to dereference a raw reference than a generational reference, so we prefer to use raw references when we can. We know we can use raw references inside pure functions because we know the target object won't be destroyed during the call, so we can trust the raw reference to accurately represent whether the object is alive.
Summarizing how a non-pure function calls a pure function:
Recall from above notes, this raw reference might not be a raw pointer; it might have an offset-to-generation integer with it.
We use nullable raw pointers, just as our fathers did, and their fathers before them! But don't worry, Vale has no nulls, it's only an implementation detail.
There are extra considerations here for embedded devices where NULL is a valid memory address. On those, we'd supply a known invalid address, such as 0xFFFFFFFF00000000, or the address of some reserved, unmapped virtual address space. There are also language implications about the sizes of large non-array objects, feel free to swing by the discord server to learn more!
The former is caught by the generated code itself, and the latter is caught by the CPU's own protections.
Recall that printShipNames was pure.
The compiler enforces that data from before the pure function is immutable, at least while we're still in the call.
For example, if we tried to put tenShips.add(Ship(42)); in the function, the compiler would reject it.
The compiler tracks which variables point into that immutable region. It knows that whenever we dereference an immutable object, its members will also be immutable.
It can help to imagine a "region ID" that travels along with the pointer, but at compile time.
Above, it knew that tenShips points to an immutable region, so therefore tenShips[i] is also in an immutable region, and therefore ship and name point to data in an immutable region too.
The user doesn't have to annotate which regions data came from, but it can be helpful for clarity. The following example shows the printShipNames with the optional explicit region markers.
Note that even though pre-existing memory is in an immutable region, the compiler still allows us to change anything created since then.
Here's the same example, with the foreach loop expanded, to illustrate that i is changing.
In other words, a pure function can change anything except the memory that came before the call.
Keep in mind, these region markers are often inferred by the compiler, and usually not seen in Vale code.
That covers it! That's how the compiler knows ship and name are immutable, and i is mutable.
This tracking is called region tracking, because it tracks the region each piece of data comes from, and checks that we don't modify immutable regions.
We could also write &r'List<Ship> instead, the r' applies deeply.
We could also write &r'List<Ship> instead, the r' applies deeply.
On first glance, immutable region borrowing looks similar to traditional borrow checking seen in Rust, Cyclone, and a couple other languages. However, there are a few fundamental differences:
Regions are more similar to mechanisms like:
Vale's contribution to the broader regions endeavor is in showing how:
And with luck, we can bring all of this goodness into mainstream programming!
RAII is about automatically affecting the world outside our object. To affect the outside world, the borrow checker often requires us to take a &mut parameter or return a value, but we can't change drop's signature. To see this in action, try to make a handle that automatically removes something from a central collection. Under the hood we usually use unsafe mechanisms, including FFI.
We've only covered one basic example, showing how pure functions use regions to eliminate memory safety overhead.
There are a lot of other ways regions help:
Regions don't just help performance, they also enable some novel features:
There are some surprising details which aren't covered in this article:
That's all for now! We hope you enjoyed this article. Stay tuned for the next article, which shows how isolated sub-regions work.
If you're impressed with our track record and believe in the direction we're heading, please consider sponsoring us on GitHub!
With your support, we can bring regions to programmers worldwide.
See you next time!
- Evan Ovadia
This is because in Vale, pure doesn't actually mean we can't access globals, it more means "freeze this thread's memory".
This is a pretty light operation on likely cache-hot memory, so it's not too bad. It can also be avoided by returning data in an iso.
Vale aims to bring a new way of programming into the world that offers speed, safety, and ease of use.
The world needs something like this! Currently, most programming language work is in:
These are useful, but there is a vast field of possibilities in between, waiting to be explored!
Our aim is to explore that space, discover what it has to offer, and make speed and safety easier than ever before.
In this quest, we've discovered and implemented a lot of new techniques:
These techniques have also opened up some new emergent possibilities, which we hope to implement:
We also gain a lot of inspiration from other languages, and are finding new ways to combine their techniques:
...plus a lot more interesting ideas to explore!
The Vale programming language is a novel combination of ideas from the research world and original innovations. Our goal is to publish our techniques, even the ones that couldn't fit in Vale, so that the world as a whole can benefit from our work here, not just those who use Vale.
Our medium-term goals:
We aim to publish articles biweekly on all of these topics, and create and inspire the next generation of fast, safe, and easy programming languages.
If you want to support our work, please consider sponsoring us on GitHub!
With enough sponsorship, we can: