Vale has an ambitious goal: to be fast, safe, and easy. There are a lot of stellar languages that have two, and we suspect it's possible to really have all three.
This isn't quite possible with today's better-known approaches. Languages with borrow checking have great run-time speed, and languages with garbage collection have better flexibility, simplicity, and development velocity, but we haven't yet seen a way to get all these benefits at once.
However, by blending two new techniques, we can make something with the best of all worlds: fast and safe, while keeping the language simple and flexible.
We can make it even faster by combining it with a new technique called regions.
This article is a summary of the five-part series explaining regions; check that out if you want more details!
We're aiming to complete regions by early 2024; check out the roadmap for more details.
If we're lucky, the blend could have as little memory overhead as borrow checking in practice: zero-cost temporarily immutable references, with the occasional generation check for non-temporary references (which, in borrow checking, usually become indexes or IDs which incur bounds checking or hashing costs on use).
It also enables some fascinating features like Higher RAII!
Vale's memory safety comes from generational references, which are pretty fast: they only incur between 2% and 10.84% speed overhead in our benchmarked program, a terrain generator. This alone is fast enough for most programs, while still being easy to use.
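As a rough illustration of the mechanism (not Vale's actual implementation, which works at the pointer level with compiler-inserted checks), a generational reference pairs a target with the generation number the allocation had when the reference was created; each dereference checks that the generation still matches:

```python
class Slot:
    """A reusable allocation slot that tracks how many times it's been reused."""
    def __init__(self, value):
        self.generation = 0
        self.value = value

    def release(self):
        """Free the slot; bumping the generation invalidates old references."""
        self.generation += 1


class GenRef:
    """A generational reference: a target plus the generation it expects."""
    def __init__(self, slot):
        self.slot = slot
        self.expected = slot.generation

    def deref(self):
        # The "generation check" performed before each use of the reference.
        if self.slot.generation != self.expected:
            raise RuntimeError("dangling reference caught")
        return self.slot.value


slot = Slot(42)
ref = GenRef(slot)
print(ref.deref())   # 42: the generation matches, so the check passes
slot.release()       # the slot is "freed" and may be reused
try:
    ref.deref()      # the stale reference is caught, not silently followed
except RuntimeError as e:
    print(e)         # dangling reference caught
```

In Vale, checks like this are inserted automatically by the compiler; regions are about eliminating them where the data is known to be immutable.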
One should generally keep a program loose and flexible, then profile it to find the small sections that would benefit from optimization.
That's where regions enter the picture.
A region is a grouping of data that we can freeze and unfreeze as one, to make it either "mutable" or "immutable".
While a region is immutable, it's much faster to read from, because the compiler no longer needs to insert any generation checks to ensure that a generational reference still points to a valid object.
We have three tools to put data into regions:

 * pure functions, which temporarily see all their parameters as immutable;
 * isolates, which can be frozen and unfrozen independently;
 * region parameters, which enable one-way isolation.

These are explained more throughout the rest of the article.
Regions are opt-in; one can write an entire program without knowing about regions at all.
Note that there's only one benchmark program so far, and we're comparing Vale without generational references (zero overhead, equivalent to C) to Vale with generational references. We'll be able to compare more directly to other languages once we support inline data, hopefully in mid-2023. Still, these results are particularly promising, considering Java's 89% speed overhead, Go's 183%, and Swift's 320% measured in this article.
In any program, many functions just read data from the world and calculate something new.
We call these functions "pure".
A Vale programmer can find these and annotate them with the pure keyword.
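Here's a sketch of such a function, based on the examples from the regions series (exact details may vary):

```vale
// A pure function: it only reads from the world and calculates something.
pure func printShipNames(tenShips &List<Ship>) {
  foreach i in 0..tenShips.len() {
    ship = tenShips[i];
    name = ship.name;
    println(name);
  }
}
```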
Simply adding pure informs the compiler that the List, all of its Ships, and the ship variable all hold immutable data.
Adding the pure keyword eliminates every single generation check in this function, making its memory safety zero-cost.
How does this work?
pure functions see all of their parameters as inside an immutable region, separate from their own internal region.
printShipNames's own data (such as the counter i) lives in its own private mutable region, while everything reachable from its parameters is in the immutable region.
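Sketching that function again, with comments marking which region each piece of data lives in (exact syntax may vary):

```vale
pure func printShipNames(tenShips &List<Ship>) {
  // tenShips, the List, and all its Ships: the caller's region, seen as immutable.
  foreach i in 0..tenShips.len() {
    // i: printShipNames's own private mutable region.
    ship = tenShips[i];   // ship: a reference into the immutable region.
    println(ship.name);   // reading ship.name needs no generation check.
  }
}
```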
This is only a high-level taste; check out Immutable Region Borrowing for more on how this works!
It's pretty easy to do this for any function that only reads from its parameters. And in practice, functions can often be adjusted to be read-only, so that they can add the pure keyword too.
There is a restriction, however. We can only hand immutable data (like the above List and Ships) to functions that are also pure or have region markers as shown in the next section.
In theory, even static analysis tools could find these pure functions for us!
This happens surprisingly often: a lot of functions can use regions to eliminate every single generation check. It isn't a universal phenomenon, though; any non-trivial stateful program is guaranteed to have some memory safety overhead, no matter what language or paradigm you use.
pure functions have a hidden feature: they can temporarily see multiple regions as one combined immutable region. This makes them much more flexible and reusable!
Pure functions, like the one above, automatically know which data is in an immutable region.
However, if we want to hand this immutable data to another (non-pure) function, we sometimes need to add region markers to it.
Region markers are fairly simple to use, and don't come with any complicated concepts or aliasing restrictions.
Here, we just add <r'> (or any other letter) after the function's name, and in front of any type that's from the immutable region.
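For example, a non-pure function that receives data from an immutable region r' might look like this (a sketch based on the series; exact syntax may vary):

```vale
// <r'> declares the region after the function's name; r' in front of
// List and Ship marks the types that come from that region.
func printShipNames<r'>(tenShips &r'List<r'Ship>) {
  foreach i in 0..tenShips.len() {
    ship = tenShips[i];
    println(ship.name);
  }
}
```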
The compiler uses these region markers to make sure that the function doesn't modify anything from the immutable region.
For more on this, check out Immutable Region Borrowing.
Before we talk about the other ways to put data into regions, let's talk about the motivating principles behind regions!
We can also hand immutable data to generic functions, such as List<T>'s contains, without adding any region markers. This is because the compiler makes no assumptions about what region that T might be in; it's prepared for T to be in a separate region, so there's no need to add region markers. One could say that T comes with its own implicit region markers.
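To illustrate (a hypothetical sketch, not the standard library's actual signature), a generic function like contains needs no explicit markers:

```vale
// T implicitly carries its own region, so no <r'> markers are needed;
// callers may pass a List whose elements live in an immutable region.
func contains<T>(list &List<T>, target &T) bool {
  foreach x in list {
    if x == target {
      return true;
    }
  }
  return false;
}
```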
In Vale, we can freely alias regions.
For example, x' and y' could refer to the same mutable region, even if there's a read-only region r' referring to it. We can even use a pure function to then temporarily flip all those to immutable.
This all works because any function that receives a read-only region will automatically be monomorphized into two functions in the final binary: one that receives a mutable region (and does generation checks), and a faster one that receives an immutable region.
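Conceptually (a sketch with hypothetical names; the compiler does this automatically), a function that takes a read-only region is compiled into two flavors:

```vale
// getHp receives ship from a read-only region r'.
func getHp<r'>(ship &r'Ship) int {
  return ship.hp;
}
// The final binary contains two copies of getHp:
//   - one where r' is mutable:   dereferencing ship does a generation check
//   - one where r' is immutable: the generation check is elided entirely
```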
Many situations fit regions' restrictions easily and seamlessly, such as:
Theoretically, with enough effort, any code can be phrased in terms of regions. However, when a language forces the user into a certain paradigm, it tends to have a steeper learning curve and a lot more complexity.
Instead, Vale has a core principle of avoiding forced restrictions, so that it can stay simple, flexible, and easy to learn.
For this reason, Vale's regions are opt-in. In fact, one can write an entire Vale program without knowing about regions.
These programs still benefit from regions, of course: the standard library (and many other libraries) use regions under the hood, and the compiler will automatically use regions for iteration and other situations where part of the data is read-only.
Since regions are opt-in, one only needs to know about them for the parts of the code one wants to make even faster, usually after a profiler identifies the specific sections that would benefit from optimization.
This is also consistent with another Vale core principle, progressive disclosure, where things are simple by default. The user can progressively discover the more complex features as they need them, rather than needing to know them all at once.
The last principle behind regions is ignorable complexity. If a user sees any pure or ', they can ignore them and still understand the code, because regions don't change the semantics of the program. In a way, they're just an optimization hint that can be safely ignored.
We need languages that are simple, safe, readable, and fast by default, with any extra restrictions being opt-in and ignorable. These principles are what originally motivated Vale.
This is very important, not only because it makes Vale easier to learn, but because most code in practice should optimize for simplicity and flexibility, not speed. On top of that, most programs don't need to optimize away generation checks, because generation checks are pretty fast already. One should optimize away quadratic algorithms first, and might even find that the program is fast enough at that point.
This is a principle I hope more languages pick up. C# does this well with its struct keyword; you can completely ignore the feature until you want it.
In fact, our test interpreter ignores region annotations entirely!
In general, we should focus on making programming better for humans. Balancing that with performance is a fascinating design challenge!
Often, the vast majority of a program's generation checks can be eliminated with pure functions. If we want to go even further, we can use isolates.
An isolate is just a hierarchy of data where nothing inside points out, and nothing outside points in except for the owning reference.
An isolate forms a region that is mutable by default, and can be frozen and unfrozen independently of the rest of the program.
We use ' to specify something is an isolate, and .imm to temporarily access that data to read it immutably.
For example, we can isolate a cannon by putting a ' in front of the Cannon call that creates it.
Then, to immutably borrow the data, we use cannon.imm.
We can also use .open to open it for read-write access. The compiler makes sure that we don't use .open and .imm at the same time on the same region.
Lastly, we can use .read, which gives us read-only access while allowing it to be .opened at the same time, though it doesn't have the performance benefits of .imm.
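Putting those pieces together in one sketch (exact syntax may vary, and the power field is hypothetical):

```vale
exported func main() {
  // The ' makes this Cannon (and everything it owns) an isolate.
  cannon = 'Cannon(12);

  // .imm temporarily freezes the isolate for fast immutable reads.
  println(cannon.imm.power);

  // .open gives read-write access; the compiler ensures we never
  // hold .imm and .open on the same region at the same time.
  set cannon.open.power = 13;

  // .read gives read-only access that can coexist with .open,
  // though without .imm's performance benefits.
  println(cannon.read.power);
}
```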
Note that isolates are a little more restricted when inside structs. We can avoid those restrictions by using a cell, like ''Cannon, which moves the .imm-versus-.open check to run-time.
Check out Isolates for more!
We actually can have references inside pointing out, as shown in the next section.
Someone once described this as "a borrow checker you can precisely enable and disable," which is a pretty apt description, though regions don't impose the same aliasability-xor-mutability restrictions as borrow checking.
And since Vale data has no aliasing restrictions, it can do all the useful patterns like intrusive data structures, graphs, observers, back-references, dependency references, callbacks, delegates, and more kinds of RAII plus Higher RAII!
One would usually use .read in non-performance-critical code, or when it's unclear whether someone else is .opening the region at the same time.
Most isolates have two rules:

 * Nothing outside the isolate points in, except for the owning reference.
 * Nothing inside the isolate points out.

That second rule is optional: an isolate can contain references that point outward.
One way to do this is for a struct to have a region parameter.
This Ship can still be in an isolate, even though it contains a reference to a Dock in a different region.
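A sketch of such a struct (exact syntax may vary):

```vale
// d' is a region parameter; the dock reference points out of
// whatever isolate the Ship lives in, into region d'.
struct Ship<d'> {
  hp int;
  dock &d'Dock;
}
```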
In practice, this lets us put a lot more data into isolates. Many structs can put all of their private data into an isolate.
We also get this for free if we use generics. The compiler assumes that the T in List<T> is in a different region, as if it had an implicit region parameter.
Check out One-way Isolation for more on this!
There are a lot of interesting aspects in the five-part series that aren't covered here, such as:
Plus, every global is in a mutex, so every global is also in an isolate.
Collections do this. For example, we can have a List<x'Ship>, where each List node literally contains and owns some data from region x'.
Actual syntax TBD.
We saw three ways to put data in regions, to specify which data can be temporarily immutable:

 * pure functions, which see all their parameters as one immutable region;
 * isolates, which can be frozen and unfrozen independently;
 * region parameters, which give structs and generics one-way isolation.
Regions have some particular strengths:
To compare, today's existing approaches are fairly good, but they either:
Blending regions with generational references is simple, safe, and fast by default, and also lets us eliminate the last bit of memory safety overhead where it's needed.
For example, in functional programming, everything is immutable by default, and we often emulate mutability with tools like the state monad, which tends to spread to any callers, and all their callers, and so on.
Borrow checking has this weakness too: aliasing restrictions often bleed through API boundaries and cause extra refactoring throughout the codebase, and we can only partially trade those restrictions for others in certain cases.
Both paradigms are great in a lot of situations, but can often result in unnecessary complexity in areas that don't benefit from their restrictions.
This is referring to approaches like reference counting and tracing garbage collection.
As we saw, regions let us temporarily freeze specific data in our program, to let us immutably access it to make our code faster.
If you want to learn more, check out the more in-depth series:
With your support, we can bring regions to programmers worldwide!
See you next time,
- Evan Ovadia
Vale aims to bring a new way of programming into the world that offers speed, safety, and ease of use.
The world needs something like this! Currently, most programming language work is in:
These are useful, but there is a vast field of possibilities in between, waiting to be explored!
Our aim is to explore that space, discover what it has to offer, and make speed and safety easier than ever before.
In this quest, we've discovered and implemented a lot of new techniques:
These techniques have also opened up some new emergent possibilities, which we hope to implement:
We also gain a lot of inspiration from other languages, and are finding new ways to combine their techniques:
...plus a lot more interesting ideas to explore!
The Vale programming language is a novel combination of ideas from the research world and original innovations. Our goal is to publish our techniques, even the ones that couldn't fit in Vale, so that the world as a whole can benefit from our work here, not just those who use Vale.
Our medium-term goals:
We aim to publish articles biweekly on all of these topics, and create and inspire the next generation of fast, safe, and easy programming languages.
If you want to support our work, please consider sponsoring us on GitHub!
With enough sponsorship, we can: