A few years ago, a 2am realization changed the course of my life forever, and eventually turned me into a nomadic hermit 0 and language designer.
The realization was that we could use regions to eliminate memory safety overhead when we access data. It's something weird and new, like borrow checking without the reference restrictions, or perhaps more like if C++'s const and D's pure had a demonic hellspawn lovechild. 1
Last time, I talked about how the first step towards implementing regions was to make the compiler's backend 2 aware of multiple regions. After that, it was just another three weeks to make the first feature that used regions under the hood: Fearless FFI.
Then the real battle began.
Or rather a "digital nomad" as it's said these days. The details of how Vale made me follow the winds is a ridiculous story worthy of another entire blog post.
My friend told me to take this line out, but all my beta readers told me to keep it in.
The "backend" is the part that takes Vale's representation of the code, and converts it into a form that LLVM is able to turn into binary code. I suppose one could be said LLVM is the real backend, but I consider LLVM to be more the back-back-end, I suppose.
Regions are pretty simple to use. For example, a function that doesn't modify any arguments can just add the pure keyword for a massive speedup.
Under the hood though, some pretty complex things have to happen. The compiler needs to transform the above function into something like this:
If you're curious, here's what just happened:
In other words, it assigns a region to every type. List<int> becomes r'List<r'int>, to know that the list and its contents both came into the pure function. Check out Immutable Region Borrowing for more on this! 7
Before we enable multiple regions, 8 the first step is to make every function receive a single, universal region that all data will live in, which we'll call m'.
needs to become this:
I thought so too!
One would normally use a foreach here, but we're showing a more primitive function so the upcoming transform is a little clearer.
'T means an "isolated" T. It means nothing points at the T, nothing inside the T points out, and nothing outside the T points in. See this explanation for more. Anything returned from a pure function is automatically isolated. In this case, 'int is later simplified to just int since it's a value type.
Vale automatically "isolates" anything that a pure function returns, so that it can be merged into the caller's region. This iso function can sometimes have a run-time cost for more complex data structures, but that can be avoided if the user manually assembles an isolate instead.
So far, this is similar to the borrow checking rules found in Rust, Cyclone, and Cone's designs. However, Vale applies those rules to entire regions, not just individual pieces of data.
This is already intricate, and we aren't even at the more advanced region features like custom isolates, one-way isolation, multi-region data, or region-scoped data! I try not to think about that too much.
We do have a prototype of multi-region functions working, actually! We have a few more articles about regions, the prototype, and its results queued up.
Feel free to email me if you'd like an early look. Our sponsors also get everything early, so consider sponsoring!
I thought that we only ever actually create one region, inside main, because that's the only "exported" function, in other words, function that's callable from C.
Above, when main called thrice, the compiler instantiated (a.k.a. "monomorphized" or "specialized") 9 a version of it called thrice<m'>.
Similarly, if main also calls another function fice, a fice<m'> would be created. 10
As you can imagine, there would be a <m'> version of every function in the codebase, which is weird, but fine.
Except there's one problem... what if main isn't the only exported function?
Let's say we added another function bar, and exported it so that C could call into it.
Since it's exported, it will also create a region, which we'll call b'.
Now, since both main and bar are calling both thrice and fice, that means we're making four instantiations.
In other words, we'll not only have a thrice<m'> and fice<m'> but also a thrice<b'> and fice<b'>!
If we had ten exported functions, we might have ten versions of thrice and fice created. Our executable file could be up to ten times as large as it needs to be!
It's especially ironic, because thrice<m'>, thrice<b'> and all these other versions of thrice are exactly the same. 11
This means to make a copy of it with all the generic arguments (like T) filled in with actual types.
"Fice" means to multiply something by five, I promise.
And I don't think LLVM is able to notice that and merge these functions back together. Even if it does, it would use up a lot of compile time, which is a big detriment to development velocity.
One solution was to make one global region (perhaps called g') that every type was in. This would technically work, but it wouldn't get us any closer to the eventual goal of using multiple regions at once, like in the original SumList function.
The other solution was something I've been putting off for a long time: full generics.
At the start of this, Vale didn't have full generics (like Java), it had templates (like C++).
Let's say we have this template function in Vale:
Vale doesn't type-check that function up-front.
Instead, it waits for other functions to call it, like so:
Then, Vale can instantiate the sum function, to generate a version of it with int instead of T, like this:
Then it type-checks it, including making sure that there's a int + int operation available. 12
This identical to how C++ works, under the hood. Or if you've used C, it's similar to how preprocessor macros work. Specifically, it does substitutions first, then type-checks after.
That's how a templating system works. Generics are a bit different.
When a compiler uses generics under the hood, it does it in the reverse order:
Or more specifically, it makes sure that there's a func +(int, int) available.
If the compiler uses generics, then we can do some region-related things after type-checking the functions, but before instantiating the functions.
Specifically, the compiler would:
By removing all traces of regions, step 4 won't instantiate a different thrice<m'> and a thrice<spork'>. Only a single thrice will be generated, solving our problem.
Generics are much harder to implement than templates.
I thought it would only take a few weeks at most. By the end, it had taken 81 days and 23,588 lines of rather intense coding.
When I'm coding something particularly challenging, I have to get out my pencil and paper 14 and just start scribbling notes and thoughts until something starts to make sense. Usually this happens once or twice per feature... thirty pages of my notebook are about generics. It was brutal.
The internet often pokes fun at Go for taking ten years to add their generics. Honestly, after this endeavor, I don't blame them one bit. I kind of want to find the folks on the Go team and comisserate together over some Delirious Blue. 15 Anyone who looked over would see two grown engineers crying over their drinks, saying weird words like "polymorphism" and "interface bounds" over and over.
The problem is that I'm just an engineer with some skills in data structures and architecture. I'm not a type theorist, and this more mathematical abstract thinking doesn't come naturally to me. When implementing generics, we're taking abstract thinking to a whole new level: we're type-checking functions that don't exist yet, conjuring "placeholder" types, and putting bounds information in places that I didn't even know existed.
I also want to give a shout-out to the amazing people in my server who helped me get through the journey. Huge thanks to Jon Goodwin (of Cone), Arthur Weagel, Ivo Balbaert, RazzBerry, zikzak323, Zodey, keeslinp, mikey, IsaacPaul, kurtkuehnert, DestyNova, librarianmage, devast8a, and cristian.vasile! I'm not sure I could have finished without their support and positive energy. You all are awesome!
I actually called it a gorram nightmare but I promised myself that in 2023 I wouldn't put any more Firefly references into my articles.
Elegant weapons, for a more civilized age.
We also named of one of our test cases after this drink.
Now that generics is finally finished, I can resume progress with regions!
The next step is to make it so every function is aware that everything is in the same region.
Once we make this function:
...into this function:
...then the compiler will be "region-aware". After that, we can start playing with multi-region functions and the real fun can begin!
That's all for now! I hope you enjoyed this article. Keep an eye out for the next one on our RSS feed, twitter, discord server, or subreddit!
If you're impressed with our track record and believe in the direction we're heading, please consider sponsoring us on GitHub!
With your support, we can bring regions to programmers worldwide.
See you next time!
- Evan Ovadia
Vale aims to bring a new way of programming into the world that offers speed, safety, and ease of use.
The world needs something like this! Currently, most programming language work is in:
These are useful, but there is a vast field of possibilities in between, waiting to be explored!
Our aim is to explore that space, discover what it has to offer, and make speed and safety easier than ever before.
In this quest, we've discovered and implemented a lot of new techniques:
These techniques have also opened up some new emergent possibilities, which we hope to implement:
We also gain a lot of inspiration from other languages, and are finding new ways to combine their techniques:
...plus a lot more interesting ideas to explore!
The Vale programming language is a novel combination of ideas from the research world and original innovations. Our goal is to publish our techniques, even the ones that couldn't fit in Vale, so that the world as a whole can benefit from our work here, not just those who use Vale.
Our medium-term goals:
We aim to publish articles biweekly on all of these topics, and create and inspire the next generation of fast, safe, and easy programming languages.
If you want to support our work, please consider sponsoring us on GitHub!
With enough sponsorship, we can: