LanguagesArchitecture

A few years ago, a 2am realization changed the course of my life forever, and eventually turned me into a nomadic hermit 0 and language designer.

The realization was that we could use regions to eliminate memory safety overhead when we access data. It's something weird and new, like borrow checking without the reference restrictions, or perhaps more like if C++'s const and D's pure had a demonic hellspawn lovechild. 1

Last time, I talked about how the first step towards implementing regions was to make the compiler's backend 2 aware of multiple regions. After that, it was just another three weeks to make the first feature that used regions under the hood: Fearless FFI.

Then the real battle began.

Side Notes
(interesting tangential thoughts)
0

Or rather a "digital nomad" as it's said these days. The details of how Vale made me follow the winds is a ridiculous story worthy of another entire blog post.

1

My friend told me to take this line out, but all my beta readers told me to keep it in.

2

The "backend" is the part that takes Vale's representation of the code, and converts it into a form that LLVM is able to turn into binary code. I suppose one could be said LLVM is the real backend, but I consider LLVM to be more the back-back-end, I suppose.

Adding Regions to the Frontend

Regions are pretty simple to use. For example, a function that doesn't modify any arguments can just add the pure keyword for a massive speedup.

vale
pure func SumList(list &List<int>) int {
sum = 0;
i = 0;
while i < list.size() { 3
set sum = sum + list.get(i);
}

return sum;

}

Under the hood though, some pretty complex things have to happen. The compiler needs to transform the above function into something like this:

vale
func SumList<r' imm, x' rw>(list &r'List<r'int>) 'int { 4
sum x'int = 0;
i x'int = 0;
while i < list.size() {
set sum = sum + list.get(i);
}

return iso(sum);
5
}

If you're curious, here's what just happened:

  • The <r' imm, x' rw> declares two regions:
    • An immutable region r' for any arguments given to the pure function. 6
    • A read-write region x' for anything created inside the function.
  • It notes that all parameters came from the immutable region r'.
  • It notes that all data created inside the function is in the read-write region x'.

In other words, it assigns a region to every type. List<int> becomes r'List<r'int>, to know that the list and its contents both came into the pure function. Check out Immutable Region Borrowing for more on this! 7

Before we enable multiple regions, 8 the first step is to make every function receive a single, universal region that all data will live in, which we'll call m'.

This function:

vale
func thrice(a int) int {
result = a + a + a;
return result;

}

needs to become this:

vale
func thrice<m' rw>(a m'int) m'int {
result m'int = a + a + a;
return result;

}

Simple, right?

I thought so too!

3

One would normally use a foreach here, but we're showing a more primitive function so the upcoming transform is a little clearer.

4

'T means an "isolated" T. It means nothing points at the T, nothing inside the T points out, and nothing outside the T points in. See this explanation for more. Anything returned from a pure function is automatically isolated. In this case, 'int is later simplified to just int since it's a value type.

5

Vale automatically "isolates" anything that a pure function returns, so that it can be merged into the caller's region. This iso function can sometimes have a run-time cost for more complex data structures, but that can be avoided if the user manually assembles an isolate instead.

6

So far, this is similar to the borrow checking rules found in Rust, Cyclone, and Cone's designs. However, Vale applies those rules to entire regions, not just individual pieces of data.

7

This is already intricate, and we aren't even at the more advanced region features like custom isolates, one-way isolation, multi-region data, or region-scoped data! I try not to think about that too much.

8

We do have a prototype of multi-region functions working, actually! We have a few more articles about regions, the prototype, and its results queued up.

Feel free to email me if you'd like an early look.

The Unexpected Quest

I thought that we only ever actually create one region, inside main, because that's the only "exported" function, in other words, function that's callable from C.

Above, when main called thrice, the compiler instantiated (a.k.a. "monomorphized" or "specialized") 9 a version of it called thrice<m'>.

Similarly, if main also calls another function fice, a fice<m'> would be created. 10

As you can imagine, there would be a <m'> version of every function in the codebase, which is weird, but fine.

vale
func thrice(a int) int {
return a + a + a;
}

func fice(a int) int {
result = a + a + a + a + a;
return result;

}

// Makes m'
exported func main() {
// Makes thrice<m'>
a = thrice(5);
println(a);

}

Except there's one problem... what if main isn't the only exported function?

Let's say we added another function bar, and exported it so that C could call into it.

Since it's exported, it will also create a region, which we'll call b'.

Now, since both main and bar are calling both thrice and fice, that means we're making four instantiations.

In other words, we'll not only have a thrice<m'> and fice<m'> but also a thrice<b'> and fice<b'>!

vale
func thrice(a int) int {
return a + a + a;
}

func fice(a int) int {
result = a + a + a + a + a;
return result;

}

// Makes m'
exported func main() {
// Makes thrice<m'>
a = thrice(5);
// Makes fice<m'>
b = fice(a);
println(b);

}

// Makes b'
exported func bar(a int) int {
// Makes thrice<b'>
a = thrice(5);
// Makes fice<b'>
b = fice(a);
return b;

}

If we had ten exported functions, we might have ten versions of thrice and fice created. Our executable file could be up to ten times as large as it needs to be!

It's especially ironic, because thrice<m'>, thrice<b'> and all these other versions of thrice are exactly the same. 11

9

This means to make a copy of it with all the generic arguments (like T) filled in with actual types.

10

"Fice" means to multiply something by five, I promise.

11

And I don't think LLVM is able to notice that and merge these functions back together. Even if it does, it would use up a lot of compile time, which is a big detriment to development velocity.

How do we solve this?

One solution was to make one global region (perhaps called g') that every type was in. This would technically work, but it wouldn't get us any closer to the eventual goal of using multiple regions at once, like in the original SumList function.

The other solution was something I've been putting off for a long time: full generics.

Templates vs Generics

At the start of this, Vale didn't have full generics (like Java), it had templates (like C++).

Let's say we have this template function in Vale:

vale
func sum<T>(a T, b T) T {
return a + b;
}

Vale doesn't type-check that function up-front.

Instead, it waits for other functions to call it, like so:

vale
func bork() int {
return sum<int>(4, 5);
}

Then, Vale can instantiate the sum function, to generate a version of it with int instead of T, like this:

vale
func sum<int>(a int, b int) int {
return a + b;
}

Then it type-checks it, including making sure that there's a int + int operation available. 12

This identical to how C++ works, under the hood. Or if you've used C, it's similar to how preprocessor macros work. Specifically, it does substitutions first, then type-checks after.

That's how a templating system works. Generics are a bit different.

When a compiler uses generics under the hood, it does it in the reverse order:

  1. It type-checks the function first, before it knows what T really is.
  2. Later, after the entire program has been type-checked, it instantiates the different versions of the functions.
12

Or more specifically, it makes sure that there's a func +(int, int) available.

How would generics help?

If the compiler uses generics, then we can do some region-related things after type-checking the functions, but before instantiating the functions.

Specifically, the compiler would:

  1. Type-check all functions.
  2. Region-check all functions; make sure that the user doesn't mix up data from two regions.
  3. Remove all traces of regions. For example:
    1. It would turn func thrice<x' rw>(...) into func thrice(...).
    2. Instead of requesting a thrice<m'> and a thrice<spork'>, callers would only request a simple thrice.
  4. Instantiate the functions.

By removing all traces of regions, step 4 won't instantiate a different thrice<m'> and a thrice<spork'>. Only a single thrice will be generated, solving our problem.

Generics are a Nightmare 13

Generics are much harder to implement than templates.

I thought it would only take a few weeks at most. By the end, it had taken 81 days and 23,588 lines of rather intense coding.

When I'm coding something particularly challenging, I have to get out my pencil and paper 14 and just start scribbling notes and thoughts until something starts to make sense. Usually this happens once or twice per feature... thirty pages of my notebook are about generics. It was brutal.

The internet often pokes fun at Go for taking ten years to add their generics. Honestly, after this endeavor, I don't blame them one bit. I kind of want to find the folks on the Go team and comisserate together over some Delirious Blue. 15 Anyone who looked over would see two grown engineers crying over their drinks, saying weird words like "polymorphism" and "interface bounds" over and over.

The problem is that I'm just an engineer with some skills in data structures and architecture. I'm not a type theorist, and this more mathematical abstract thinking doesn't come naturally to me. When implementing generics, we're taking abstract thinking to a whole new level: we're type-checking functions that don't exist yet, conjuring "placeholder" types, and putting bounds information in places that I didn't even know existed.

I also want to give a shout-out to the amazing people in my server who helped me get through the journey. Huge thanks to Jon Goodwin (of Cone), Arthur Weagel, Ivo Balbaert, RazzBerry, zikzak323, Zodey, keeslinp, mikey, IsaacPaul, kurtkuehnert, DestyNova, librarianmage, devast8a, and cristian.vasile! I'm not sure I could have finished without their support and positive energy. You all are awesome!

13

I actually called it a gorram nightmare but I promised myself that in 2023 I wouldn't put any more Firefly references into my articles.

14

Elegant weapons, for a more civilized age.

15

We also named of one of our test cases after this drink.

Next Steps

Now that generics is finally finished, I can resume progress with regions!

The next step is to make it so every function is aware that everything is in the same region.

Once we make this function:

vale
func thrice(a int) int {
result = a + a + a;
return result;

}

...into this function:

vale
func thrice<x' rw>(a x'int) x'int {
result x'int = a + a + a;
return result;

}

...then the compiler will be "region-aware". After that, we can start playing with multi-region functions and the real fun can begin!

That's all for now! I hope you enjoyed this article. Keep an eye out for the next one on our RSS feed, twitter, discord server, or subreddit!

See you next time!

- Evan Ovadia