Single ownership is one of those concepts that's both easier and more powerful than we realize.
People often think it's complex, because it most often appears in languages that are already complex for separate reasons.
Let's dispel that myth, and figure out what it really is!
Even if you already know what single ownership is, you'll probably find some interesting surprises:
Below, I'll explain single ownership from a C foundation, and then we'll see the weird things it can do.
With manual memory management, we usually make it clear who's responsible for eventually freeing certain memory, via documentation or naming.
For example, strdup will return a heap-allocated buffer, which the caller then owns. From GLib's strdup documentation:
So we make sure to eventually deliver this data to another function that destroys it.
If we don't do that, we get a memory leak; the memory is unusable for the rest of the program's run.
void main() {
char* myBuffer = GetRealPath("./myfile.txt");
// prints: /Users/Valerian/Bogglewoggle/myfile.txt
printf("Real path: %s\n", myBuffer);
// uh oh, we forgot to free(myBuffer)!
// Memory leak.
}
If we accidentally do it multiple times, the program might crash or exhibit undefined behavior.
void main() {
char* myBuffer = GetRealPath("./myfile.txt");
// prints: /Users/Valerian/Bogglewoggle/myfile.txt
printf("Real path: %s\n", myBuffer);
free(myBuffer);
// Shenanigans ensue!
free(myBuffer);
}
As any C programmer knows, we carefully track who owns the data, all the way from when it's created, to when it's finally destroyed.
At any given time in between, we can generally identify who "owns" certain data, whether it be a certain local variable or a field in some struct somewhere.
Of course, other pointers to the data can exist, they just don't own the data. We have a mental distinction between owning pointers and non-owning pointers.
If you've ever implemented a balancing binary search tree like a red-black tree or an AVL tree, recall that a parent conceptually has "owning" pointers to its children, and its children have non-owning pointers back to their parents.
Single ownership isn't just for pointers and malloc and free, it's to anything that we have a future responsibility for.
For example, pthread_create creates a handle that we're responsible for eventually pthread_destroying. We generalize this a bit more later, but for now let's just think about heap allocations.
When I was a wee lad, I would suffix my "owning" pointers with _owning to keep things straight.
Here's a balancing binary search tree node in regular C...
struct Node {
struct Node* parent;
struct Node* leftChild;
struct Node* rightChild;
};
...and here it is with some helpful _ownings added to the names.
struct Node {
struct Node* parent;
struct Node* leftChild_owning;
struct Node* rightChild_owning;
};
I had some sensible guidelines:
Rule 1: Anything that comes from malloc must be put in a variable ending in _owning.
Rule 2: We can't let an _owning variable (or field) go out of scope, the only way to destroy it is to explicitly give it to free (or "move" it, which we'll talk about later).
Rule 3: We can't use that variable after freeing it (or moving it).
We could even make a linter or compiler to enforce this, and detect both memory leaks and double-frees:
void main() {
char* myBuffer_owning = GetRealPath("./myfile.txt");
// prints: /Users/Valerian/Bogglewoggle/myfile.txt
printf("Real path: %s\n", myBuffer_owning);
// Take out this line for an error!
free(myBuffer_owning)
// Or add this bad line for a different error!
// printf("Freed: %s\n", myBuffer_owning);
}
In a way, these rules help us track responsibility for freeing the buffer.
Keep the phrase "tracking responsibility" in mind, we'll build on that later to make the system help us with much more than managing memory.
If we were to craft a C-like language, then instead of using a suffix like char* myBuffer_owning, we might use a symbol on the type, like: char^ myBuffer.
When you look at it that way, it's kind of like char, char^, and char* are all different types:
The above char* myBuffer_owning = ... would become char^ myBuffer = ....
However, for the compiler to be able to keep things straight, we also need to add the ability to move ownership from one variable into another variable or field:
void main() {
char^ myBuffer = GetRealPath("./myfile.txt");
// prints: /Users/Valerian/Bogglewoggle/myfile.txt
printf("Real path: %s\n", myBuffer);
char^ otherVar = move myBuffer;
// Now, we can't access myBuffer.
free(move otherVar)
}
Notice how we moved from myBuffer to otherVar. We also moved into the free call as well.
This move keyword comes with a new rule.
Rule 4: When we move out of a variable, we can't use it anymore.
We would also use this new move keyword to transfer ownership of some data to our caller, like so:
Spaceship^ makeSpaceship() {
Spaceship^ ship = (Spaceship^)malloc(sizeof(Spaceship));
ship->fuel = 42;
return move ship;
}
These rules are sometimes known as move semantics.
C++ and Rust also have move semantics, but ours is a little different so far:
Of course, we're not doing any of that.
Let's see what interesting things happen!
Recall Rule 2: The user can't let an owning variable (or field) go out of scope, the only way to destroy it is to explicitly move it or give it to free.
Because of this, if you accidentally let an owning pointer go out of scope, you'll get a compiler error.
You can use this to your advantage, using a technique called "Higher RAII", which is similar to linear types. 0
Basically, if you want your caller to remember to do something later, give them ownership of a "reminder" object that they can't accidentally drop. 1
This technique is incredibly powerful. With it, the compiler can enforce that you:
More generally, it means the compiler can ensure you remember to do something in the future and don't accidentally forget it.
Higher RAII is similar to linear typing. An owning reference is a linear type, but when we can also make non-owning references to the same object, we get Higher RAII.
"Drop" here means to let the compiler clean it up with some default action.
A Promise is an object that will deliver a result to a corresponding Future, sometimes on a different thread.
Lets say we have a spaceship game.
Some facts about our game's code:
If we forget to remove it from the display cache, we'd get an odd bug when displaying.
Let's use single ownership to prevent this bug at compile-time. No mainstream language prevents this kind of bug, but we're about to do it easily!
We already have a function that will add the ship to the display cache, which looks like this:
void AddShipToDisplayCache(DisplayCache* cache, Ship* ship) {
... // Add to cache
}
Let's instead return a "reminder" object: 3
struct ShipInCacheReminder { };
ShipInCacheReminder^ AddShipToDisplayCache(DisplayCache* cache, Ship* ship) {
... // Add to cache
// Return a zero-sized reminder object
return (ShipInCacheReminder^)malloc(sizeof(ShipInCacheReminder));
}
The compiler prevents us from accidentally dropping this ShipInCacheReminder^; we must explicitly move or free it. That makes it a very effective reminder.
The only place we'll ever free it is in the RemoveShipFromDisplayCache function: 4
void RemoveShipFromDisplayCache(
DisplayCache* cache,
Ship* ship,
ShipInCacheReminder^ reminder) {
free(move reminder);
... // Remove it from the cache
}
Voilà! The compiler now guarantees that we'll not forget to RemoveShipFromDisplayCache.
Let's see it more in context, and get a clearer picture of how this is used.
Personally, I prefer storing the reminder next to the Ship^ itself. For example, instead of a central list of Ship^, we might have a central list of ShipAndMetadata^ which contains a Ship^ and all the reminder objects for it.
struct ShipAndMetadata {
Ship^ ship;
ShipInCacheReminder^ shipInCache;
};
That way, when we take apart the ShipAndMetadata struct to get to the Ship we want to destroy, we're naturally left holding a ShipInCacheReminder.
Here, we try to delete the shipAndMetadata directly, and justifiably get an error.
void DestroyShip(
ShipAndMetadata^ shipAndMetadata) {
// Error: freed a struct without
// moving data from it
free(move shipAndMetadata);
}
The compiler knows we need to move fields out before freeing the container, so let's move things out first. 5
We also free ship here. However, the shipInCache was not dealt with, so the compiler notices that and throws an error.
void DestroyShip(
ShipAndMetadata^ shipAndMetadata) {
Ship^ ship = move shipAndMetadata.ship;
ShipInCacheReminder^ shipInCache =
move shipAndMetadata.shipInCache;
free(move shipAndMetadata);
free(move ship);
// Error: Un-destroyed data:
// ShipInCacheReminder^ shipInCache
}
That's good! We don't want the compiler to leak or even free that reminder automatically, that would defeat the purpose.
To fix it, we'll call RemoveShipFromDisplayCache:
void DestroyShip(
DisplayCache* cache,
ShipAndMetadata^ shipAndMetadata) {
Ship^ ship = move shipAndMetadata.ship;
ShipInCacheReminder^ shipInCache = move shipAndMetadata.shipInCache;
free(move shipAndMetadata);
// Gets rid of the shipInCache
RemoveShipFromDisplayCache(cache, &ship, move shipInCache);
// Gets rid of the ship
free(move ship);
// No errors!
}
As a bonus, this technique prevents us from accidentally doing RemoveShipFromDisplayCache twice, because there's only one reminder object instance. 6
They say the biggest two problems in computer science are off-by-one errors and cache invalidation. We just solved the latter!
If you want to learn more about this, check out Vale's Higher RAII, the pattern that saved me a vital 5 hours in the 7DRL Challenge, where I used this exact technique to remember to remove things from a cache.
This ShipInCacheReminder^ could even just be a ShipInCacheReminder if the compiler knows to not drop ShipInCacheReminders, such as how in Vale it's decided on a per-type basis.
We could get around this system by maually freeing it elsewhere. We'd have to go out of our way to cause that bug, so it's not as worrisome.
This is so common thing that languages like Vale can just "move-destructure" to do this, e.g. [ship, shipInCache] = shipAndMetadata;.
So far, this technique doesn't prevent us from removing the wrong Ship. If we want, we could prevent that by putting the Ship's ID into the ShipInCacheReminder.
Let's make something else amazing happen.
We can make our little C-like language completely memory safe with a few extra rules:
Rule 5: Always bounds-check array accesses.
Rule 6: No using pointer arithmetic.
Rule 7: Never use non-owning pointers. Each object only ever has one pointer to it, the owning pointer.
You're probably thinking, "That's insane! How can you make a program without non-owning pointers? And how does that make our programs memory safe?"
It does sound insane! But we can refactor any program to not use non-owning pointers, if we have these guidelines:
If you want to read more about this "linear style", check out last week's article on how Vale taught me about all of this. Vale's Linear-Aliasing Model is about using linear style, and falling back to generational references when the user chooses not to use it.
If you've ever used a language with a linear type system like Austral, this will sound familiar.
After a few hours of using this style, it actually feels pretty similar to Rust. The first guideline is semantically equivalent to passing an &mut, and we use the second gudeline all the time in Rust already.
If we want to add non-owning pointers back in while keeping our newfound memory safety, there are a few ways we can go about it.
Memory tagging is when every 16-byte chunk of memory has a corresponding 4-bit "key" that goes with it. Every time we allocate or deallocate that chunk, we change the chunk's key.
Whenever we make a pointer to some data, we copy that key into the unused top bits of the pointer.
When we attempt to dereference the pointer, the hardware will assert that the key in the chunk matches the key in the pointer.
Even though these asserts have a 1/16 chance of a false negative, this is great for detecting bugs in development and testing.
CHERI is still in the R&D phase, but it seems like it will solve a lot of memory tagging's downsides.
Of course, there's always Rust's borrow checking approach! It enforces that if someone has a mutable reference to an object, no other references can exist. This simple rule gives us temporary non-owning pointers with full memory safety.
It has some really nice benefits:
But also some drawbacks:
I would love to see a language that has both linear types and borrow checking. Austral is looking pretty promising here. Rust gets close, but its affine types can't get the specific compile-time guarantees of linear types and higher RAII. 7 8
All types in Rust must be affine, in other words all types must have a zero-arg no-return drop method defined, so that Rust can automatically drop any type when it goes out of scope. This conflicts with higher RAII, which is all about requiring explicit destruction of certain types.
Vale's generational references is like a 64-bit memory tagging system, but it uses the language's semantics to skip as many assertions as the user wants. 9
It has a some benefits:
It has its drawbacks too, of course:
This combination of generational references and regions is my favorite approach, especially since it still lets us use Higher RAII.
There are a few other approaches we can use to add non-owning pointers back in while preserving memory safety:
...but this post is getting quite large already! 10
Specifically, regions eliminate the vast majority, and we can use linear style to eliminate the rest.
One day I'd really like to write a series on the surprising number of mechanisms for memory safety. We often think it's just reference counting, tracing garbage collection, and borrow checking, but it turns out there are a lot more.
Single ownership is a lot simpler and more powerful than people realize. Even without reference counting, garbage collection, or borrow checking, we can use it for memory safety and extra guarantees at compile-time.
I hope languages will start moving in this direction. C++ pioneered single ownership with unique_ptr and Rust added memory safety, but both are affine 11 12 so don't get higher RAII's compile-time guarantees.
Austral uses linear types for its memory safety, and built their entire language around them to great effect. They wrote up a really great Introduction to Linear Types too, definitely worth checking out. 13
Vale has both linear and affine types, plus generational references on top. The user might decide that Ships are affine and ShipInCacheReminders are linear, and can have references to either of them. We also just successfully used regions to eliminate most of generational references' downsides, though it's still quite experimental.
Next week I'll be writing about how we can use these techniques (and a few others) to design some memory safety for C++, so keep an eye on the RSS feed, twitter, discord server, or subreddit!
See you next time!
- Evan Ovadia
Affine means the compiler must always be able to drop data automatically when it goes out of scope.
I was hoping that by using = delete; on a constructor in C++ we could get a linear type. Alas, no such luck.
Austral's creator was the first person I talked to when I realized all of this!