LanguagesArchitecture

Single ownership is one of those concepts that's both easier and more powerful than we realize.

People often think it's complex, because it most often appears in languages that are already complex for separate reasons.

Let's dispel that myth, and figure out what it really is!

Even if you already know what single ownership is, you'll probably find some interesting surprises:

  • You don't need borrow checking for memory-safe single ownership.
  • You can use it to enforce all sorts of compile-time guarantees.
  • You can blend it with other memory management techniques!

Below, I'll explain single ownership from a C foundation, and then we'll see the weird things it can do.

We often track single ownership manually in C

With manual memory management, we usually make it clear who's responsible for eventually freeing certain memory, via documentation or naming.

For example, strdup will return a heap-allocated buffer, which the caller then owns. From GLib's strdup documentation:

The caller of the function takes ownership of the data, and is responsible for freeing it.

So we make sure to eventually deliver this data to another function that destroys it.

If we don't do that, we get a memory leak; the memory is unusable for the rest of the program's run.

c
void main() {
  char* myBuffer = GetRealPath("./myfile.txt");

  // prints: /Users/Valerian/Bogglewoggle/myfile.txt
  printf("Real path: %s\n", myBuffer);

  // uh oh, we forgot to free(myBuffer)!
  // Memory leak.
}

If we accidentally do it multiple times, the program might crash or exhibit undefined behavior.

c
void main() {
  char* myBuffer = GetRealPath("./myfile.txt");

  // prints: /Users/Valerian/Bogglewoggle/myfile.txt
  printf("Real path: %s\n", myBuffer);

  free(myBuffer);
  // Shenanigans ensue!
  free(myBuffer);
}

As any C programmer knows, we carefully track who owns the data, all the way from when it's created, to when it's finally destroyed.

At any given time in between, we can generally identify who "owns" certain data, whether it be a certain local variable or a field in some struct somewhere.

Of course, other pointers to the data can exist, they just don't own the data. We have a mental distinction between owning pointers and non-owning pointers.

If you've ever implemented a balancing binary search tree like a red-black tree or an AVL tree, recall that a parent conceptually has "owning" pointers to its children, and its children have non-owning pointers back to their parents.

Single ownership isn't just for pointers and malloc and free, it's to anything that we have a future responsibility for.

For example, pthread_create creates a handle that we're responsible for eventually pthread_destroying. We generalize this a bit more later, but for now let's just think about heap allocations.

How might we track it?

When I was a wee lad, I would suffix my "owning" pointers with _owning to keep things straight.

Here's a balancing binary search tree node in regular C...

c
struct Node {
  struct Node* parent;
  struct Node* leftChild;
  struct Node* rightChild;
};

...and here it is with some helpful _ownings added to the names.

c
struct Node {
  struct Node* parent;
  struct Node* leftChild_owning;
  struct Node* rightChild_owning;
};

I had some sensible guidelines:

Rule 1: Anything that comes from malloc must be put in a variable ending in _owning.

Rule 2: We can't let an _owning variable (or field) go out of scope, the only way to destroy it is to explicitly give it to free (or "move" it, which we'll talk about later).

Rule 3: We can't use that variable after freeing it (or moving it).

We could even make a linter or compiler to enforce this, and detect both memory leaks and double-frees:

c
void main() {
  char* myBuffer_owning = GetRealPath("./myfile.txt");

  // prints: /Users/Valerian/Bogglewoggle/myfile.txt
  printf("Real path: %s\n", myBuffer_owning);

  // Take out this line for an error!
  free(myBuffer_owning)

  // Or add this bad line for a different error!
  // printf("Freed: %s\n", myBuffer_owning);
}

In a way, these rules help us track responsibility for freeing the buffer.

Keep the phrase "tracking responsibility" in mind, we'll build on that later to make the system help us with much more than managing memory.

How a compiler can track it

If we were to craft a C-like language, then instead of using a suffix like char* myBuffer_owning, we might use a symbol on the type, like: char^ myBuffer.

When you look at it that way, it's kind of like char, char^, and char* are all different types:

  • char is an (owned) value, owned by the containing stack frame, struct, or array.
  • char^ is an owning pointer to something in the heap.
  • char* is a non-owning pointer.

The above char* myBuffer_owning = ... would become char^ myBuffer = ....

However, for the compiler to be able to keep things straight, we also need to add the ability to move ownership from one variable into another variable or field:

c
void main() {
  char^ myBuffer = GetRealPath("./myfile.txt");

  // prints: /Users/Valerian/Bogglewoggle/myfile.txt
  printf("Real path: %s\n", myBuffer);

  char^ otherVar = move myBuffer;
  // Now, we can't access myBuffer.

  free(move otherVar)
}

Notice how we moved from myBuffer to otherVar. We also moved into the free call as well.

This move keyword comes with a new rule.

Rule 4: When we move out of a variable, we can't use it anymore.

We would also use this new move keyword to transfer ownership of some data to our caller, like so:

c
Spaceship^ makeSpaceship() {
  Spaceship^ ship = (Spaceship^)malloc(sizeof(Spaceship));
  ship->fuel = 42;
  return move ship;
}

(now watch as this new syntax completely confuses the site's code highlighter)

These rules are sometimes known as move semantics.

Let's keep it weird

C++ and Rust also have move semantics, but ours is a little different so far:

  • C++ doesn't have rule 4, in C++ the compiler lets us accidentally use the variable after we move out of it.
  • Rust adds a borrow checker on top. We're going to do something else!
  • Both C++ and Rust will automatically free an owning pointer when it goes out of scope.

Of course, we're not doing any of that.

Let's see what interesting things happen!

Higher RAII, and compile-time guarantees

Recall Rule 2: The user can't let an owning variable (or field) go out of scope, the only way to destroy it is to explicitly move it or give it to free.

Because of this, if you accidentally let an owning pointer go out of scope, you'll get a compiler error.

You can use this to your advantage, using a technique called "Higher RAII", which is similar to linear types. 0

Basically, if you want your caller to remember to do something later, give them ownership of a "reminder" object that they can't accidentally drop. 1

This technique is incredibly powerful. With it, the compiler can enforce that you:

  • Remember to fulfill a Promise 2 exactly once.
  • Remember to remove something from a cache that you previously added to.
  • Remember to explicitly rollback or commit a transaction.
  • Remember to join a thread and do something with the thread function's result.
  • Remember to explicitly force-close a TCP connection or wait for a graceful close.

More generally, it means the compiler can ensure you remember to do something in the future and don't accidentally forget it.

0

Higher RAII is similar to linear typing. An owning reference is a linear type, but when we can also make non-owning references to the same object, we get Higher RAII.

1

"Drop" here means to let the compiler clean it up with some default action.

2

A Promise is an object that will deliver a result to a corresponding Future, sometimes on a different thread.

An Example

Lets say we have a spaceship game.

Some facts about our game's code:

  • All of our spaceships live in a central list.
  • We also need to have a separate cache for displaying them.
  • Every ship needs to be in both the central list and the display cache.
  • When the ship dies, we need to remove it from both the central list and the display cache.

If we forget to remove it from the display cache, we'd get an odd bug when displaying.

Let's use single ownership to prevent this bug at compile-time. No mainstream language prevents this kind of bug, but we're about to do it easily!

We already have a function that will add the ship to the display cache, which looks like this:

c
void AddShipToDisplayCache(DisplayCache* cache, Ship* ship) {
  ... // Add to cache
}

Let's instead return a "reminder" object: 3

c
struct ShipInCacheReminder { };

ShipInCacheReminder^ AddShipToDisplayCache(DisplayCache* cache, Ship* ship) {
  ... // Add to cache
  // Return a zero-sized reminder object
  return (ShipInCacheReminder^)malloc(sizeof(ShipInCacheReminder));
}

The compiler prevents us from accidentally dropping this ShipInCacheReminder^; we must explicitly move or free it. That makes it a very effective reminder.

The only place we'll ever free it is in the RemoveShipFromDisplayCache function: 4

c
void RemoveShipFromDisplayCache(
    DisplayCache* cache,
    Ship* ship,
    ShipInCacheReminder^ reminder) {
  free(move reminder);
  ... // Remove it from the cache
}

Voilà! The compiler now guarantees that we'll not forget to RemoveShipFromDisplayCache.

Let's see it more in context, and get a clearer picture of how this is used.

Personally, I prefer storing the reminder next to the Ship^ itself. For example, instead of a central list of Ship^, we might have a central list of ShipAndMetadata^ which contains a Ship^ and all the reminder objects for it.

c
struct ShipAndMetadata {
  Ship^ ship;
  ShipInCacheReminder^ shipInCache;
};

That way, when we take apart the ShipAndMetadata struct to get to the Ship we want to destroy, we're naturally left holding a ShipInCacheReminder.

Here, we try to delete the shipAndMetadata directly, and justifiably get an error.

c
void DestroyShip(
    ShipAndMetadata^ shipAndMetadata) {
  // Error: freed a struct without
  //   moving data from it
  free(move shipAndMetadata);
}

The compiler knows we need to move fields out before freeing the container, so let's move things out first. 5

We also free ship here. However, the shipInCache was not dealt with, so the compiler notices that and throws an error.

c
void DestroyShip(
    ShipAndMetadata^ shipAndMetadata) {
  Ship^ ship = move shipAndMetadata.ship;
  ShipInCacheReminder^ shipInCache =
      move shipAndMetadata.shipInCache;
  free(move shipAndMetadata);

  free(move ship);
  // Error: Un-destroyed data:
  //   ShipInCacheReminder^ shipInCache
}

That's good! We don't want the compiler to leak or even free that reminder automatically, that would defeat the purpose.

To fix it, we'll call RemoveShipFromDisplayCache:

c
void DestroyShip(
    DisplayCache* cache,
    ShipAndMetadata^ shipAndMetadata) {
  Ship^ ship = move shipAndMetadata.ship;
  ShipInCacheReminder^ shipInCache = move shipAndMetadata.shipInCache;
  free(move shipAndMetadata);

  // Gets rid of the shipInCache
  RemoveShipFromDisplayCache(cache, &ship, move shipInCache);

  // Gets rid of the ship
  free(move ship);

  // No errors!
}

As a bonus, this technique prevents us from accidentally doing RemoveShipFromDisplayCache twice, because there's only one reminder object instance. 6

They say the biggest two problems in computer science are off-by-one errors and cache invalidation. We just solved the latter!

If you want to learn more about this, check out Vale's Higher RAII, the pattern that saved me a vital 5 hours in the 7DRL Challenge, where I used this exact technique to remember to remove things from a cache.

3

This ShipInCacheReminder^ could even just be a ShipInCacheReminder if the compiler knows to not drop ShipInCacheReminders, such as how in Vale it's decided on a per-type basis.

4

We could get around this system by maually freeing it elsewhere. We'd have to go out of our way to cause that bug, so it's not as worrisome.

5

This is so common thing that languages like Vale can just "move-destructure" to do this, e.g. [ship, shipInCache] = shipAndMetadata;.

6

So far, this technique doesn't prevent us from removing the wrong Ship. If we want, we could prevent that by putting the Ship's ID into the ShipInCacheReminder.

Memory Safety

Let's make something else amazing happen.

We can make our little C-like language completely memory safe with a few extra rules:

Rule 5: Always bounds-check array accesses.

Rule 6: No using pointer arithmetic.

Rule 7: Never use non-owning pointers. Each object only ever has one pointer to it, the owning pointer.

You're probably thinking, "That's insane! How can you make a program without non-owning pointers? And how does that make our programs memory safe?"

It does sound insane! But we can refactor any program to not use non-owning pointers, if we have these guidelines:

  1. Any function that receives a non-owning pointer would instead take (and return) ownership of data.
  2. Any struct that stores a non-owning pointer would instead store an index (or ID) into some central data structure.

If you want to read more about this "linear style", check out last week's article on how Vale taught me about all of this. Vale's Linear-Aliasing Model is about using linear style, and falling back to generational references when the user chooses not to use it.

If you've ever used a language with a linear type system like Austral, this will sound familiar.

After a few hours of using this style, it actually feels pretty similar to Rust. The first guideline is semantically equivalent to passing an &mut, and we use the second gudeline all the time in Rust already.

Adding non-owning pointers back in safely

If we want to add non-owning pointers back in while keeping our newfound memory safety, there are a few ways we can go about it.

Memory Tagging and CHERI

Memory tagging is when every 16-byte chunk of memory has a corresponding 4-bit "key" that goes with it. Every time we allocate or deallocate that chunk, we change the chunk's key.

Whenever we make a pointer to some data, we copy that key into the unused top bits of the pointer.

When we attempt to dereference the pointer, the hardware will assert that the key in the chunk matches the key in the pointer.

Even though these asserts have a 1/16 chance of a false negative, this is great for detecting bugs in development and testing.

CHERI is still in the R&D phase, but it seems like it will solve a lot of memory tagging's downsides.

Borrow Checking

Of course, there's always Rust's borrow checking approach! It enforces that if someone has a mutable reference to an object, no other references can exist. This simple rule gives us temporary non-owning pointers with full memory safety.

It has some really nice benefits:

  • With a little bit of function coloring and data coloring, we can temporarily share data with multiple threads, without risking data races!
  • It doesn't have the run-time cost of memory tagging or CHERI (about 6.8%).

But also some drawbacks:

  • It doesn't let us use common patterns like observers, graphs, back-references, dependency references, intrusive data structures, delegates, and struggles with certain kinds of RAII and interconnected data.
  • It can be difficult for time management and development velocity.

I would love to see a language that has both linear types and borrow checking. Austral is looking pretty promising here. Rust gets close, but its affine types can't get the specific compile-time guarantees of linear types and higher RAII. 7 8

7

All types in Rust must be affine, in other words all types must have a zero-arg no-return drop method defined, so that Rust can automatically drop any type when it goes out of scope. This conflicts with higher RAII, which is all about requiring explicit destruction of certain types.

8

Some folks have been thinking about what it would take for Rust to do full linear types. It sounds pretty promising, and I hope it happens!

Generational References

Vale's generational references is like a 64-bit memory tagging system, but it uses the language's semantics to skip as many assertions as the user wants. 9

It has a some benefits:

  • It skips the assertions when we access an owning pointer, or when reading from an immutable region. Between these, one can eliminate every single assertion in a program if they want.
  • Its owning references are regular thin 64-bit pointers, as opposed to CHERI's 128-bit ones.
  • It's easy, and lets us use whatever patterns we want.
  • It can be used with regions to eliminate data races.

It has its drawbacks too, of course:

  • It has a "key" at the top of every allocation, though structs and enums share the key of the containing struct, array, or stack frame to save space.
  • Its non-owning references are 128 bits (like CHERI's), though they can be avoided with the above linear style.
  • When we choose to dereference a generational reference, it does a check at run-time similar to a bounds check.

This combination of generational references and regions is my favorite approach, especially since it still lets us use Higher RAII.

Other Approaches

There are a few other approaches we can use to add non-owning pointers back in while preserving memory safety:

...but this post is getting quite large already! 10

9

Specifically, regions eliminate the vast majority, and we can use linear style to eliminate the rest.

10

One day I'd really like to write a series on the surprising number of mechanisms for memory safety. We often think it's just reference counting, tracing garbage collection, and borrow checking, but it turns out there are a lot more.

What does it all mean?

Single ownership is a lot simpler and more powerful than people realize. Even without reference counting, garbage collection, or borrow checking, we can use it for memory safety and extra guarantees at compile-time.

I hope languages will start moving in this direction. C++ pioneered single ownership with unique_ptr and Rust added memory safety, but both are affine 11 12 so don't get higher RAII's compile-time guarantees.

Austral uses linear types for its memory safety, and built their entire language around them to great effect. They wrote up a really great Introduction to Linear Types too, definitely worth checking out. 13

Vale has both linear and affine types, plus generational references on top. The user might decide that Ships are affine and ShipInCacheReminders are linear, and can have references to either of them. We also just successfully used regions to eliminate most of generational references' downsides, though it's still quite experimental.

Next week I'll be writing about how we can use these techniques (and a few others) to design some memory safety for C++, so keep an eye on the RSS feed, twitter, discord server, or subreddit!

See you next time!

- Evan Ovadia

11

Affine means the compiler must always be able to drop data automatically when it goes out of scope.

12

I was hoping that by using = delete; on a constructor in C++ we could get a linear type. Alas, no such luck.

13

Austral's creator was the first person I talked to when I realized all of this!