Every March, developers worldwide each try to make a roguelike game in less than 168 hours as part of the 7DRL Challenge. Roguelike games are hard, involving procedural level generation, pathfinding, line-of-sight algorithms, and more Dijkstra maps than you can imagine.
Most people don't survive the week. 0
Years of 7DRL challenges have taught me that development velocity is the most important thing to optimize for. It doesn't matter how perfect your code is if it doesn't make it into the hands of the players in time.
A few weeks ago I wrote the first part of our little expedition, which explored the various memory safety approaches. Today, let's talk about how they might help or harm development velocity!
You might learn something new along the way!
Developer velocity is important. Not just for delivering a complete game in 7 days, but in a lot of everyday software engineering situations too:

- We often need to deliver features fast enough to hit deadlines, even wildly underestimated ones. 1
- Developer time usually costs a company far more than its servers and electricity do.
Some napkin math to illustrate that last one:
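With rough, assumed numbers: a single developer might cost a company $150,000 per year in salary alone. 2 Meanwhile, a server drawing 500 watts around the clock uses about 4,380 kWh per year, which at $0.12/kWh comes to roughly $500 of electricity.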
As you can see, software development can be much more expensive than power usage, so it can make sense to optimize more for development velocity. 3 4
The choice of language, and its memory safety approaches, is a big factor in development velocity.
There are generally four approaches to memory safety:

- Manual memory management (MMM), as in C, Zig, and Odin.
- Borrow checking, as in Rust.
- Garbage collection (GC), as in Java, Go, and Pony.
- Reference counting (RC), as in Swift.
There's also a fifth approach, generational references with regions, which we'll talk about elsewhere; this series is comparing the more traditional approaches.
But by the end of this, you'll have a much better idea of which is best for a particular situation, whether it be a game development challenge, web server, or anything else.
Development velocity is a nebulous concept, but I would say it's how fast we can expand, modify, and maintain our codebase to deliver value to the user, without causing too much collateral damage.
The "deliver value to the user" is very important here. It means doing something that will help the user in a specific way. Without that clear goal, we can get caught up in making our features absolutely perfect according to some arbitrary artistic criteria. As anyone who has launched a product can tell you, it's better to have two solid, tested, flexible features than one absolutely perfect one.
The "too much" is also important. There is often a tradeoff between moving fast and keeping things absolutely correct. A big part of software engineering is weighing the value of new features against the risk and relative severity of any bugs that might slip into production for a while.
With that in mind, let's see how the various approaches do!
People who come to C from higher-level languages often think they should code like they would in Java, but with manually inserted malloc and free calls. Let's call this style "naive C".
One can become pretty skilled in mentally tracking where these calls should go, but the approach tends to fall apart after a while: it isn't resilient to changes and refactoring. It's common to accidentally change the code and break some previous assumptions, resulting in a memory bug.
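To make that failure mode concrete, here's a sketch of it transliterated into Rust raw pointers standing in for malloc and free (a hypothetical example; the names are made up, and safe Rust would reject this pattern outright):

```rust
fn main() {
    // "malloc": heap-allocate a value and take a raw pointer to it.
    let ship: *mut String = Box::into_raw(Box::new(String::from("Serenity")));

    // "free": some distant cleanup code deallocates it...
    unsafe { drop(Box::from_raw(ship)) };

    // ...but a later refactor still dereferences it: a use-after-free.
    // This compiles fine; its behavior at run time is undefined.
    unsafe { println!("{}", *ship) };
}
```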
Memory bugs are notoriously difficult to solve. Naive C doesn't have great development velocity.
...and that's why experienced C developers don't really use this "naive C" style of coding.
Instead, they:

- lean on memory-safe patterns like arena allocators and object pools, which avoid loose malloc and free calls, and
- detect memory bugs during development with run-time tools like Valgrind and Address Sanitizer.
With these, developer velocity can be stellar even with an MMM language. Modern MMM languages even include these kinds of mechanisms, such as Zig's ReleaseSafe mode.
There is a cost to these mechanisms: Address Sanitizer increases run time by 73%, making a program almost as slow as one using garbage collection. 6
However, that doesn't matter if we can just enable these tools in debug mode to get their error-detection benefits, and turn them off in release mode, just like assertions. As we talked about in Part 1, we wouldn't want this for a public-facing server, but it's a great tradeoff for many games, and for settings like webapps and mobile apps where any security issues can be sandboxed away. 7
Google Earth proved that this strategy can work well. In an average Google Earth quarter, only 3-5% of bug reports were traceable to memory safety problems, and Address Sanitizer made them trivial to reproduce in development mode.
Another developer velocity benefit of MMM languages is that they aim to be as simple as possible, which keeps compile times low. Languages like C, Zig, and Odin tend to compile much faster than more complex languages like Scala and Rust, which greatly helps developer velocity. 8 On top of that, Zig's simplicity helps its ability to hot-reload code changes, which could make it one of the fastest native languages to develop in.
By this I mean that most people don't have a finished game by the end. But they still celebrate with the rest of us, after an honorable struggle!
We also need developer velocity to develop fast enough to compensate for product managers that wildly underestimate how long it will take to make something.
Just to keep it simple, let's not include benefits or bonuses, which make it even higher.
This of course varies by company and team; a startup will have much higher development costs, and a company like Oracle would probably have more server costs.
Note that there are many aspects of development velocity that aren't related to memory safety, and also many uses of electricity that aren't doing garbage collection or reference counting.
By "garbage collection" I'm specifically referring to tracing garbage collection.
From a TheNewStack benchmark:
This works particularly well for apps that only talk to a trusted first-party server, which includes most apps. It doesn't work as well for programs where clients indirectly send each other data, such as multiplayer first-person shooter games.
I can speak from experience; every time I have a project that takes more than five seconds to compile, I tend to get distracted by Reddit or something shiny.
There are a few approaches on the horizon that improve upon MMM even more.
To address more of the memory unsafety in MMM languages, CHERI detects memory safety problems at run time using capabilities.
From Microsoft Security Response Center:
One could reason that CHERI can reduce memory-unsafety-related development slowdowns by two thirds, which is pretty incredible. Arm CPUs are even starting to have hardware support for it, bringing its run-time overhead down to 6.8%. 9
That remaining one third could be solved by Vale, which takes this approach even further with its generational references. By building the checks into the language itself, it can know exactly when it does or doesn't need to perform one. And by adding native region support, it can theoretically skip the vast majority of them, bringing its run-time overhead even closer to zero, within the noise of any C or C++ program. With that kind of speed, we can leave the protections enabled even in release mode.
Borrow checking is another mechanism that can be added to a low-level language to protect against memory problems, similar to SPARK. Rust is the main language with borrow checking, though there are some newer languages that are designing borrow checkers with better development velocity.
Its main advantages are that:

- it gives us memory safety with almost zero run-time overhead,
- it protects against data races at compile time (more on this below), and
- it influences us toward top-down architectures that are easier to reason about.
In a lot of situations, Rust can be pretty stellar for development velocity. There are plenty of folks (such as here and here) who praise Rust's velocity, especially against languages like C and Python.
But that's not too surprising; neither of those languages has the strong static type system and generics that almost every modern GC'd and MMM 10 language has. 11 So how does borrow checking fare against more modern languages and practices?
It's a tricky question. The above benefits sometimes make it better, but there are also some pretty considerable drawbacks that make it much slower than languages with garbage collection and reference counting. And depending on the situation's requirements and the tooling and techniques used, it can sometimes be even slower than MMM.
This isn't an uncommon opinion. From Using Rust at a startup: A cautionary tale:
And in the words of another software architect:
We often write it off as just a learning curve problem, but it's apparently true even for more experienced rustaceans.
So what's the problem here? What is it about the borrow checker that slows down development velocity?
To summarize, it's because:

- it imposes extra constraints beyond what the problem itself requires,
- it couples our functions and data structures more tightly to each other,
- it's incompatible with some simple, useful, and safe patterns, and
- it tends to leak through abstractions and API boundaries.
These are rather tricky concepts, so let's explore these a little more.
I'll also link to some quotes and examples (green to emphasize that they're just anecdotes) to complete the picture. 13 14
There will be more quotes about the drawbacks, because initial readers found these aspects more surprising; there aren't very many articles that explore this side of the borrow checker. There are fewer quotes for the borrow checker's benefits, which everyone largely agrees on.
Also keep in mind that these are mostly comparing borrow checking to garbage collection.
Lastly, note that this doesn't mean borrow checking is a bad thing. It just means that to get its benefits, you'll have to pay some development velocity costs.
Zig has comptime and Odin has parametric polymorphism.
C++ has generics and a good static type system, but it's harder to use it for this comparison because it's so bogged down with artificial complexity, due to how long it's been adding new features in ways that had to remain backwards-compatible with C.
On top of that, Rust has (in my opinion) a much better build system, Cargo.
Because of these things, it's hard to compare Rust to C++ and use that as a valid comparison between their memory safety approaches, borrow checking and MMM.
Some folks think that the borrow checker's rules are inherent to programming and should be followed in any paradigm. This is false. Aliasability-xor-mutability is just a rough approximation of the real rule, dereference-xor-destroyed. Shared mutability doesn't always cause memory unsafety; dereferencing destroyed data is the real danger.
Note that these aren't data, just everyday people's experiences.
I would love to see an actual experiment measuring the actual developer velocity between different languages. I'm not sure how they would factor out the learning curve costs, but it's probably possible.
One user says, "Rust’s complexity regularly slows things down when you’re working on system design/architecture, and regularly makes things faster when you’re implementing pieces within a solid design (but if it’s not solid, it may just grind you to a total halt)."
In other words, prototyping and iteration suffer with borrow checking. It's tempting to think this is an isolated opinion, but this user is not alone; some say that it's a terrible language for prototyping, that it makes draft coding very difficult, and that other languages are much faster to change.
Looking closer, it's largely because the borrow checker imposes extra constraints compared to other paradigms, such as the constraint that you can't have multiple mutable references to an object. 15 When you want to make a change, you can't just make the simplest one; you need to find a change that also satisfies the extra constraints of the borrow checker.
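For example, here's a minimal sketch of that constraint (with a hypothetical Spaceship type); the second mutable reference is rejected even though this particular program would be perfectly safe:

```rust
struct Spaceship { fuel: u32 }

fn main() {
    let mut ship = Spaceship { fuel: 100 };
    let pilot = &mut ship;
    // A second mutable reference is rejected at compile time:
    // let engineer = &mut ship; // error[E0499]: cannot borrow `ship` as
    //                           // mutable more than once at a time
    pilot.fuel -= 10;
}
```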
The borrow checker also runs into some problems with decoupling. From Using Rust at a startup: A cautionary tale: 16
This is because of the borrow checker's forced coupling: as a codebase's components become more interconnected, changes in one will require changes in another. In borrow checking, every function is much more coupled to its callers and callees by mutability constraints. This can be in direct conflict with traits, and with polymorphism in general.
This also makes changing your program's data structure worse. One user says, "changing it is a nightmare because it forces you to rewrite huge swathes of code because you’ve slightly changed the way you store some data", and another user says, "In theory ... it's possible to avoid the escape hatches if you are careful to structure access to your data juuuuust right."
This makes sense: with borrow checking, our code tends to be more coupled to our data, both because of the aforementioned forced coupling and for reasons related to leaky abstractions, which we'll cover below.
There also seem to be some problems with refactoring. According to some, it can be difficult to change your program, like moving through molasses. Refactoring can be a massive pain.
Of course, this makes sense. The borrow checker requires you to satisfy its constraints up-front for every single iteration of your code, long before it actually matters. In the words of one user, it "forces me to put the cart before the horse". Most code goes through multiple iterations before it's complete, 17 so we're paying this cost more often than in other paradigms.
Keep in mind, these users' experiences aren't universal, it likely depends greatly on the domain.
Stateless programs like command line tools or low-abstraction domains like embedded programming will have less friction with the borrow checker than domains with a lot of interconnected state like apps, stateful programs, or complex turn-based games.
This is probably why some people find the borrow checker to be slower to work with, while some people find it faster. 18
Universally, the more constraints and rules you add to a problem, the more rigid the solution space is.
Some interesting discussion on this quote.
Some people are brilliant and can get everything completely right on the first try, but alas, I am not one of them, and I haven't met one either.
This is also probably why online discussions about the topic tend to be so polarized.
The borrow checker may cause slower development velocity when prototyping and iterating, but it also helps in some ways if our program uses concurrency.
The borrow checker helps protect against data races. A data race is when: 19

- two or more threads concurrently access a location of memory,
- one or more of them is a write, and
- one or more of them is unsynchronized.
The reading CPU will get a partial, inconsistent view of the data because the writing CPU hasn't finished writing to it yet. These bugs are very difficult to detect, because they depend on the scheduling of the threads, which is effectively random.
These bugs can take days to track down, slowing developer velocity for programs that use concurrency.
Most languages, including C, Java, and Swift, offer no protections against data races. Go offers partial protection against data races by encouraging and defaulting to message passing, but one can still suffer the occasional data race.
Rust uses the borrow checker to protect us from data races at compile time, by isolating one thread's memory from another thread's memory, and offering standard library tools that let us safely share data between threads, such as Mutex<T>.
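Here's a minimal sketch of that in action; Arc shares ownership across threads, and Mutex<T> makes each access exclusive:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared counter: Arc lets threads share ownership, Mutex serializes access.
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // The borrow checker won't let us touch the data without locking.
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    assert_eq!(*counter.lock().unwrap(), 4);
}
```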
Not all programs use (or should use) concurrency, but if your program does, the borrow checker may improve your developer velocity in these areas of your program.
From The Rustonomicon.
The borrow checker is a very effective static analysis mechanism, but it still tends to be incompatible with a lot of simple, useful, and safe patterns:

- observers,
- back-references, and objects that mutably point at each other,
- graphs and doubly linked lists, and
- certain forms of RAII. 20
All of these are generally impossible within the rules of the borrow checker. There are workarounds for some, but they have their own complexities and limitations.
When the easiest solution is one of these patterns, one has to spend extra time finding a workaround, or an entirely different approach that satisfies the borrow checker. This is artificial complexity, and it can slow down development velocity. For example, I once saw someone bring in an entire Rust framework (Yew) as a dependency because they couldn't make an observer work within the borrow checker.
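Here's a sketch of why the classic observer pattern fights the borrow checker (with hypothetical Button and App types); the observer's long-lived &mut back to the thing it notifies locks everyone else out:

```rust
struct App { clicks: u32 }

struct Button<'a> { observer: &'a mut App }

fn main() {
    let mut app = App { clicks: 0 };
    let button = Button { observer: &mut app };
    // While `button` lives, nobody else may touch `app`:
    // app.clicks = 5; // error[E0499]: cannot borrow `app` as mutable
    //                 // more than once at a time
    button.observer.clicks += 1;
}
```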
Luckily, one can avoid some of these problems by leaning more heavily on workarounds like Rc or RefCell. Many in the Rust community say that these are a last resort, to be avoided and refactored out whenever possible, but these workarounds can help avoid forced coupling and improve developer velocity if used well.
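For instance, here's the same observer shape using the Rc<RefCell<...>> workaround; aliasing is allowed again, and the exclusivity check moves to run time:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct App { clicks: u32 }

struct Button { observer: Rc<RefCell<App>> }

fn main() {
    let app = Rc::new(RefCell::new(App { clicks: 0 }));
    let button = Button { observer: Rc::clone(&app) };

    button.observer.borrow_mut().clicks += 1; // run-time-checked mutation
    println!("{}", app.borrow().clicks); // both aliases see the change: 1
}
```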
RAII is about automatically affecting the world outside our object. To do that, the borrow checker often requires us to take a &mut parameter or return a value, but we can't change drop's signature. To see this in action, try to make a handle that automatically removes something from a central collection.
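Here's a sketch of that dead end (with hypothetical Registry and Handle types):

```rust
struct Registry { names: Vec<String> }

struct Handle { name: String }

impl Drop for Handle {
    fn drop(&mut self) {
        // We'd like: registry.names.retain(|n| n != &self.name);
        // ...but drop's signature is fixed, so there's no way to pass
        // a &mut Registry in here.
    }
}

fn main() {
    let _registry = Registry { names: vec![String::from("engine")] };
    let _handle = Handle { name: String::from("engine") };
} // _handle is dropped here, but it can't clean itself out of _registry
```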
The borrow checker often influences us into a top-down architecture, which can help us maintain assumptions and invariants in our programs.
In short, a top-down architecture is where you organize your program's functions into a tree (or a directed acyclic graph), such that a parent can call a child, but a child cannot call a parent.
This is a very subtle but powerful effect for your program. Explaining it would take an entire three articles on its own, but check out this video by Brian Will where he talks about the benefits of this kind of "procedural" style. 21
GC'd languages like Pony, and functional languages like Haskell and Clojure, also have this benefit. In Rust it's a natural side effect of (or rather, a requirement for) its memory safety approach.
The borrow checker also influences us toward a "flatter" organization, where all of our program's long-term state is held in central collections, reminiscent of a relational database. This leads naturally into certain architectures like ECS. This architecture works pretty well for a lot of programs.
Note that it can also be a bad fit for some programs. More complex turn-based games, such as roguelikes, are better with other architectures. ECS isn't as flexible or extensible in those situations.
There are occasional features in programming languages that are inherently leaky, in that they tend to leak through abstractions.
An example that illustrates the concept, unrelated to memory safety, is async/await.
If foo() calls an async function bar(), then we need to make foo itself async as well, and often a lot of foo's callers too. It's usually doable, unless we need to change a function signature that overrides a method on a trait. When a feature forces us to change a trait method's signature, it is an inherently leaky feature.
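A minimal sketch of that infection (with hypothetical foo and bar; compile this as a library, since actually running async code also needs an executor):

```rust
async fn bar() -> u32 { 42 }

async fn foo() -> u32 {
    // Awaiting bar() forces foo itself to be async...
    bar().await
    // ...and now foo's callers must await (or spawn) it in turn.
}
```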
Leakiness can make refactors much more widespread, and can even lead to an architectural deadlock when we run into a trait method we can't change, such as one in a public API.
In a similar way, the borrow checker is also inherently leaky. For example, when our method expects a &mut reference to some data, it imposes a global constraint that nobody else holds a shared reference to that data. 22 This often manifests as a &mut in our callers, and in all of their callers. This "caller-infectious" requirement makes borrow checking inherently leaky, and can be a particularly thorny problem when it runs into classes that have fixed interfaces, with no way to pass extra data through them.
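Here's a sketch of that propagation (with hypothetical types); one &mut at the bottom of the call stack echoes all the way up:

```rust
struct Inventory { gold: u32 }

fn buy_potion(inv: &mut Inventory) { inv.gold -= 5; }

// Because buy_potion needs &mut, this caller must take &mut too...
fn visit_shop(inv: &mut Inventory) { buy_potion(inv); }

// ...and so must *its* caller, all the way up.
fn take_turn(inv: &mut Inventory) { visit_shop(inv); }

fn main() {
    let mut inv = Inventory { gold: 20 };
    take_turn(&mut inv);
    assert_eq!(inv.gold, 15);
}
```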
I'll repeat Matt Welsh's quote, since it applies here as well:
When this occurs, one user says you can do "one of three things: 1) Refactor into an unrecognizable mess. 2) Add a lot of RwLock or RefCell. 3) Abandon the project. The issue is that to maintain the same interface, it gets to be so hacky that maintenance becomes borderline impossible."
Leaky features directly conflict with the main benefits of decoupling and abstraction, which help isolate changes in one area from affecting another. Maintaining decoupling is a very important principle for development velocity, no matter what paradigm you're using.
Some people say that "abstractions are bad anyway". I'll take this opportunity to point out some of the most successful and beneficial abstractions in programming history, such as file descriptors, the UDP protocol, and the Linux operating system. 23
To sum up, the constraints we add for memory safety are sometimes in conflict with the necessary constraints of API stability and beneficial abstraction, and there's sometimes a limit to how many constraints you can add to a problem before it becomes impossible to satisfy them all. 24
Don't get me wrong, one can get sufficient developer velocity in Rust, especially compared to naive C or C++. These drawbacks are real, but they aren't crippling. Plenty of projects have been completed in a reasonable time with Rust... it's just often slower than other approaches.
It's a great video, explaining a good architecture, though I would disagree with his conclusion that object-oriented coding is therefore bad. One can easily use a top-down object-oriented architecture. In an iOS app, simply never mutate anything when called via a delegate method, only when called from above. Many React apps are architected this way, and so is Google Earth.
This is also true of the converse: a & reference constrains all other code to not hold any &mut references to that data.
More good programming abstractions:
And some from real life:
A quote from Harry Potter and the Methods of Rationality. Shout-out to my fellow Ravenclaws!
A lot of languages are working on borrow checking blends that are better for development velocity.
Some languages are using it under the hood:

- Vale is building something similar to borrow checking but at the regions level, to largely eliminate memory safety overhead without introducing aliasing restrictions. Its opt-in nature means the user can use it where it makes sense and doesn't hinder development velocity.
- Verona and Forty2 are experimenting with mixing regions and garbage collection.
Some languages are also putting borrow checkers on top of simpler and more flexible foundations:

- Cone is particularly interesting because it builds a borrow checker on top of any user-specified memory management strategy.
Garbage collection is probably the best approach for developer velocity. It completely decouples your goals from the constraints of memory management. You are free to solve your problem without tracking extra requirements, such as C++'s single ownership or Rust's borrow checking.
This is a good thing, because most of your code doesn't need to care about memory management. In most programs, profiling shows only a small portion of code that's performance sensitive, requiring more precise control of memory. 25
I particularly liked this quote from the Garbage Collection Handbook:
In garbage collection, we don't have to satisfy the move-to-move constraint, or the borrow-to-borrow constraint. We don't have to worry about matching pointers versus values. There's just one kind of reference, rather than Rust's 5 or C++'s 7. 26
GC also doesn't cause any refactoring to satisfy single ownership (like C++'s unique_ptr) or mutability requirements (like in Rust), because those concepts never existed to begin with. 27
There are certain errors that arise in borrow-checked and MMM languages, which don't happen in GC'd languages.
Often, instead of holding a reference to an object like in a GC'd language, the borrow checker will force us to hold a key into a central collection, such as an ID into a hash map. If we try to "dereference" 28 the ID of an object that no longer exists, we often get a run-time error (a None or Err usually) that we have to handle or propagate.
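A minimal sketch of that pattern (with a hypothetical Spaceship type):

```rust
use std::collections::HashMap;

struct Spaceship { fuel: u32 }

fn main() {
    let mut ships: HashMap<u64, Spaceship> = HashMap::new();
    ships.insert(7, Spaceship { fuel: 100 });
    let target_id: u64 = 7;

    ships.remove(&target_id); // the ship is destroyed elsewhere...

    // ...so every "dereference" of the ID must handle the None case.
    match ships.get(&target_id) {
        Some(ship) => println!("target fuel: {}", ship.fuel),
        None => println!("target no longer exists"),
    }
}
```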
Pony is a great example of how a garbage collected language can reach further towards correctness than other languages. Pony is one of the only languages that literally cannot cause a run-time error. Like Erlang, it is incredibly resilient. 29
Garbage collection can also be better with privacy, since objects are never accidentally reused or mixed up with each other. For example, the borrow checker can turn memory safety problems into privacy problems if one's not careful.
If one has unusually constrained latency requirements, such as in high frequency trading or a real-time first-person shooter game, it can take quite some time to refactor and tune the program to not have unwelcome latency spikes.
For example, to avoid the Java garbage collector firing in a specific scope, one has to completely avoid the new keyword in that scope, and avoid calling any functions that might sneakily use new. Coding without new in Java is a particularly arcane challenge.
It's a little easier in C#, where we can use the struct keyword to make a class without any heap allocation. High-performance Unity games sometimes use this style, but it's still rather difficult.
Garbage collection is more than fast enough for most situations. And even in those situations where its latency spikes are too burdensome, there are solutions on the horizon.
Cone and Verona will allow us to explicitly separate GC regions from each other, such that we can create and destroy a temporary short-lived region before its first collection even needs to happen. By using regions wisely, one can probably avoid the vast majority of collections. 30
Cone aims to take that even further by blending in a borrow checker, plus allowing more allocation strategies such as arenas, reference counting, or even custom ones from the users themselves.
With these advances, we might be able to get GC's development velocity advantages without the usual performance drawbacks.
Reference counting, like in Swift, generally has the same benefits that garbage collection does.
It does have one drawback: any cycle of references pointing at each other could cause a memory leak, wasting the available memory. This can be largely mitigated with good tooling that detects these cycles in development. 31
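Here's a sketch of such a cycle, using Rust's Rc as a stand-in for Swift's reference counting (with a hypothetical Node type):

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node { other: RefCell<Option<Rc<Node>>> }

fn main() {
    let a = Rc::new(Node { other: RefCell::new(None) });
    let b = Rc::new(Node { other: RefCell::new(None) });
    *a.other.borrow_mut() = Some(Rc::clone(&b));
    *b.other.borrow_mut() = Some(Rc::clone(&a));

    // When `a` and `b` go out of scope, each node still holds a reference
    // to the other, so neither count reaches zero: both nodes leak.
    assert_eq!(Rc::strong_count(&a), 2);
}
```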
However, reference counting has three nice benefits: weak references, deterministic destruction, and the ability to make constraint references.
A weak reference is a mechanism for determining if the pointed-at object is still alive. This can be useful for a program's logic. For example, a Rocket might check if the target Spaceship has already been destroyed, to know whether it should safely fall into the planet's ocean.
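Here's that Rocket example sketched with Rust's Rc and Weak standing in for Swift's reference counting (hypothetical types):

```rust
use std::rc::{Rc, Weak};

struct Spaceship { name: String }

struct Rocket { target: Weak<Spaceship> }

fn main() {
    let ship = Rc::new(Spaceship { name: String::from("Serenity") });
    let rocket = Rocket { target: Rc::downgrade(&ship) };

    drop(ship); // the last strong reference goes away

    // upgrade() tells the rocket whether its target is still alive.
    match rocket.target.upgrade() {
        Some(ship) => println!("chasing {}", ship.name),
        None => println!("target destroyed; falling into the ocean"),
    }
}
```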
Reference-counted objects are destroyed deterministically, which helps us have finer control over our program's performance.
One can also specify where they expect a reference-counted object to be destroyed, simply by asserting that they have the last reference (in other words, asserting the reference count is 1 at the end of the scope). I call this a constraint reference 32 and it can help us detect our program's logic bugs in a way that no other paradigm can.
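There's no built-in constraint reference in mainstream languages, but here's a sketch that emulates the assertion with Rust's Rc::strong_count:

```rust
use std::rc::Rc;

fn main() {
    let ship = Rc::new(String::from("Serenity"));
    let alias = Rc::clone(&ship);

    // ...use `alias` somewhere, then release it before the scope ends...
    drop(alias);

    // If some forgotten alias still existed, this would catch the logic bug.
    assert_eq!(Rc::strong_count(&ship), 1);
}
```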
MMM, borrow checking, GC, and RC each have their strengths and weaknesses. However, in the dimension of development velocity, my general conclusions would be:

- Naive MMM has poor development velocity, but with the right patterns and tools (like Address Sanitizer), MMM's velocity can be stellar.
- Borrow checking helps for concurrency, but generally slows down prototyping, iterating, and refactoring.
- GC is probably the best for development velocity, especially for programs with a lot of interconnected state.
- RC is close behind GC, with extra benefits like deterministic destruction and weak references.
Of course, a language's memory safety approach is only one factor in its development velocity. Development velocity can be helped or hindered in other ways.
Opinions may differ, but here's some examples:

- Fast compile times, like Zig's and Odin's, keep us in flow.
- Hot code reloading lets us see our changes without restarting the program.
- A good build system and package manager, like Rust's Cargo, saves a lot of integration effort.
- A strong static type system and good generics catch mistakes earlier.
Weighing all these factors together is worthy of an entire book, but here's some rough guidelines:
There are also a lot of other languages improving, blending, and even creating new memory safety paradigms.
I hope this post has given you a broader perspective on how various memory safety approaches affect development velocity!
If you're interested in this kind of thing, then check out Part 1 which talks about memory safety in unsafe languages, and keep an eye out for the next parts on our RSS feed, twitter, discord server, or subreddit!
This is also why C# offers value types (struct) for more precise control over memory.
In C++, there's Ship, Ship&, Ship*, const Ship*, const Ship&, unique_ptr<Ship>, shared_ptr<Ship>. In Rust, there's Ship, &Ship, &mut Ship, Box<Ship>, Rc<Ship>. Cell<T> and RefCell<T> might also count, bringing Rust to 7 too, perhaps.
Though I wouldn't turn down some sort of single ownership / RAII being added to a GC'd language. It would prevent us ever forgetting to call .dispose() on a class's child.
By this I mean use the index or ID to look up an object in the central hash map.
Note that not all garbage collected languages strive for correctness... most of them have some flavor of null or nil.
I suspected Pony could do something similar, but after a quick discussion with one of the developers, it turns out not to be the case.
Nim has a particularly fascinating approach here: it only runs its cycle collector for types that could possibly be cyclic. This helps ensure faster deterministic cleanup, and lets us use destructors too.
Vale used to be based on these constraint references, check out this article from long ago for more.
If you're writing a small program, not working on a team, and the program terminates fairly quickly, then static typing can sometimes be a needless step. For example, small CLI scripts or integration test scripts.