How To Survive Your Project's First 100,000 Lines

May 2, 2023 — Evan Ovadia

After many years of development, the Vale compiler just hit its 100,000th line of code.

This is an article about how we kept it from collapsing under its own weight and exploding, as many projects do.

Some of these software engineering techniques came from my time at Google, though ironically most came from my work on the Vale compiler and game development 0 so some of these might be surprising to my engineer comrades out there.

These techniques range from determinism, to testing, to type-system techniques, to general architectural best-practices.

And since I'm a language geek, I'll throw in some nonsense about languages!

If you think you're using enough assertions, you're wrong

I'm not exaggerating when I say assertions are the greatest thing since sliced arrays, and have cut my debugging time by half.

In your darkest hour, the assertion arrives.

Rule of thumb: Every time you think something is true about your data, and it's not ensured by the type system, add an assertion that checks it.

Don't just occasionally use them, drench your code in them. When your code so much as sneezes, they should shake off like sand after the beach. This is the way.

We once had a bug in the post-parser which caused some data corruption, and thank goodness it was caught by an assertion in the final stage of the compiler. 1 If that assertion didn't catch it, it would have made it into the LLVM-generated code, and would have taken hours to track down.

Such miracles are why the Vale compiler has an entire 1,795 assertions.

Side Notes

(interesting tangential thoughts)

Notes [–] Notes [+] 0 1

Though I did later apply a lot of these architecture techniques to Google Earth.

Previously known as the Hammer stage, though now it has the more boring name of FinalAstSimplifier.

Lean on the type system

Even if you're using a statically typed language, you can still make youre code more statically typed.

For example, a string can serve many purposes: an ID, a first name, a URL, and so on. Sometimes, we can mix up which strings are fulfilling which purposes. When it gets real bad, we sometimes call our code "stringly typed".

When you find yourself accidentally passing an ID string argument into a first name string parameter, it's a hint that you shouldn't be passing around strings, and perhaps you should wrap them in different kinds of structs. This is sometimes called the New Type pattern.

And when you find yourself using strings to represent multiple pieces of data (perhaps split by a special character like :), break it apart into a struct with multiple fields instead.

Any statically-typed language has this ability, make sure to lean on it!

Languages like Vale, Austral, and Rust are particularly strong in this regard because of their type-state programming 2 and especially Vale's Higher RAII. 3

A better way to comment

In 2019, I came across some of our code for generational references. 4

"That's odd," I said, "This really should be simpler," and I started refactoring.

However, I quickly ran into a wall, and discovered why our code looked odd: it was actually handling some pretty complicated requirements that required that odd approach.

In 2021, I came across the same code.

"That's odd," I said, "This really should be simpler," and I started refactoring.

Then I proceeded to run into that same wall. It was only then that I remembered my first attempt.

This is why we leave comments!

However, comments can become out-of-date, and you might not be lucky enough to stumble across the right comment before you embark on a refactoring adventure.

For this reason, we also scatter clues around, like the below see SAIRFU.

// Here we used to check that the rule's runes were solved, but
// we don't do that anymore because some rules leave their runes
// as mysteries, see SAIRFU.

"See (acronym)" means to search our internal docs for more explanations and discussion. In this case, the documentation's SAIRFU section says:

# Send and Impl Rules For Upcasts (SAIRFU)

For this snippet:
  a MyInterface = MyStruct(1337);

Let's say a's coord rune is X. We can't say:
  X = MyInterface
  X = MyStruct
because that's a conflict.

We could just not feed the X = MyStruct into the solver, but then this breaks:
  a = MyStruct(1337);
because it can't figure out what X should be.

...

This is good for many reasons:

We can paste "Some rules leave their runes as mysteries, see SAIRFU." into many comments across our codebase, anywhere that's relevant. This is much better than copying an entire lengthy explanation to a dozen places in the code.
If we later decide SAIRFU wasn't wise, we can search the codebase for SAIRFU to find all of the places we need to update.

TL;DR: Have many links to centralized documentation.

Avoid nondeterminism

Have you ever had a bug that you could only reproduce one out of every ten tries? What an adventure!

And then there's the bugs that you might only reproduce every hundredth try. A harrowing endeavor. You try and you try, and eventually give up and close the bug report as "Cannot Reproduce".

This unpredictability is caused by non-determinism, in other words, when there are some random factors that make it hard to predict whether something will happen. For example:

The bug only happens when a random number generator produces an odd number.
The bug only happens when thread A happens to access data before thread B.
The bug only happens when network response B comes before network response A.
The bug only happens when your string's .GetHashCode() produces an odd number. 5
The bug only happens when item X appears before item Y in a hash map in Golang or Rust.

The biggest case of this in the Vale compiler was when Scala's Arrays were hashing nondeterministically. I wrote about it more in Hash Codes, Non-Determinism, and Other Eldritch Horrors, so if you're into horror stories, check it out.

If non-determinism creeps into your program, you'll be testing your application, find a bug, and then never be able to reproduce it. Avoid it when you can!

Notes [–] Notes [+] 2 3 4 5

Type-state programming is a way for the compiler to ensure that a particular class doesn't fall into an invalid state.

Higher RAII will make sure you never forget to call a particular function.

Generational references are an alternative mechanism for memory safety, like reference counting, garbage collection, or borrow checking.

I was shocked when C#'s string.GetHashCode broke my determinism in the 2020 7DRL Challenge.

...except you can't really avoid nondeterminism, so how do we solve it?

Even in a compiler, which needs no network requests or animations, 6 we still have plenty of nondeterminism from asynchronous file IO and thread scheduling.

So how might a language let us reproduce bugs, even in the presence of nondeterminism?

The answer is something I like to call perfect replayability. We've prototyped it in Vale, and it's working pretty well!

In perfect replayability, the compiler will instrument 7 the code to:

Record any sources of non-determinism into a file. For example:

Record all responses you get from the network.
Record the seed number you used to initialize your random number generator.
Record all inputs from the user.

Have a "replay mode" where you read from that file instead. For example:

Instead of requesting to the network, read from that file.
Instead of initializing your random number generator with the current timestamp, initialize it with the number you previously wrote to the file.
Instead of reading input from the user, read what you previously wrote to the file.

While you're developing or testing, your program records to these files. When you find a bug, fire up replay mode, and enjoy the time you saved!

In the 2020 7DRL Challenge, on the penultimate evening, I launched thousands of games automatically played by a "random AI" player. Three of them crashed, and I was able to fix the bugs because I could reproduce those crashes with deterministic replayability.

Notes [–] Notes [+] 6 7

Hold my beer. A compiler... with animations. Best error messages ever! Who's with me?

Instrumentation is where the compiler emits additional instructions to serve a particular side-goal, such as debuggability, observability, or replayability like we see here.

Stay on top of your testing

In a small project, only a few thousand lines, it's easy to fix a bug without causing any more bugs because you know the system in and out.

Once you start approaching 10,000 lines, fixing one bug will often cause multiple other bugs. Not just easy bugs, but obscure bugs that your users find six months later.

However, with tests, you'll know instantly whether your fix caused any other bugs. You can then try a better fix.

If you don't have a vast suite of tests, your project might get slower and slower until changing anything feels like pulling teeth. 8

Some languages are easier to test with. Javascript's Monkey Patching is a wonderful alternative to mocking, and can make testing much easier. 9

Prefer end-to-end tests

In the early days of the Vale compiler, we had a lot of unit tests for our various components.

For those unfamiliar, a unit test is one that specifically tests just one piece of code. You craft some inputs, feed those into your code, and check the outputs.

Unit tests are nice because they tell you exactly where the bug is, because they test only a small piece of your code.

However, the data being passed between these components was changing very often, because the project was evolving rapidly in response to user feedback and experiments.

And unfortunately, every time this happened, we had to change the unit tests. Quite irksome!

Instead, we've switched over to end-to-end tests. An end-to-end test is where a script will open up your application and click on buttons and type inputs in the right sequence to indirectly run some specific code in your program.

For the Vale compiler, it means we run the compiler with some Vale source code, then run it, and make sure it produces the right output. As of this writing, the Vale compiler has 1,308 end-to-end tests.

Some caveats on this advice:

It not apply to larger endeavors, only ones that have rapidly shifting internals, as new projects often do. As a project becomes larger, other priorities dominate.
If there's a piece of the program whose code changes often but the inputs and outputs are relatively stable, unit tests are still a good choice.

Know when to add a test

It's good practice to add a test whenever you stumble upon a bug.

However, let's take that advice one step further.

Whenever you have a test that discovers a bug, ask yourself, "could a more specific test have caught this too?" and then add that more specific test.

This approach has a hidden benefit. If you're refactoring a nearby area of your code and you break this functionality, you now have a much more specific test failure to tell you what exactly is going wrong.

Prioritize development velocity

If you're not careful, your development speed can slow to a crawl. Here are some ways to keep yourself nimble:

Use a language with good compile speeds. If you go doomscrolling on reddit while you're building, you know your compile times are too long.

Use a memory-safe language. Memory safety doesn't just help with security, it helps you avoid bugs that are very difficult to diagnose.

Prioritize looser coupling. If you have to change code way over there to accommodate a feature way over here, take a step back.

Find a way to harness object-oriented benefits:

Dependency injection (the pattern, not the kind of framework)
Encapsulation
Polymorphism

...without incurring object-oriented drawbacks (implementation inheritance's brittleness).

Use a flexible language. The best languages let you focus on the problem you're trying to solve, rather than the constraints of the language which don't make much sense for your use cases.

For example, if you're making a turn-based roguelike game, C# or Typescript could be better choices than Haskell or Rust, whose extra constraints might cause extra refactoring and complexity. 10

Statically-typed, garbage collected languages like Java 11 can sometimes be the best in this regard. They may not be flashy but they're flexible, much more multi-paradigm, and have good compile speeds.

Bonus points to languages like Scala that let you temporarily "turn off" the type system via Nothing, so you can work on your feature now and fix the types for unrelated code afterward.

Bonus: Sanity Checking

Let's take assertions to the next level!

Let's say you have a id_to_account HashMap<ID, UserAccount>. Unfortunately, to find a user by name, you have to loop through the entire map, because it's keyed by ID, not by name.

So then you add a separate name_to_account HashMap<str, UserAccount>, and you try to keep these two maps in sync. However, if you accidentally remove an account from only one, you now have a data inconsistency.

After you add your normal assertions, also consider periodically calling a sanityCheck function:

func sanityCheck(
    id_to_account &HashMap<ID, UserAccount>,
    name_to_account &HashMap<str, UserAccount>) {
  assert(id_to_account.size() == name_to_account_map.size());
  assert(name_to_account.map(x => x.id) == id_to_account.keys());
  // ... and even more assertions!
}

In one case, we have an 80-line sanity check function to check that all the state in the generics solver is consistent.

When I worked on Earth, I made a 200-line sanityCheck function run before and after every click, which made sure the application was in a sane state. It saved countless hours of debugging.

Lean hard on this technique, it will serve you well.

That's all!

Thanks for visiting, and I wish you the best of luck in your first 100,000 lines!

In the coming weeks, I'll be posting the next article in the Implementing a New Memory Safety Approach series, so subscribe to our RSS feed, twitter, or the r/Vale subreddit, and come hang out in the Vale discord.

Notes [–] Notes [+] 8 9 10 11

There are exceptions to this. In game development, requirements change so much, that even end-to-end tests can have a negative return-on-investment. Other methods of testing are preferred, such as setting two random AI players against each other. Be sure to save the random seed!

Part of me wonders if we can get a statically typed language to do the same thing... Java's newProxyInstance is particularly interesting here.

This often depends on the domain, I've found. CLI apps, stateless servers, and smaller projects can much more easily work with functional programming and borrow checking. Games, apps, or stateful programs clash with these paradigms more.

And this is coming from me, who generally doesn't prefer garbage collection!