LanguagesArchitecture

Until recent events, I've firmly believed the most terrifying thing known to mankind is the kraken.

The kraken is a colossal eldritch sea horror which resembles a giant squid. It's known to crush entire ships.

They say that 60% of all ships lost at sea fell victim to krakens, probably. 0

If someone says they aren't scared of krakens, they've obviously never met one.

However, something else has started appearing in my kraken nightmares: hash codes. Not all hash codes, but a specific kind of hash code that's more insidious than colossal eldritch sea horrors.

This is my tale, where I discovered a terrifying truth underlying our world, and the reason why krakens and hash codes would appear in the same nightmare.

A Worthy Goal

For the last several months, I've been working on Vale's support for regions, which involves adding a sort of "opt-in borrow checker" to the language. Since it's opt-in and not upwardly infectious, it should pair well with our generational references to give us speed and safety while still keeping the language easy to use. As someone who mainly uses C++ and Rust, this makes me very excited.

This meant adding more functionality to our type system and function resolution, and transitioning from templates to full generics. 1

A couple weeks before finishing the generics transition, I encountered something unusual. I had fixed one of our tests, code-named Delirious Blue, but then after I fixed another test, Delirious Blue started failing again. 2

I ran it again, with the debugger running, and it passed.

Oh no.

I ran it again, and it failed. Then passed. Then passed. Failed. Passed, failed, failed, passed, failed, failed.

No!

I knew, in that moment, that non-determinism had crept into the compiler somehow.

Non-Determinism

Every time you run a program, it's supposed to do the same thing. Programs are predictable. If you run echo hello five times in a row, it's supposed to be deterministic; it's supposed to show the same thing every time.

For example, imagine if echo just decided to act differently every once in a while: 3

$ echo hello
hello
$ echo hello
hello
$ echo hello
hello
$ echo hello
hello
$ echo hello
howdy

When a program isn't predictable, debugging becomes difficult, birds fall from the sky, and our most precious illusions fade away and leave us in an uncertain, lovecraftian world filled with terrors.

This is non-determinism. It's when a function or program gives unexpectedly different results even when we give it the same inputs.

The Investigation

Every time I ran the compiler, I got a different result. It's like it would randomly decide which data to process first, so I couldn't reliably reproduce the problem I was trying to solve. I spent hours on this, with no luck.

"Something's non-deterministic! It's driving me insane!", I said to the discord server.

"But how? Isn't the frontend written in Scala? The JVM is deterministic." someone said.

"It should be deterministic," I said, "it's garbage collected after all..."

Then I remembered the winter of '19. It was the 7 Day Roguelike Challenge, when developers worldwide spend 7 days each making a roguelike game. It was day five, we'd already lost half of our comrades to bugs, and everyone was in their final days.

I had fired up my program for the fifteenth time that day, but things were suddenly acting weird: when I ran the same program with the same inputs, I got different behavior every time. This was a problem, because it was a game where the user could save, reload, and travel back in time, expecting things to behave the same way as they did last time. Mastering these techniques was very important for defeating the final boss, a Kraken.

After spending most of the day diving and debugging, I'd figured out that it was because I was using strings in a new place, and C#'s string.GetHashCode() returns something different every run.

Needless to say, non-determinism and I have a history.

"But that can't happen in Scala..." I mused to myself, "it always provides a pretty sane default hashCode implementation."

"I think there's a couple classes that just return the object's address." someone said.

"That's impossible!" I said. "The JVM can't return an address because it literally moves objects around during compaction!"

The Culprit

It turns out, they were right. Whereas Scala's Vector, List, etc. will all calculate their hash codes based on their elements, Scala's Array doesn't. It inherits the hashCode function from java.util.Object which might just return the address of the object... which of course isn't deterministic between runs.

But it still needs to return the same hash code for the entire duration of the program, right? To do that, the JVM literally stores the object's original address so it can always return it. How crazy is that?

So I had a couple options:

  1. Make sure to never observe the ordering of a hash map's elements, such as by calling HashMap.head or HashMap.iterator.
  2. Stop using Array altogether, and use Scala's Vector instead.

I have some religious objections to the first option, so I went with the second for now.

Suddenly, the compiler was completely deterministic again.

I breathed a sigh of relief. Bullet dodged!

Retrospect

I believe that non-determinism is the bane of humanity. It's nearly impossible to determine where a program's non-determinism comes from. One little bit of non-determinism can snake through your entire codebase, like the tentacles of some eldritch sea-horror.

I was extremely lucky that I'd faced this problem before, and that someone in my server knew that you could actually observe an object's address in the JVM.

Luckily, there's hope on the horizon. At some point, we'll be able to rewrite the Vale compiler in Vale itself, and Vale's perfect replayability feature guarantees that we can perfectly reproduce any bug we find.

I hope you enjoyed this tale!

In the next post, I'll talk about implementing regions to make a sort of "opt-in borrow checker" for Vale, so keep an eye out on our RSS feed, twitter, discord server, or subreddit!

If you're impressed with our track record and believe in the direction we're heading, please consider sponsoring us on GitHub!

With your support, we can slay non-determinism and other eldritch sea horrors across the world!

See you next time!

- Evan Ovadia

Side Notes
(interesting tangential thoughts)
0

I probably didn't make this up.

1

Mostly because regions require that we monomorphize any read-only region into both an immutable and mutable region, so that we can better avoid shared mutability restrictions.

2

Delirious Blue is the name of a certain drink served in a certain obscure part of the western hemisphere. Why did we name a test case after that? The answer is a little too scandalous for an article. Feel free to ask me offline!

3

It's almost as scary as echo printing "is he gone yet oh shoot uh hello". Even scarier than that time I saw a 2 in a binary file.

We're looking for sponsors!

With your help, we can launch a language with speed, safety, flexibility, and ease of use.

We’re a very small team of passionate individuals, working on this on our own and not backed by any corporation.

If you want to support our work, please consider sponsoring us on GitHub!

Those who sponsor us also get extra benefits, including:

  • Early access to all of our articles!
  • A sneak peek at some of our more ambitious designs, such as memory-safe allocators based on algebraic effects, an async/await/goroutine hybrid that works without data coloring or function coloring, and more.
  • Your name on the vale.dev home page!

With enough sponsorship, we can:

  • Start a a 501(c)(3) non-profit organization to hold ownership of Vale. 4
  • Buy the necessary computers to support more architectures.
  • Work on this full-time.
  • Make Vale into a production-ready language, and push it into the mainstream!

We have a strong track record, and during this quest we've discovered and implemented a lot of completely new techniques:

  • The Linear-Aliasing Model that lets us use linear types where we need speed, and generational references where we need the flexibility of shared mutability.
  • Region Borrowing, which makes it easier to write efficient code by composing shared mutability with the ability to temporarily freeze data.
  • Higher RAII, where the language adds logic safety by enforcing that we eventually perform a specific future operation.
  • Perfect Replayability makes debugging race conditions obsolete by recording all inputs and replaying execution exactly.

These have been successfully prototyped. With your sponsorship we can polish them, integrate them, and bring these techniques into the mainstream. 5

Our next steps are focused on making Vale more user-friendly by:

  1. Finalizing the compiler's error messages and improving compile speeds.
  2. Polishing interop with other languages.
  3. Growing the standard library and ecosystem!

We aim to combine and add to the benefits of our favorite languages:

We need your help to make this happen!

If you're impressed by our track record and believe in the direction we're heading, please consider sponsoring us:

If you have any questions, always feel free to reach out via email, twitter, discord, or the subreddit. Cheers!

4

Tentatively named the Vale Software Foundation.

5

Generational references, the linear-aliasing model, and higher RAII are all complete, and region borrowing, fearless FFI, and perfect replayability have been successfully prototyped. Be sure to check out the experimental version of the compiler!