Until recent events, I've firmly believed the most terrifying thing known to mankind is the kraken.
The kraken is a colossal eldritch sea horror which resembles a giant squid. It's known to crush entire ships.
They say that 60% of all ships lost at sea fell victim to krakens, probably. 0
If someone says they aren't scared of krakens, they've obviously never met one.
However, something else has started appearing in my kraken nightmares: hash codes. Not all hash codes, but a specific kind of hash code that's more insidious than colossal eldritch sea horrors.
This is my tale, where I discovered a terrifying truth underlying our world, and the reason why krakens and hash codes would appear in the same nightmare.
For the last several months, I've been working on Vale's support for regions, which involves adding a sort of "opt-in borrow checker" to the language. Since it's opt-in and not upwardly infectious, it should pair well with our generational references to give us speed and safety while still keeping the language easy to use. As someone who mainly uses C++ and Rust, this makes me very excited.
A couple weeks before finishing the generics transition, I encountered something unusual. I had fixed one of our tests, code-named Delirious Blue, but then after I fixed another test, Delirious Blue started failing again. 2
I ran it again, with the debugger running, and it passed.
I ran it again, and it failed. Then passed. Then passed. Failed. Passed, failed, failed, passed, failed, failed.
I knew, in that moment, that non-determinism had crept into the compiler somehow.
Every time you run a program, it's supposed to do the same thing. Programs are predictable. If you run echo hello five times in a row, it's supposed to be deterministic; it's supposed to show the same thing every time.
For example, imagine if echo just decided to act differently every once in a while: 3
When a program isn't predictable, debugging becomes difficult, birds fall from the sky, and our most precious illusions fade away and leave us in an uncertain, lovecraftian world filled with terrors.
This is non-determinism. It's when a function or program gives unexpectedly different results even when we give it the same inputs.
Every time I ran the compiler, I got a different result. It's like it would randomly decide which data to process first, so I couldn't reliably reproduce the problem I was trying to solve. I spent hours on this, with no luck.
"Something's non-deterministic! It's driving me insane!", I said to the discord server.
"But how? Isn't the frontend written in Scala? The JVM is deterministic." someone said.
"It should be deterministic," I said, "it's garbage collected after all..."
Then I remembered the winter of '19. It was the 7 Day Roguelike Challenge, when developers worldwide spend 7 days each making a roguelike game. It was day five, we'd already lost half of our comrades to bugs, and everyone was in their final days.
I had fired up my program for the fifteenth time that day, but things were suddenly acting weird: when I ran the same program with the same inputs, I got different behavior every time. This was a problem, because it was a game where the user could save, reload, and travel back in time, expecting things to behave the same way as they did last time. Mastering these techniques was very important for defeating the final boss, a Kraken.
After spending most of the day diving and debugging, I'd figured out that it was because I was using strings in a new place, and C#'s string.GetHashCode() returns something different every run.
Needless to say, non-determinism and I have a history.
"But that can't happen in Scala..." I mused to myself, "it always provides a pretty sane default hashCode implementation."
"I think there's a couple classes that just return the object's address." someone said.
"That's impossible!" I said. "The JVM can't return an address because it literally moves objects around during compaction!"
It turns out, they were right. Whereas Scala's Vector, List, etc. will all calculate their hash codes based on their elements, Scala's Array doesn't. It inherits the hashCode function from java.util.Object which might just return the address of the object... which of course isn't deterministic between runs.
But it still needs to return the same hash code for the entire duration of the program, right? To do that, the JVM literally stores the object's original address so it can always return it. How crazy is that?
So I had a couple options:
I have some religious objections to the first option, so I went with the second for now.
Suddenly, the compiler was completely deterministic again.
I breathed a sigh of relief. Bullet dodged!
I believe that non-determinism is the bane of humanity. It's nearly impossible to determine where a program's non-determinism comes from. One little bit of non-determinism can snake through your entire codebase, like the tentacles of some eldritch sea-horror.
I was extremely lucky that I'd faced this problem before, and that someone in my server knew that you could actually observe an object's address in the JVM.
Luckily, there's hope on the horizon. At some point, we'll be able to rewrite the Vale compiler in Vale itself, and Vale's perfect replayability feature guarantees that we can perfectly reproduce any bug we find.
I hope you enjoyed this tale!
With your support, we can slay non-determinism and other eldritch sea horrors across the world!
See you next time!
- Evan Ovadia
I probably didn't make this up.
Mostly because regions require that we monomorphize any read-only region into both an immutable and mutable region, so that we can better avoid shared mutability restrictions.
Delirious Blue is the name of a certain drink served in a certain obscure part of the western hemisphere. Why did we name a test case after that? The answer is a little too scandalous for an article. Feel free to ask me offline!
It's almost as scary as echo printing "is he gone yet oh shoot uh hello". Even scarier than that time I saw a 2 in a binary file.
Vale aims to bring a new way of programming into the world that offers speed, safety, and ease of use.
The world needs something like this! Currently, most programming language work is in:
These are useful, but there is a vast field of possibilities in between, waiting to be explored!
Our aim is to explore that space, discover what it has to offer, and make speed and safety easier than ever before.
In this quest, we've discovered and implemented a lot of new techniques:
These techniques have also opened up some new emergent possibilities, which we hope to implement:
We also gain a lot of inspiration from other languages, and are finding new ways to combine their techniques:
...plus a lot more interesting ideas to explore!
The Vale programming language is a novel combination of ideas from the research world and original innovations. Our goal is to publish our techniques, even the ones that couldn't fit in Vale, so that the world as a whole can benefit from our work here, not just those who use Vale.
Our medium-term goals:
We aim to publish articles biweekly on all of these topics, and create and inspire the next generation of fast, safe, and easy programming languages.
If you want to support our work, please consider sponsoring us on GitHub!
With enough sponsorship, we can: