LanguagesArchitecture
Note: I wrote this article with an unusual (and partially incorrect) definition of data race. To get the most out of this article, one could interpret "data race" as "race condition". The greater theme of this article is unchanged: that a language (such as Javascript or most flavors of Python) cannot avoid concurrency problems by making everything single-threaded under the hood. Happy reading!

I recently wrote an article on how a language can improve on Rust's and Pony's concurrency, and one of my readers asked:

"Does Python have fearless concurrency? The Global Interpreter Lock makes only one thread run at a time, so I'd assume there's no data races."

Note that we're not talking about general race conditions, just data races. A data race is when: 0

  • Two or more threads concurrently accessing a location of memory.
  • One or more of them is a write.
  • One or more of them is unsynchronized.

But in Python, every thread is synchronized, because of the Global Interpreter Lock. So it can't have data races, right?

Data races only happen in parallel code, not concurrent code, right? 1

Side Notes
(interesting tangential thoughts)
0

From the Rustonomicon.

1

In Python, since only one thread can ever run at a time, it's concurrent. Concurrency is when we're doing multiple tasks at once, but only progressing one task at a time. Parallelism is when we e.g. use multiple cores to progress multiple threads at a time.

A Python Data Race

At first I thought that, although we need mutexes to avoid race conditions, the GIL makes Python immune to mere data races.

However, that didn't seem right. I did a little bit of experimenting, and made this program:

python
from threading import Thread
from time import sleep

counter = 0

def increase():
    global counter
    for i in range(0, 100000):
        counter = counter + 1

threads = []
for i in range(0, 400):
    threads.append(Thread(target=increase))
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f'Final counter: {counter}')
stdout
Final counter: 31735072

Surprisingly, we didn't get 40000000, we got 31735072. 2

This program's data race happens in the counter = counter + 1 line. Let's pretend there are only two threads, and the program just started.

  • Thread A loads 0 from counter into register* X. 3
  • Thread B loads 0 from counter into register Y.
  • Thread A adds 1 to register X, it now contains 1.
  • Thread B adds 1 to register Y, it now contains 1.
  • Thread A stores register X into counter, it now contains 1.
  • Thread B stores register Y into counter, it now contains 1.

As you can see, even though both threads incremented counter, it's not 2, it's 1. This is a data race in action.

2

This was on a 2.6 GHz 6-Core i7 Mac, for your computer to show a data race you may need to change the 400 or 100000.

3

I say "register", but CPython doesn't actually use registers directly; it load variables onto stack frames that are stored on the heap, according to this post.

The Intrigue

This was actually pretty hard to discover. The first few experiments failed, because Python is pretty smart about when it runs each thread.

Python will only interrupt a thread if it's taking too long. From anekix's StackOverflow answer:

In new versions, instead of using instruction count as a metric to switch threads, a configurable time interval is used. The default switch interval is 5 milliseconds.

So, it seems a thread needs to last longer than 5 milliseconds to possibly trigger a data race. This explains why we had to do 100,000 iterations in each thread.

This policy makes it difficult to identify when one has data races. However, it probably also reduces their frequency in production, which is a nice benefit.

Thoughts on Detecting Races

As a language designer, I wonder if there was a missed opportunity in here.

Go's map iteration order is random, in part because it prevents us from accidentally relying on iteration order. We could take some inspiration from that.

I wonder if, in development mode, Python could use a shorter interval so that we notice any data races hiding in our code, and in release mode, could use this longer (5ms) interval. 4

This also relates to one's philosophy on determinism. We know that:

  • Determinism helps a lot for reproducing bugs. Nobody likes heisenbugs. 5
  • Relying on non-guaranteed determinism can cause some bugs. For example, if we accidentally depend on a hash map's ordering in 100 places, and then change our hash map's algorithm, we suddenly have 100 bugs.

These would seem to be incompatible, but a core goal of Vale is to make races more obvious, and reproduce them easily.

We could do this with universal deterministic replayability, where in development mode we record all non-deterministic inputs, such as command line arguments, stdin, sockets, files, etc. 6 7, plus the scheduling of all inter-thread messages and mutex lockings. This might seem difficult, but it's possible for a language to guarantee determinism, for example if it uses a region-based borrow checker, and eliminates unsafe blocks and undefined behavior.

With that, every run would be random but recorded, and when we do encounter a race, we could reproduce the race trivially by replaying the recording.

Someday, perhaps problems like these will be a thing of the past!

4

We could do this today, by using sys.setswitchinterval()!

5

A "heisenbug" is when we encounter a bug, but when investigating, have difficulty making it happen again, because it's based on some non-deterministic factor.

6

More specifically, it would just records any inputs from FFI into a file.

7

This feature is about halfway complete in Vale: the FFI semantics and FFI serialization are ready, but we don't record to a file yet.

Conclusion

TL;DR: Python threads can have data races!

This might all sounds pretty obvious to Python veterans, but it was surprising to me. Maybe it will be surprising to others!

Thanks for reading! If you want to discuss more, come by the r/Vale subreddit or the Vale discord.

- Evan O