I recently wrote an article on how a language can improve on Rust's and Pony's concurrency, and one of my readers asked:
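Note that we're not talking about general race conditions, just data races. A data race is when two or more threads concurrently access the same location in memory, at least one of the accesses is a write, and at least one of them is unsynchronized. [0]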
But in Python, every thread is synchronized, because of the Global Interpreter Lock. So it can't have data races, right?
Data races only happen in parallel code, not concurrent code, right? [1]
[0] From the Rustonomicon.
[1] In Python, since only one thread can run Python code at a time, it's concurrent rather than parallel. Concurrency is when we're working on multiple tasks at once, but only progressing one task at a time. Parallelism is when we, for example, use multiple cores to progress multiple threads at the same time.
At first I thought that, although we need mutexes to avoid race conditions, the GIL makes Python immune to mere data races.
However, that didn't seem right. I did a little bit of experimenting, and made this program:
```python
from threading import Thread

counter = 0

def increase():
    global counter
    for _ in range(0, 100000):
        # Read-modify-write on a shared global: not atomic across threads.
        counter = counter + 1

# Start 400 threads that all increment the same counter, then wait for them.
threads = []
for _ in range(0, 400):
    threads.append(Thread(target=increase))
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f'Final counter: {counter}')
```

It printed:

```
Final counter: 31735072
```
Surprisingly, we didn't get 40000000 (400 threads × 100000 increments each); we got 31735072. [2]
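This program's data race happens in the counter = counter + 1 line, because that line isn't one step: the thread reads counter, adds 1 to it, and writes the result back. Let's pretend there are only two threads, and the program just started. Thread A reads counter (0) into a register [3], but before it writes anything back, Python switches to thread B. Thread B reads counter (still 0), adds 1, and writes 1. Then thread A resumes, adds 1 to the 0 it read earlier, and also writes 1.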
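Real thread switches are hard to reproduce, so here's a deterministic sketch of the two-thread scenario that doesn't use threads at all: each "thread" is a Python generator that pauses (yields) between reading counter, adding 1, and writing it back, and a hand-written schedule decides which one runs next. The generators and the names increment_steps and run_schedule are illustrative assumptions; this is not how CPython actually switches threads.

```python
# A deterministic simulation of two threads running `counter = counter + 1`.
# Hypothetical helpers: each "thread" is a generator, and every `yield` marks
# a point where the scheduler could switch to the other thread.

counter = 0


def increment_steps():
    """One read-modify-write of the shared counter, split into three steps."""
    global counter
    local = counter    # step 1: read the shared value
    yield
    local = local + 1  # step 2: add 1 to this thread's private copy
    yield
    counter = local    # step 3: write the private copy back


def run_schedule(schedule):
    """Run the steps of threads 'A' and 'B' in the order given by `schedule`."""
    global counter
    counter = 0
    threads = {"A": increment_steps(), "B": increment_steps()}
    for name in schedule:
        # next() runs the chosen thread up to its next yield; the last step
        # finishes the generator, and the None default swallows StopIteration.
        next(threads[name], None)
    return counter


# A reads 0; B reads 0, adds, writes 1; then A adds to its stale 0 and writes 1.
print(run_schedule(["A", "B", "B", "B", "A", "A"]))  # prints 1
# A finishes completely before B starts, so no update is lost.
print(run_schedule(["A", "A", "A", "B", "B", "B"]))  # prints 2
```

The only difference between the two runs is where the switches land, which is exactly what the real interpreter doesn't let us control.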
As you can see, even though both threads incremented counter, it's not 2, it's 1. This is a data race in action.
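The mutex from earlier is the standard fix. Here's a minimal sketch, assuming the same shape as the program above but with a smaller, arbitrary workload (8 threads × 50000 increments) and a hypothetical run helper: holding a threading.Lock around counter = counter + 1 means no other thread can read or write counter in the middle of an increment, so no updates are lost.

```python
from threading import Lock, Thread

counter = 0
counter_lock = Lock()


def increase(iterations):
    global counter
    for _ in range(iterations):
        # Only one thread at a time can hold the lock, so this read, add,
        # and write can't be interleaved with another thread's increment.
        with counter_lock:
            counter = counter + 1


def run(num_threads, iterations):
    """Hypothetical helper: reset the counter, run the threads, return the total."""
    global counter
    counter = 0
    threads = [Thread(target=increase, args=(iterations,)) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


print(f'Final counter: {run(8, 50000)}')  # prints Final counter: 400000
```

The lock makes the program slower, since every increment now acquires and releases it, but the total is always num_threads × iterations.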
[2] This was on a 2.6 GHz 6-core i7 Mac; to see a data race on your computer, you may need to change the 400 or the 100000.
[3] I say "register", but CPython doesn't actually use registers directly; it loads variables onto stack frames that are stored on the heap, according to this post.
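To see that counter = counter + 1 really is several separate steps, we can ask CPython's dis module to disassemble it into bytecode. This is a small sketch; increment_once is a hypothetical function just for illustration, and the exact opcodes vary by Python version (for example, the addition is BINARY_ADD before 3.11 and BINARY_OP from 3.11 on). In every version, though, the read (LOAD_GLOBAL) and the write (STORE_GLOBAL) are separate instructions, and a thread switch that lands between them can lose an update.

```python
import dis

counter = 0


def increment_once():
    global counter
    counter = counter + 1


# Print the full bytecode listing for the increment.
dis.dis(increment_once)

# Collect just the opcode names, from the read of `counter` to the write.
opnames = [instr.opname for instr in dis.get_instructions(increment_once)]
load = opnames.index("LOAD_GLOBAL")
store = opnames.index("STORE_GLOBAL")
print(opnames[load:store + 1])
```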
This was actually pretty hard to discover. The first few experiments failed, because Python is pretty smart about when it runs each thread.
Python will only interrupt a thread if it's taking too long. From anekix's StackOverflow answer:
So, it seems a thread needs to last longer than 5 milliseconds to possibly trigger a data race. This explains why we had to do 100,000 iterations in each thread.
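CPython exposes this interval as sys.getswitchinterval(), which returns it in seconds. A quick check, assuming nothing else in the program has changed it:

```python
import sys

# The interpreter's thread switch interval, in seconds.
interval = sys.getswitchinterval()
print(f'Switch interval: {interval * 1000:.1f} ms')  # 5.0 ms by default
```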
This policy makes it difficult to tell when our code has data races. However, it probably also makes them rarer in production, which is a nice benefit.
As a language designer, I wonder if there was a missed opportunity here.
Go's map iteration order is random, in part because it prevents us from accidentally relying on iteration order. We could take some inspiration from that.
I wonder if, in development mode, Python could use a shorter interval so that we notice any data races hiding in our code, and in release mode use the longer (5 ms) interval. [4]
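Here's a minimal sketch of that idea in today's Python, assuming we treat Python's own development mode (python -X dev, visible as sys.flags.dev_mode) as "development mode". The configure_switch_interval helper and the 1 microsecond value are illustrative choices, not something CPython recommends.

```python
import sys

# Hypothetical values: a very short interval in development so threads are
# switched much more often, and CPython's default of 5 ms otherwise.
DEV_SWITCH_INTERVAL = 0.000001   # 1 microsecond
RELEASE_SWITCH_INTERVAL = 0.005  # 5 milliseconds


def configure_switch_interval(dev_mode):
    """Pick a thread switch interval based on whether we're in development mode."""
    interval = DEV_SWITCH_INTERVAL if dev_mode else RELEASE_SWITCH_INTERVAL
    sys.setswitchinterval(interval)
    return sys.getswitchinterval()


# sys.flags.dev_mode is True when Python is started with `-X dev`.
print(configure_switch_interval(sys.flags.dev_mode))
```

The switch interval only tells the interpreter how often it may switch threads; it doesn't force any particular interleaving, so this would make races more likely to show up, not certain.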
This also relates to one's philosophy on determinism. We know that:
These would seem to be incompatible, but a core goal of Vale is to make races more obvious and easy to reproduce.
We could do this with universal deterministic replayability, where in development mode we record all non-deterministic inputs, such as command line arguments, stdin, sockets, and files [6] [7], plus the scheduling of all inter-thread messages and mutex acquisitions. This might seem difficult, but it's possible for a language to guarantee determinism, for example if it uses a region-based borrow checker and eliminates unsafe blocks and undefined behavior.
With that, every run would be random but recorded, and when we do encounter a race, we could reproduce the race trivially by replaying the recording.
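To make "random but recorded" concrete, here's a toy sketch in Python (not Vale, and not how Vale implements it). A cooperative scheduler runs generator-based "threads", picks which one to advance at random, and records every choice; replaying the recorded choices reproduces the exact same interleaving, and therefore the same result. The run function and its replay and seed parameters are hypothetical names for illustration.

```python
import random

counter = 0


def increment_steps():
    """A read-modify-write of the shared counter; the yield is a switch point."""
    global counter
    local = counter
    yield
    counter = local + 1


def run(num_threads, replay=None, seed=None):
    """Run the threads, choosing randomly (or from `replay`) at each step.

    Returns the final counter and the list of scheduling choices made.
    """
    global counter
    counter = 0
    rng = random.Random(seed)
    live = {i: increment_steps() for i in range(num_threads)}
    choices = list(replay) if replay is not None else None
    recorded = []
    while live:
        if choices is not None:
            tid = choices.pop(0)            # replay: reuse the recorded decision
        else:
            tid = rng.choice(sorted(live))  # record: make a random decision
        recorded.append(tid)
        try:
            next(live[tid])
        except StopIteration:
            del live[tid]                   # this "thread" has finished
    return counter, recorded


# A random run, with its schedule recorded...
result, schedule = run(4)
# ...can be reproduced exactly by replaying that schedule.
replayed, _ = run(4, replay=schedule)
print(result, replayed, result == replayed)
```

Because the scheduling decisions are the only source of non-determinism here, recording them is enough to turn a one-off lost update into something we can replay as many times as we like.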
Someday, perhaps problems like these will be a thing of the past!
[4] We could do this today by using sys.setswitchinterval()!
A "heisenbug" is when we encounter a bug, but when investigating, have difficulty making it happen again, because it's based on some non-deterministic factor.
[6] More specifically, it would just record any inputs from FFI into a file.
[7] This feature is about halfway complete in Vale: the FFI semantics and FFI serialization are ready, but we don't record to a file yet.
TL;DR: Python threads can have data races!
This might all sound pretty obvious to Python veterans, but it was surprising to me. Maybe it will be surprising to others!
Thanks for reading! If you want to discuss more, come by the r/Vale subreddit or the Vale Discord.
- Evan O