Fixing the Git Blame Beachball

May 6th, 2024

We had just finished making a change. I ran git commit, and Zed immediately froze and started to beachball. Ten seconds later it was back.

I tried again: make a change, git commit, see a beachball. Uh oh.

Zed is supposed to be fast. What's going on?

We pulled open Instruments and ran a CPU profile:

macOS Instruments showing a Severe Hang, and no CPU usage.

Hmm... our process is doing... nothing...? That doesn't make sense. At least Instruments agrees that this is a "Severe Hang".

Mikayla did some searching and found this post, which suggested we try the "Thread State" instrument:

Thread State instruments showing the main thread is blocked.

Well, that's interesting: the main thread is indeed blocked (and so are almost all of our background threads). But why?

Stumbling across the "OS Fundamentals" sub-instrument in the drop-down on the left led us to the answer.

OS Fundamentals sub-instrument showing a psynch_cvwait syscall

It's stuck in the psynch_cvwait syscall. That's a condition variable wait, but what's it waiting for? None of our background threads seem to be doing anything either...

Clicking the bottom-most, right-most icon shows the stack trace, which gives us a glimmer of a clue, though it's more of a "that shouldn't happen".

Backtrace from the main thread

It looks like the main thread is blocking deliberately, in a block_with_timeout. We do this in a few places to ensure that if we can do a user-visible task quickly, the result shows up on the next frame. That said, the timeout is set to 5ms so that if the task takes longer than expected, we don't block the UI thread. In this case we blocked for 4550ms, which is, checks notes, about 1000 times too long.
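To make the idea concrete, here is a minimal sketch of "block, but only for a little while" using nothing but the Rust standard library. It is not GPUI's block_with_timeout: the function name block_with_timeout_sketch and its signature are made up for illustration, and it leans on Condvar's built-in timed wait rather than on a separate timer task.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical helper (not GPUI's API): run `task` on a background thread
/// and wait at most `timeout` for its result. Returns `None` if the task
/// has not finished by then, so the caller can carry on without blocking.
fn block_with_timeout_sketch<T, F>(timeout: Duration, task: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    // Shared slot for the result, plus the condition variable used to
    // wake the waiting thread once the slot is filled.
    let slot = Arc::new((Mutex::new(None::<T>), Condvar::new()));
    let producer = Arc::clone(&slot);

    thread::spawn(move || {
        let value = task();
        let (lock, cvar) = &*producer;
        *lock.lock().unwrap() = Some(value);
        cvar.notify_one();
    });

    let (lock, cvar) = &*slot;
    let guard = lock.lock().unwrap();
    // Keep waiting while the slot is empty, but never longer than `timeout`.
    // `wait_timeout_while` also takes care of spurious wakeups.
    let (mut guard, _timeout_result) = cvar
        .wait_timeout_while(guard, timeout, |value| value.is_none())
        .unwrap();
    guard.take()
}
```

With a 5ms timeout, a task that finishes quickly yields Some(result) in time for the next frame, and a slow task yields None so the caller can fall back to rendering without it.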

Pushing that question to the side: Why isn't the main thread being woken up after 5ms? We continued clicking around in Instruments.

Looking at the Narrative view in the bottom right, we get our first real hint:

The Narrative around the time the thread was unblocked

"The thread was made runnable by git". Interesting. Given that I was triggering this with a git commit it wasn't surprising that git was running, but it was surprising that it'd be blocking Zed.

Filtering the Narrative view by git reveals something even more interesting. It looks like almost every time our app is woken, it's being woken by git, and not just that, it's a different git process each time. We have 257 git processes running?!

A large number of git process-swaps in the Narrative

The most recent feature we added in this space was Inline Git Blame, so that was the prime suspect. Sure enough, disabling that eliminated the problem.

From there it was relatively easy to track down. When we first added the git blame code, it was there to power the gutter, so we assumed blame would only be active for one or two files at a time. We set up a bunch of event listeners so that if the git index changed, the blame would update.

Unfortunately, because git blame is now enabled for every file, and we had 257 files open, every time I changed the git index by committing, Zed would spawn 257 git processes simultaneously. Oops! (now fixed…).
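For illustration, one generic way to avoid this kind of stampede is to cap how many jobs may run at once. The sketch below is not necessarily the fix that shipped in Zed; run_bounded and its parameters are hypothetical names, and each job stands in for "run git blame for one open file".

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: runs every job, but never more than `max_workers`
/// at the same time. Returns the highest number of jobs that were observed
/// running concurrently, which is handy for checking the cap.
fn run_bounded<F>(jobs: Vec<F>, max_workers: usize) -> usize
where
    F: FnOnce() + Send + 'static,
{
    let queue = Arc::new(Mutex::new(VecDeque::from(jobs)));
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..max_workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let running = Arc::clone(&running);
            let peak = Arc::clone(&peak);
            thread::spawn(move || loop {
                // Hold the lock only long enough to pop one job.
                let job = queue.lock().unwrap().pop_front();
                let Some(job) = job else { break };

                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                // In an editor, this is where one `git blame` would run.
                job();
                running.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

With 257 queued jobs and max_workers set to 4, all 257 still run, but at most four are in flight at any moment instead of all of them at once.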

That leaves only the mystery of why the main thread wasn't being woken up after 5ms. I think part of the problem here is contention (our process tree is using as much CPU as it can spawning git processes), but that's not the whole story.

In order to wake up the main thread after a few milliseconds, we spawn a background task that sleeps for the required time, and then signals (via the condition variable) the main thread to resume. Unfortunately, GPUI timers on macOS were being run with a lower priority than background tasks. A classic priority inversion: the main thread was waiting on the lowest-priority task, which was in turn blocked by all our git processes. Oops again! (also now fixed…).
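The shape of that mechanism can be sketched with plain threads. This is a simplified model, not GPUI's code: Wake, wait_for_task_or_timer, and signal are hypothetical names, and std threads carry no priorities, so the sketch shows only the structure that makes the inversion possible.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Why the waiting thread was (or was not yet) woken up.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Wake {
    Pending,
    Finished,
    TimedOut,
}

/// Records the first wake reason and notifies the waiting thread.
fn signal(state: &(Mutex<Wake>, Condvar), wake: Wake) {
    let (lock, cvar) = state;
    let mut guard = lock.lock().unwrap();
    if *guard == Wake::Pending {
        *guard = wake;
        cvar.notify_one();
    }
}

/// Hypothetical model: wait until either `task` finishes or a separate
/// timer thread signals that `timeout` has elapsed.
fn wait_for_task_or_timer<F>(timeout: Duration, task: F) -> Wake
where
    F: FnOnce() + Send + 'static,
{
    let state = Arc::new((Mutex::new(Wake::Pending), Condvar::new()));

    // The work we are hoping finishes quickly.
    let worker = Arc::clone(&state);
    thread::spawn(move || {
        task();
        signal(&worker, Wake::Finished);
    });

    // The timer: sleep, then wake the waiter. If this thread is starved
    // of CPU time, the "timeout" fires late, or effectively not at all.
    let timer = Arc::clone(&state);
    thread::spawn(move || {
        thread::sleep(timeout);
        signal(&timer, Wake::TimedOut);
    });

    // Untimed wait: nothing bounds it except the two threads above.
    let (lock, cvar) = &*state;
    let mut guard = lock.lock().unwrap();
    while *guard == Wake::Pending {
        guard = cvar.wait(guard).unwrap();
    }
    *guard
}
```

The waiter's 5ms guarantee is only as good as the scheduling of the timer thread: if the timer runs at the lowest priority while hundreds of processes compete for the CPU, the wait lasts as long as it takes for the timer to finally get a turn.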

We take great pride in making Zed fast, and it's a little embarrassing when we accidentally make it slow. That said, it's always fun to dig in and figure out what's going on.

These fixes shipped in v0.133.7, along with fixes for a few other hangs that we discovered using our new monitoring tool (blog post to follow!).

If your Zed beachballs, please file an issue; we'd love to dig in with you and figure it out.