Jessie Frazelle, a veteran engineer with deep experience across infrastructure, runtime, and developer tooling, has tried nearly every async agent she could find, not just in a demo environment but on real production code. Her goal was straightforward: find something that actually helps.
In this session, she shared what async agents get wrong, the subtle bugs they miss (or cause), and which agent finally earned a place in her team’s workflow. Her team’s tech stack spans Rust, C++, Python, and TypeScript, and they’ve seen firsthand which tools create confusion and which ones quietly earn their place.
You can watch the session on YouTube or read below for some selected quotes.
Most Async Agents Just Add Noise
"If you’ve got nothing to say, don’t say it.” - Jessie Frazelle
Jessie started with a simple goal: catch bugs before they ship. But most async agents delivered little more than low-value comments and auto-summaries. Tools like Copilot and Sourcery would comment on every PR, whether or not they had anything useful to say.
Some went further:
"One edited my original comment. It's just so weird to me. It seems overly invasive to me.” Jessie said.
“Half the time they hallucinate summaries or miss the point entirely,” she said. “And then they clutter up reviews with comments like, ‘Everything looks good!’”
Even more frustrating were tools that tried to take over the entire code review: generating verbose analyses or flow diagrams that looked authoritative, but often lacked real insight. As Jessie put it: “If an AI’s giving me a diagram of some flow, can I trust it’s even right? Or that it hasn’t missed something critical?”
The One That Worked
The exception turned out to be Graphite Diamond.
Unlike its peers, Diamond doesn’t speak unless it has something worth saying. That alone made it stand out. Jessie described it as “more signal than noise,” catching math bugs in CAD code, spotting subtle logic errors, and even flagging variable mismatches in a custom DSL.
“We didn’t even know it was on at first,” she said. “It doesn’t comment unless it’s got something real. But when it did, it was like, whoa, that’s a legit bug.”
Jessie and her team now rely on Diamond across multiple codebases, including their most high-stakes C++ and Rust repos. They’ve even experimented with customizing its system prompts, and they appreciate its thoughtful design, like surfacing product promotions in CI instead of interrupting PR threads.
Trust, but Verify
Jessie doesn’t adopt tools lightly. She’s quick to push back when an agent suggests something wrong, especially in unfamiliar languages like TypeScript. But she says trust has grown over time: “It’s like a second pair of eyes. I still read everything, but now I’ve got backup.”
That backup matters most in places where mistakes are costly. In code that runs geometry solvers or generates manufacturing instructions, a silent math bug could cost days to track down, or worse, slip into production.
“Sometimes it’ll say ‘this could cause misaligned holes,’” she laughed. “That’s not a bug you want to debug after the fact.”
The Right Role for AI
Jessie’s takeaway isn’t that async agents are a panacea. Most are still clumsy and overconfident. But used carefully, they can catch what humans miss, especially when tuned for restraint and built to quietly assist, not dominate.
“AI’s not replacing code review anytime soon,” she said. “But the right agent, used the right way? It makes your team better.”