Zeta2.1: 3x Fewer Tokens, 50ms Faster

We launched Zeta2, Zed's edit prediction model, in March, and promised more improvements were on the way. Here they are.

Zeta2.1 emits 3x fewer output tokens than Zeta2, which makes predictions roughly 50ms faster and lets us serve the same traffic with 30% fewer servers:

Metric	Zeta2	Zeta2.1
Output tokens (avg)	~270	~90 (−67%)
Response time (p50)	189ms	136ms (−28%)
Response time (p90)	401ms	350ms (−13%)
Acceptance rate	Baseline	+0.51%
Explicit rejection rate	Baseline	−4.10%
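
To make the percentages in the table easy to check, here is a minimal Python sketch that recomputes them from the raw values. The helper name `relative_change` is ours, chosen for illustration; the numbers are copied from the table.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# (Zeta2, Zeta2.1) pairs copied from the table above.
metrics = {
    "Output tokens (avg)": (270, 90),
    "Response time (p50, ms)": (189, 136),
    "Response time (p90, ms)": (401, 350),
}

for name, (zeta2, zeta2_1) in metrics.items():
    # Format as a signed, rounded percentage, matching the table's style.
    print(f"{name}: {relative_change(zeta2, zeta2_1):+.0f}%")
```

The absolute latency difference is 53ms at p50 and 51ms at p90, which is where the "50ms faster" headline comes from.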

These efficiency gains came from a new prompt format we've dubbed "Multi-Region". Zeta2 output a large region around your cursor with its edits applied. With the Multi-Region format, Zeta2.1 outputs only the region around the code it wants to change. This took several iterations to get right, but the result is even faster predictions on every keystroke.
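
To get a feel for why emitting only the changed region shrinks the output, here is a rough Python sketch. It is not the actual Multi-Region prompt format (see the examples on Hugging Face for that). The function names, the `context` parameter, and the sample snippet are all hypothetical, and line counts stand in as a crude proxy for tokens.

```python
import difflib


def full_region_output(edited: list[str]) -> list[str]:
    """Whole-window idea: re-emit every line of the window with edits applied."""
    return list(edited)


def changed_regions_output(
    original: list[str], edited: list[str], context: int = 1
) -> list[list[str]]:
    """Changed-region idea: emit only edited lines plus a little context.

    Hypothetical illustration; the real format is not reproduced here.
    """
    matcher = difflib.SequenceMatcher(a=original, b=edited, autojunk=False)
    regions = []
    for group in matcher.get_grouped_opcodes(context):
        # Each opcode is (tag, i1, i2, j1, j2); j1..j2 index into `edited`.
        start, end = group[0][3], group[-1][4]
        regions.append(edited[start:end])
    return regions


original = [
    "fn total(items: &[u32]) -> u32 {",
    "    let mut sum = 0;",
    "    for item in items {",
    "        sum += item;",
    "    }",
    "    sum",
    "}",
    "",
    "fn main() {",
    "    let values = vec![1, 2, 3];",
    '    println!("{}", total(&values));',
    "}",
]
# A single predicted edit: change one line inside the loop.
edited = list(original)
edited[3] = "        sum += item * 2;"

full = full_region_output(edited)
regions = changed_regions_output(original, edited)
print(len(full), sum(len(region) for region in regions))
```

In this toy case the whole-window output is 12 lines, while the changed-region output is 3 lines: the edited line plus one line of context on each side. The savings on real predictions depend on how large the window is and how localized the edit is.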

Zeta2.1 is open-weight, just like Zeta1 and Zeta2. You can see examples of the new prompt format and download the model on Hugging Face.

As with Zeta2, Zeta2.1 was trained entirely on opt-in data from open-source repositories. If you'd like to contribute to future improvements, you can opt in by toggling the data collection setting.

Try It

Zeta2.1 is even better for running locally, and works out of the box. Additionally, with this release we've begun publishing to PyPI bindings for the Rust code we use in production to format prompts, making it even easier to self-host.

Zeta2.1 is the default edit prediction model in Zed today. You can try it out for free, or check out Zed Pro or Zed Business for unlimited edit predictions.
