Use a Local Model

Use a local model when you want inference to run on your own machine or on infrastructure you control.

| Local path | Zed AI features | External Agents | Terminal Threads | Notes |
| --- | --- | --- | --- | --- |
| llama.cpp | Yes | Separate config | Separate config | Configure a llama.cpp server for Zed AI features |
| LM Studio | Yes | Separate config | Separate config | Configure LM Studio for Zed AI features |
| Ollama | Yes | Separate config | Separate config | Configure Ollama for Zed AI features |
| Local OpenAI-compatible server | Yes | Separate config | Separate config | Configure base URL, model, and key if needed |
| Local/self-hosted edit prediction | Edit Prediction only | No | No | Uses Edit Prediction setup |

llama.cpp

Use llama.cpp and its built-in server for local models with Zed Agent, Inline Assistant, and similar model-backed Zed AI features.

Install llama.cpp from llama.app.
Start the server in router mode:
```
llama serve
```
It loads models from the llama.cpp cache on demand. To download and run a specific model in one step, pass -hf:
```
llama serve -hf unsloth/gemma-4-26B-A4B-it-GGUF:BF16
```
In Zed, select a llama.cpp model from the model dropdown.

Zed automatically discovers the served models, along with their context length and their tool and vision capabilities. In router mode, Zed refines these values once a model loads, using the server's /models/sse stream, which requires a recent llama.cpp build. To list models yourself instead, set auto_discover to false:

{
  "language_models": {
    "llama.cpp": {
      "api_url": "http://localhost:8080",
      "auto_discover": false,
      "available_models": [
        {
          "name": "gemma-4-12b-it-GGUF:BF16",
          "display_name": "gemma-4-12b-it-GGUF:BF16",
          "max_tokens": 32768,
          "supports_tools": true,
          "supports_images": false
        }
      ]
    }
  }
}
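
When listing models by hand, the name field presumably has to match a model the server actually serves. The sketch below asks the server for its model list. It assumes the server listens on localhost:8080, as in the api_url above, and that your build exposes the OpenAI-compatible /v1/models route. LLAMA_URL is a hypothetical shell variable used only for this example.

```shell
# Assumption: the llama.cpp server listens on localhost:8080 (the api_url above).
LLAMA_URL="${LLAMA_URL:-http://localhost:8080}"

# Print the raw JSON model list, or a short message if nothing answers.
curl -fsS --max-time 5 "$LLAMA_URL/v1/models" \
  || echo "No llama.cpp server reachable at $LLAMA_URL"
```

If the request succeeds, the model IDs in the response are candidates for the name field.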

llama.cpp Context Length

Zed uses the context length that the server reports at /props. To override it for all models, set context_window. To override it for a single model, set max_tokens on that model's entry in available_models:

{
  "language_models": {
    "llama.cpp": {
      "context_window": 8192
    }
  }
}

If your llama.cpp server requires a key, enter it in the provider UI or set LLAMACPP_API_KEY. For a remote server, set the API URL to its endpoint and provide the key that the server was started with (--api-key).
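
To see what the server reports before you override anything, you can query /props yourself. This is a sketch under assumptions: the server is at localhost:8080, it accepts the key as a Bearer token (the usual behavior when it is started with --api-key), and the context size appears as an n_ctx field in the response. In router mode the response may look different. LLAMA_URL is a hypothetical variable for this example.

```shell
# Assumption: same server address as the api_url in the settings example.
LLAMA_URL="${LLAMA_URL:-http://localhost:8080}"

# The Authorization header only matters if the server was started with --api-key;
# an unset LLAMACPP_API_KEY sends an empty token, which a keyless server ignores.
curl -fsS --max-time 5 \
  -H "Authorization: Bearer ${LLAMACPP_API_KEY:-}" \
  "$LLAMA_URL/props" \
  || echo "Could not read $LLAMA_URL/props (server down or key rejected)"
```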

Ollama

Use Ollama for local models with Zed Agent, Inline Assistant, and similar model-backed Zed AI features.

Download and install Ollama from ollama.com/download.
Pull a model:
```
ollama pull mistral
```
Make sure the Ollama server is running. On macOS, open Ollama.app. On Linux or from a shell, run:
```
ollama serve
```
In Zed, select an Ollama model from the model dropdown.

Zed automatically discovers models that Ollama has pulled. To disable autodiscovery and list models yourself, configure auto_discover:

{
  "language_models": {
    "ollama": {
      "api_url": "http://localhost:11434",
      "auto_discover": false,
      "available_models": [
        {
          "name": "qwen2.5-coder",
          "display_name": "qwen 2.5 coder",
          "max_tokens": 32768,
          "supports_tools": true,
          "supports_thinking": true,
          "supports_images": true
        }
      ]
    }
  }
}
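
To see which models Ollama has pulled, you can query its /api/tags endpoint, which lists local models. Whether Zed uses this exact route for discovery is an assumption. The sketch assumes the default address from the api_url above, and OLLAMA_URL is a hypothetical variable for this example.

```shell
# Assumption: Ollama listens on its default address, matching api_url above.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Print the JSON list of locally pulled models, or a message if the server is not up.
curl -fsS --max-time 5 "$OLLAMA_URL/api/tags" \
  || echo "No Ollama server reachable at $OLLAMA_URL"
```

The name values in the response are what you would copy into available_models when auto_discover is false.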

Ollama Context Length

Zed's requests to Ollama pass the context length as the num_ctx parameter. By default, Zed uses 4096 tokens.

Set a context length for all Ollama models:

{
  "language_models": {
    "ollama": {
      "context_window": 8192
    }
  }
}

You can also configure context length per model with max_tokens in available_models.
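
Before raising the context length, it can help to check what the model itself supports. A minimal sketch, assuming the ollama CLI is on your PATH and the model is already pulled. The model name is taken from the earlier example, and MODEL is a hypothetical variable. In recent Ollama versions, ollama show prints model details that include a context length line.

```shell
# Assumption: the model from the available_models example has been pulled.
MODEL="${MODEL:-qwen2.5-coder}"

if command -v ollama >/dev/null 2>&1; then
  # Prints architecture, parameter count, context length, and similar details.
  ollama show "$MODEL" || echo "Could not show $MODEL; is it pulled and is the server running?"
else
  echo "ollama CLI not found on PATH"
fi
```

Larger context windows use more memory, so a value well below the model's maximum can be the more practical choice on a laptop.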

If your Ollama server requires a key, enter the key in the provider UI or set OLLAMA_API_KEY. For remote Ollama services such as Ollama Turbo, set the API URL to the remote endpoint and provide an API key.

LM Studio

Use LM Studio for local models with Zed Agent, Inline Assistant, and similar model-backed Zed AI features.

Download and install LM Studio.
Download at least one model in LM Studio, or use the LM Studio CLI:
```
lms get qwen2.5-coder-7b
```
Start the LM Studio API server:
```
lms server start
```
In Zed, select an LM Studio model from the model dropdown.

If your LM Studio server requires a key, enter the key in the provider UI or set LMSTUDIO_API_KEY.
