Tinker AI
Beginner · 5 min read

Zed's inline assistant: the keystroke-driven AI flow

Published 2026-05-11 by Owner

Most AI-assisted editors send you to a chat panel. Open the panel, describe the problem, paste the relevant snippet, read the suggestion, copy it back, paste it in. That round-trip takes between 30 and 90 seconds for something that should take five.

Zed’s inline assistant skips all of that. Select code, hit Ctrl-Enter, type a prompt, get the edit applied right where the cursor is. The selected code is already the context — nothing to paste, nothing to copy back.

How to invoke it

The default chord is Ctrl-Enter on every platform — Linux, Windows, and macOS alike (Zed uses ctrl rather than cmd for this shortcut). In practice:

  1. Select the code to change — one line, one function, one block
  2. Press Ctrl-Enter
  3. A small input bar appears inline, overlaid on the selection
  4. Type the prompt and press Enter again
  5. The suggested edit appears as a diff, highlighted inside the file
  6. Press Tab to accept or Escape to dismiss

No mouse. No panel focus. The cursor never leaves the file.

If Ctrl-Enter conflicts with a keybinding from your keymap, rebind it in ~/.config/zed/keymap.json:

[
  {
    "context": "Editor",
    "bindings": {
      "ctrl-i": "editor::ToggleInlineAssist"
    }
  }
]

editor::ToggleInlineAssist is the action name. Use whatever chord fits your muscle memory.

The selection-as-context model

This is the key mental shift. In a chat-based flow, context is something you curate: copy the relevant code, paste it into the chat, explain what it is. The model sees a decontextualized snippet.

Zed’s inline assistant inverts this. The selection IS the context. Zed passes the selected lines directly to the model, framed by the surrounding file. The model knows where the code lives, what imports are visible, what function it’s inside. There is no “paste the code” step because the code is already there.

The practical effect: shorter prompts work. Instead of typing “Here’s a function that sorts users by join date — can you rename the parameter u to user throughout?”, the prompt is just “rename u to user”. The model infers the rest from the selection.

A real example. Select this:

function fmt(u: User, short: boolean): string {
  return short ? u.n : `${u.n} (${u.e})`;
}

Prompt: expand abbreviated parameter and property names for readability

Output:

function formatUser(user: User, abbreviated: boolean): string {
  return abbreviated ? user.name : `${user.name} (${user.email})`;
}

The model got u.n, u.e, and the function name from the selection. The prompt didn’t mention any of them.

When inline beats the agent thread

Zed also has a full agent thread (the assistant panel, accessible via Cmd-Shift-A). It’s the right tool for multi-file tasks, code generation from scratch, or anything that requires the model to browse the codebase. The inline assistant is not a replacement for that.

The inline assistant wins for:

Small, localized edits. Renaming something, extracting a variable, changing a type annotation, adding a null check. One selection, one prompt, one diff.

Rewrites with preserved shape. “Make this async.” “Add error handling.” “Convert to use switch.” The structure stays; specific parts change. The model has no reason to drift because it can see what to keep.

Typos and style fixes. Select a comment block, prompt “fix grammar”, done. This sounds trivial but it’s genuinely faster than doing it by hand.

Inline documentation. Select a function, prompt “add JSDoc”. The agent thread would do this too, but with more ceremony.
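A "rewrite with preserved shape" might look like this — a hypothetical before/after for the prompt "add error handling" (the function and names are illustrative, not from Zed's docs):

```typescript
// Before (the selection):
//   function parsePort(raw: string): number {
//     return parseInt(raw, 10);
//   }

// After (the suggested diff): same shape, validation added.
function parsePort(raw: string): number {
  const port = parseInt(raw, 10);
  if (Number.isNaN(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${raw}`);
  }
  return port;
}
```

The signature, return path, and overall structure survive; only the guard is new. That is what makes the diff easy to review at a glance.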

Use the agent thread for: adding a feature that spans multiple files, understanding how something works, generating tests for a module, or anything where the relevant context is not the current selection.

The heuristic that holds: if the fix is visible inside the selection, use inline. If the fix requires reading code that isn’t in the selection, use the agent thread.

Configuring the model

Zed supports BYOK (Bring Your Own Key) for inline AI features. Settings live in ~/.config/zed/settings.json under the assistant key:

{
  "assistant": {
    "version": "2",
    "default_model": {
      "provider": "anthropic",
      "model": "claude-3-5-haiku-20241022"
    },
    "inline_alternatives": [
      {
        "provider": "openai",
        "model": "gpt-4o-mini"
      }
    ]
  }
}

Two choices worth making deliberately:

Model for inline vs. model for the agent thread. Zed 0.162+ lets you set a different model for inline assists. For inline, latency matters more than raw capability. A fast model that responds in 1.5 seconds keeps the keystroke flow intact. A slow model that responds in 8 seconds breaks it — the pause is long enough to mentally context-switch. Use a faster tier (Haiku, GPT-4o mini, Gemini Flash) for inline and a stronger tier for the agent thread.

Provider and API key. Supported providers include Anthropic, OpenAI, GitHub Copilot, Ollama (local), and several others. Add the API key under "api_key" in the provider block, or set it as an environment variable that Zed can read at startup. For Ollama, set "api_url": "http://localhost:11434" — no key required, the model runs locally.
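For a local Ollama setup, a minimal settings sketch might look like the following. This is one plausible shape, extending the assistant block shown above — the model name is an example (use whatever you have pulled locally), and the exact provider key may differ between Zed versions, so check the current docs:

```json
{
  "assistant": {
    "version": "2",
    "default_model": {
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  },
  "language_models": {
    "ollama": {
      "api_url": "http://localhost:11434"
    }
  }
}
```

No API key is needed; Zed talks to the Ollama server running on localhost.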

The inline assistant’s utility is highest when it’s fast. If the selected model’s response time is over four seconds, consider switching to a faster alternative even if it’s less capable.

The keystroke economy

The case for inline over a chat workflow comes down to interruptions. Every time the hands leave the keyboard to switch panels, the mental context of what was being edited takes a small hit. For 10 short fixes in an hour, that overhead adds up.

A rough comparison for a small refactor — rename a parameter, extract a helper, fix a type:

| Flow | Steps | Rough time |
| --- | --- | --- |
| Chat panel | Focus panel, describe + paste, read response, copy, paste, focus editor | ~60 seconds |
| Inline assistant | Select, Ctrl-Enter, type prompt, Tab to accept | ~10 seconds |

That 50-second gap may not sound significant. But the faster version stays inside the editor’s keymap. No mouse, no panel toggle, no mode switch. At 10 prompts a session that’s over eight minutes of recovered focus, and more importantly, the flow of reading and editing code is not broken.

The comparison holds only for tasks the inline assistant is suited for. The chat panel is faster for tasks that require a conversation — multiple rounds of clarification, asking the model to explain a decision, iterating on an approach before committing. Inline is for tasks where the prompt is already clear before pressing the chord.

One last pattern worth naming: inline assists can be chained. Select a function, rename the parameters. Then select the same function, add a return-type annotation. Then select the call site, update the argument names. Three prompts, three Tab accepts, no panel ever opened. This chained pattern is where the keystroke economy really pays out — each fix is so fast that even small improvements become worth making.
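On a toy function, the chained pattern might leave code like this (all names hypothetical — the point is the sequence, not the content):

```typescript
// End state after three chained inline assists:
// 1) select the function, prompt "rename w and h to width and height"
// 2) select it again, prompt "add a return-type annotation"
// 3) select the call site, prompt "update the argument names"
function area(width: number, height: number): number {
  return width * height;
}

const width = 3;
const height = 4;
const result = area(width, height);
```

Each step is one selection, one prompt, one Tab — and because each prompt sees the result of the previous accept in the file, no conversation history is needed.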

What it doesn’t do

A few things worth being clear about before leaning on this heavily:

No multi-file awareness by default. The inline assistant operates on the selection and the open file. It does not read across the codebase. If a rename needs to propagate to five files, use the agent thread or a project-wide find-and-replace.

No memory between prompts. Each inline prompt is independent. The model has no recollection of the previous inline change made two minutes ago. If related edits need to be consistent with each other, either do them in one selection or use the agent thread where the conversation history is maintained.

Suggestions are not always smaller than the selection. Ask to “add logging” inside a five-line function and the diff might add eight lines. The model fills in what it thinks is needed. Review the diff before accepting, especially for prompts that imply addition rather than transformation.
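A hypothetical "add logging" result on a small function illustrates the point — the diff adds lines rather than transforming existing ones:

```typescript
function divide(a: number, b: number): number {
  console.debug(`divide(${a}, ${b})`);       // added by the assist
  if (b === 0) {
    console.warn("divide: divisor is zero"); // added by the assist
  }
  const result = a / b;
  console.debug(`divide -> ${result}`);      // added by the assist
  return result;
}
```

A two-line selection became a nine-line function. Nothing wrong with the output, but it is worth reading before pressing Tab.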

These are narrow constraints. Within them, the inline assistant is the fastest path between “I see something that should change” and “it is changed.”