Codex CLI from zero: installing, authenticating, and the first useful loop
Published 2026-05-11 by Owner
Most AI coding agents run in an IDE. Codex CLI runs in your terminal, operates directly on the files in your working directory, and — by default — does it inside a sandboxed shell that limits what the model can actually execute. That last detail changes the risk calculus in ways that matter when you’re first handing a task to an agent.
The common comparison is Claude Code, which also runs in the terminal. The two tools share the same general model: you describe a task, the agent reads files, proposes changes, and you review them. The key difference is the default trust boundary. Claude Code gives the model broader shell access out of the box. Codex CLI starts restricted and makes you opt into escalation. Neither is categorically better — it depends on how much trust you’ve built with the tool and the task at hand.
This guide gets you from zero to a working first loop: installed, authenticated, first prompt answered, and one real task completed.
Install paths
Two supported routes.
npm (works everywhere Node is installed):
npm install -g @openai/codex
After that, codex --version should print a version string. If it doesn't, confirm that your npm global bin directory is on your PATH: npm config get prefix shows the install prefix, and the binary lands in its bin subdirectory (npm bin -g also shows the location on npm 8 and earlier, but that subcommand was removed in npm 9). On some systems the global bin is ~/.npm-global/bin or a path managed by a version manager like nvm; if the binary shows up as "not found" after install, check that nvm is pointing at the Node version you installed under.
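A quick way to check, as a sketch for a POSIX shell (the prefix varies by machine):
npm config get prefix   # e.g. /usr/local or ~/.npm-global
export PATH="$(npm config get prefix)/bin:$PATH"   # make the global bin reachable in this session
codex --version   # should now print a version string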
Homebrew (macOS, no Node prerequisite):
brew install codex
The Homebrew formula pulls a prebuilt binary. Slightly less up to date than the npm version on release days, but a cleaner install if you’re not already a Node user. Homebrew handles upgrades cleanly too: brew upgrade codex when a new version lands.
Either way, the binary is codex. Run codex --help to confirm it’s reachable and to see the current set of flags — the CLI is actively developed and flags shift between minor versions.
Two auth modes
Codex CLI supports two ways to authenticate. They differ in what they cost and how you configure them.
ChatGPT Plus / Pro login
If you have an active ChatGPT Plus or Pro subscription, Codex CLI can authenticate through your OpenAI account session. Run codex auth login and follow the browser prompt. After completing the OAuth flow, the CLI stores a session credential locally — subsequent runs pick it up without prompting.
Usage under this mode draws against your subscription quota rather than generating separate API charges. For Plus subscribers, the daily usage cap means very long agent sessions will eventually stall out and require you to wait for the cap to reset. For Pro subscribers, the cap is substantially higher and rarely the bottleneck in normal use.
The convenience factor is real: if you already pay for ChatGPT and the cap fits your usage pattern, you’re up and running in 90 seconds. The tradeoff is coarser visibility — you can’t see how many tokens a specific session consumed, and you’re sharing quota with your ChatGPT web usage during the same billing window.
API key auth
export OPENAI_API_KEY=sk-...
Set the env var before running codex, or add it to your shell profile. The CLI picks it up automatically — no codex auth step needed when the key is present. If you’re using a tool like direnv, you can scope the key to a project directory so it’s only active when you’re working in that repo.
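A minimal sketch of that direnv setup. The key value is a placeholder, and direnv requires a one-time direnv allow in the directory before it loads the file:
# .envrc at the project root; loaded when you cd in, unloaded when you leave
export OPENAI_API_KEY=sk-...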
Usage is metered per token. You can set hard spending limits in the OpenAI dashboard, create separate keys for different projects, and inspect per-request token usage after the fact. For teams or anyone who wants predictable cost visibility, this is the right path.
One decision point: if you have both a Plus subscription and an API key, use the subscription for exploration and the API key for automation. Scripted or CI-adjacent use should be on metered billing where each run’s cost is attributable.
The practical summary: Plus/Pro login is faster to start with and economical if you're already a subscriber. API key auth bills for every token you use, but in exchange you get full cost visibility and no shared-quota surprises.
The first command
With auth sorted, navigate into a project directory and run:
codex
This opens an interactive REPL. The model loads, and you get a prompt. Type a plain-English request:
Add a docstring to every function in utils.py that doesn't have one yet
Codex will read the file, generate the additions, and propose a diff. You’ll see output that looks like a patch — file path, line ranges, and the proposed change. Before writing anything, it asks for confirmation.
That confirmation step is not optional ceremony. It’s the actual gate. The model proposes; you approve. Get comfortable with reading the diffs before you hit y. Skipping that habit early is how people end up with a codebase where they’re not sure what an agent changed three sessions ago.
One useful pattern: if a proposed diff looks bigger than expected, ask the model to explain what it changed and why before approving. The explanation is usually short and clarifies whether the scope is right.
Alternatively, pass the task directly on the command line for non-interactive use:
codex "Add a docstring to every function in utils.py that doesn't have one yet"
This enters the same loop but exits after completing the task. Useful for scripted workflows, makefiles, or when you want a clean terminal log of what ran without staying in the REPL.
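As a sketch of the scripted shape (the directory argument and the prompt are stand-ins for whatever your workflow needs):
#!/usr/bin/env sh
# docstring-pass.sh: run a one-shot Codex task against the project passed as $1
set -eu
cd "$1"
codex "Add a docstring to every function in utils.py that doesn't have one yet"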
The sandboxed shell
Here is where Codex CLI differs most from Claude Code.
Claude Code, by default, runs commands in your real shell with your real permissions. It has read/write access to your filesystem and can run arbitrary shell commands if you allow them. That’s powerful and also a reason to pay attention to what it’s about to do.
Codex CLI defaults to a restricted shell. The model-generated commands run in a constrained environment that blocks writes to paths outside the working directory and prevents a class of destructive commands (rm -rf on arbitrary paths, for example). The model can read your project files, propose edits, and run scoped commands like test runners — but it can’t, by default, touch system files or run something that would affect state outside the working directory.
This is a meaningful safety property for the first few sessions while you’re building trust in the agent’s judgment.
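To see that boundary firsthand, probe it with a request the default policy described above should refuse; the exact refusal output varies by version:
codex "Write the word hello to /tmp/probe.txt"   # /tmp is outside the working directory, so the default sandbox should block the write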
When you need the agent to do something the sandbox blocks — installing a dependency, creating a directory outside the project tree, running a deploy script — you can escalate. The flag is --sandbox=off or the equivalent config setting:
codex --sandbox=off "Run npm install and then run the test suite"
Escalate deliberately and specifically. Don’t turn sandbox off globally as a default.
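One way to keep that discipline, sketched with the same flag: escalate for the single step that needs it, then drop back to the default for everything else:
codex --sandbox=off "Run npm install"   # the one step the sandbox blocks
codex "Run the test suite and report any failures"   # back inside the default sandbox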
A 10-minute starter task
The best task to start with is one you already understand and could do yourself in 10 minutes, but would rather not. Test-writing is ideal: the output is mechanically verifiable, the scope is contained, and you get to see how the agent navigates the project structure before trusting it with something harder.
Pick a file with a few untested utility functions. Start an interactive session:
codex
Then:
Look at src/lib/format.ts. Write a Vitest test file for every exported function.
Use the existing test conventions from any *.test.ts file you find in the project.
Watch what happens in the first few turns:
- The model reads format.ts to understand the exports
- It finds an existing test file to anchor on the convention
- It proposes a new test file at the expected path
At each step you see the proposed action before it executes. If the model reaches for a wrong file or proposes writing to an unexpected path, that’s the point to correct it — not after the files exist.
The sandbox means that even if you approve something that turns out wrong, the blast radius is limited to the working directory. Revert tracked files with git checkout . (newly created files need git clean -fd, since checkout won't touch untracked files) and try again with a more precise prompt.
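The recovery loop, concretely (standard git, nothing Codex-specific):
git status   # see which files the agent modified or created
git diff   # review the changes before discarding them
git checkout -- .   # restore tracked files to their last committed state
git clean -fd   # remove newly created untracked files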
After the test file lands, run the tests:
bun run test # or npm test, depending on your setup
If they pass, you’ve just had Codex write tests you trust — because you read both the proposed diff and the final output. If some fail, inspect the failures. Often the model has the right structure but a wrong expectation — fix the test prompt to be more specific about edge cases.
What to try next
The interactive REPL and the sandboxed shell together give you a reliable base for incremental trust-building. Small refactors, test additions, docstring passes, and file reorganizations are all tasks where Codex’s judgment can be verified quickly.
The more interesting use is chaining tasks: plan a refactor in one session, execute each part incrementally, verify after each step. That’s where the combination of model quality and sandbox safety starts to compound. The GPT-5 and o-series reasoning models that back Codex handle multi-step task decomposition well; the sandbox means you can let them work without watching every keystroke.
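Inside a single interactive session, that chain is just successive prompts. A hypothetical sequence:
Propose a refactor plan for src/lib/format.ts. Don't change anything yet.
Implement step 1 of that plan.
Run the tests and fix anything the change broke.
Each prompt gets its own round of proposed diffs to review before anything lands.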
The CLI is also usable in quiet mode for non-interactive pipelines — generating changelogs, annotating code for docs builds, or adding test coverage as a pre-commit step. That’s further down the road. The right first milestone is a session where you reviewed every diff, the tests passed afterward, and you understood what changed. Once that loop is reliable, the rest builds naturally.
Start small, read every diff, and increase scope once you know how the agent behaves in your specific codebase.