Tinker AI
2026-03-10

OpenAI shipped Codex CLI to public beta this week. The product is a terminal-based agent, similar in shape to Anthropic’s Claude Code. Run a command, describe a task, the agent does the work — reading files, running commands, editing code.

What it does

Codex CLI is a shell-launched agent. Quick example:

$ codex "add error handling to the auth middleware"

Reading auth/middleware.ts...
Reading auth/types.ts...
Found auth middleware. Current implementation throws on missing tokens.
Plan: add structured error handling that returns appropriate HTTP status codes.

Apply this plan? (y/n) y

Writing auth/middleware.ts...
Writing tests/auth/middleware.test.ts...
Running tests...
3 tests pass.

Done. Modified 2 files. Created 0 files.

The interaction is conversational but in the terminal. You can chain commands, redirect output, integrate with shell scripts. Codex works in any directory; it’s not tied to an editor.

Pricing

Codex CLI bills as part of OpenAI’s API — pay for the tokens used. The pricing is the same as direct API usage; no extra subscription. For ChatGPT Plus subscribers, there’s a separate quota that includes some Codex usage.

For comparison: Claude Code is included in Claude Pro/Max subscriptions and also available via API. The pricing models are converging across the major labs: a subscription bundle for regular use, plus pay-per-token API access.
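To make token billing concrete, here is a back-of-the-envelope estimate in Python. The per-token rates are placeholders, not OpenAI's actual prices, and the session sizes are made up; substitute current rates from the pricing page before drawing conclusions.

```python
# Rough cost estimate for an agent session billed per token.
# NOTE: the rates below are illustrative placeholders, NOT real OpenAI prices.
INPUT_RATE_PER_M = 2.50    # dollars per million input tokens (hypothetical)
OUTPUT_RATE_PER_M = 10.00  # dollars per million output tokens (hypothetical)

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one agent session at the rates above."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# Agent loops re-send context on every tool call, so input tokens dominate:
# e.g. 40 tool calls at roughly 8k tokens of context each, plus modest output.
print(f"${session_cost(40 * 8_000, 12_000):.2f}")  # prints $0.92
```

The structural point survives any particular rates: because the conversation is re-sent on each tool call, input tokens grow with the number of agent steps, which is why long agent sessions get expensive on pure API billing.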

Where it fits

The CLI agent category is now competitive:

  • Claude Code (Anthropic): mature, deep tool integration, MCP support
  • Codex CLI (OpenAI): newer, similar shape, integrated with OpenAI’s ecosystem
  • Aider (open source): the original CLI agent, BYOK model
  • Continue.dev CLI (open source): smaller user base, more customizable

For users picking among these:

  • Claude Code wins on tool integrations and the maturity of agentic workflows
  • Codex wins if your stack is OpenAI-aligned (using GPT-4o, ChatGPT, or other OpenAI products)
  • Aider wins on cost flexibility (any model via BYOK)
  • Continue wins on customization (open source, modifiable)

What’s interesting about the launch

A few aspects worth noting:

OpenAI committing to the CLI form factor. This is OpenAI’s first serious terminal-based product. It signals they see CLI as a real category, not a niche. The investment to build and maintain a CLI tool is non-trivial; OpenAI building one is a vote that the form factor matters.

Tool calling parity. Codex CLI uses GPT-4o or o1 with tool use. The tool surface is comparable to Claude Code’s: file operations, shell execution, search. Specific differences exist but the rough capabilities are similar.

No editor integration. Unlike Cursor or VS Code’s Copilot, Codex CLI doesn’t run inside an editor. You use it from a terminal. This is the same choice Anthropic made with Claude Code — the CLI is a different product than the editor extension.

Strong defaults vs configurability. Codex CLI ships with strong default behavior but offers less configurability than Aider. For users who want deep customization, Codex is more constrained. For users who want a tool that “just works,” it’s more polished.

Quick test of capabilities

I gave Codex a few representative tasks to compare with Claude Code:

Task 1: Refactor a Python class to use composition over inheritance. Both tools handled it. Claude Code’s plan was slightly more nuanced (anticipating the test impact); Codex’s implementation was slightly more idiomatic Python. Tie.

Task 2: Debug a flaky test. Both tools eventually solved it. Claude Code asked clarifying questions earlier. Codex jumped to a hypothesis faster, which was wrong on first attempt. Slight edge to Claude Code on the workflow.

Task 3: Implement a new endpoint following an existing pattern. Both tools matched the pattern correctly. Codex was faster (perhaps because the underlying model is faster on routine work). Slight edge to Codex on speed.

Task 4: Multi-file refactor. Both tools handled it. Claude Code’s plan was more complete (caught a related file that needed updating). Codex’s first plan missed the file; the second plan caught it after I asked. Edge to Claude Code on planning depth.

After a couple of hours, the qualitative impression: Claude Code is a bit more deliberate; Codex is a bit faster. Neither is dramatically better. For my workflows, I’d pick Claude Code because of MCP support and the maturity of the agent loop.

Implications

A few things this launch suggests:

The CLI agent form factor is real. Two of the three top labs have CLI products. The third (Google) doesn’t yet but will likely follow. The category is established.

Editor integration isn’t the only path. Cursor, Cline, Copilot all live inside an editor. CLI agents don’t. The CLI surface is competitive for many workflows, especially backend and devops work where you’re already in a terminal.

The competitive frontier is in tool ecosystems. With agent loops becoming similar across tools, the differentiator is what tools the agent can use. MCP servers, custom integrations, internal tooling. Claude Code’s MCP support is a moat; Codex doesn’t have direct MCP support yet.

API pricing reflects the agent cost structure. Token billing for agent workflows is genuinely expensive (tool calls consume a lot of tokens). Both Anthropic and OpenAI now have subscription tiers that absorb this cost. Pure API users pay more directly. The market is segmenting into “subscription convenience” and “BYOK control.”

Who should try Codex CLI

If you:

  • Already use OpenAI’s models heavily
  • Work in a terminal-centric environment
  • Want to add agent capabilities to scripts or automation
  • Are curious about how OpenAI’s take differs from Anthropic’s

It’s worth installing and testing for a few hours. The pricing aligns with API usage, so there’s no friction to start.
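On the scripting point: because Codex takes its task as a plain argument and operates on the current directory (as in the transcript above), it slots into ordinary automation. Here is a minimal sketch in Python that assumes only that invocation shape; the directory names and prompt are made-up examples, not real flags or paths.

```python
import shutil
import subprocess

def build_codex_cmd(prompt: str) -> list[str]:
    # Codex takes the task as a single quoted argument,
    # as shown in the transcript earlier in this post.
    return ["codex", prompt]

def run_on_dirs(dirs: list[str], prompt: str) -> dict[str, int]:
    """Run the same task in each directory; return exit codes per directory."""
    results = {}
    for d in dirs:
        proc = subprocess.run(build_codex_cmd(prompt),
                              cwd=d, capture_output=True, text=True)
        results[d] = proc.returncode
    return results

# Hypothetical usage; only runs if codex is actually on PATH.
if shutil.which("codex"):
    print(run_on_dirs(["services/auth", "services/billing"],
                      "add structured logging"))
```

Whether this is a good idea for unattended runs depends on how Codex handles its confirmation prompt non-interactively, which this sketch doesn't assume anything about.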

If you’re already happy with Claude Code or another CLI agent, there’s no urgency: the switching cost is low (it’s a tool, not a platform), so you can try Codex later if something specific draws you.

What I’ll be watching

The interesting questions for the next 6 months:

  • Whether Codex picks up MCP support or builds an alternative ecosystem
  • Whether OpenAI’s quality improvements on the agent loop outpace Anthropic’s
  • How the CLI agent category compares to the editor agent category in adoption
  • Whether teams standardize on one CLI tool or use multiple

The CLI agent space is now competitive enough that the choice matters. Rather than installing whichever tool shipped first, it’s worth evaluating each against your own workflows.