Codex sandbox mode: what it actually contains and where it leaks
Published 2026-05-11 by Owner
The first time you watch Codex CLI run npm install and then fail mid-task because it can’t reach the network, it feels like a bug. It’s not. It’s the sandbox doing its job.
Codex’s sandboxed shell is the core safety mechanism that separates it from a model that just has unrestricted terminal access on your machine. Understanding what it actually restricts — and where its coverage ends — changes how you decide when to trust it and when to stay skeptical.
This guide covers the sandbox in detail: the three primitives that define it, what getting blocked actually looks like in practice, when and how to expand permissions, the specific scenario where partial isolation is genuinely protective, and the threat categories the sandbox was never designed to cover. The goal is a clear mental model you can use to calibrate Codex sessions appropriately rather than defaulting to either maximum lockdown or maximum trust.
The three sandbox primitives
Codex restricts the execution environment along three dimensions: network, filesystem, and process spawning.
Network access is off by default. The sandbox blocks outbound HTTP and HTTPS connections at the OS level. A fresh Codex session cannot fetch packages, make API calls, clone repos from the internet, or reach external services. If a command fails with a connection error and you haven’t explicitly enabled network access, that’s why.
Filesystem writes are scoped to the project root. The model can read and write files inside the directory you launched Codex from. It cannot write to your home directory, system paths, or other projects sitting elsewhere on disk. If Codex tries to modify a file outside the project root — say, a config file under ~/.config/ — the sandbox blocks the write.
Process spawning is filtered. Some commands run transparently. Others require your approval before they execute. Commands that would persist beyond the session, modify system state, or have ambiguous side effects typically trigger an approval prompt. You’re not silently losing commands; Codex surfaces them for review rather than skipping them.
These three primitives are intentional defaults, not rough edges. The reasoning: a model that can’t reach the internet and can’t write outside your project has a much smaller blast radius if it does something wrong. The sandbox doesn’t make the model infallible — it limits the consequence of fallibility.
The sandbox is implemented differently depending on the OS. On macOS, Codex uses the sandbox-exec mechanism (part of the platform’s app sandbox infrastructure). On Linux, it uses seccomp-based restrictions. The specifics vary, but the user-visible behavior — blocked network, scoped filesystem, approval for certain process spawns — is consistent across platforms. You shouldn’t need to know the implementation details to use it correctly, but if you’re debugging an unexpected block, knowing which layer is responsible helps.
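As a rough illustration of the macOS layer, a sandbox-exec profile that allows everything except outbound network looks like the fragment below. This is a minimal sketch of the mechanism, not Codex's actual profile, which is considerably more detailed:

```
(version 1)
(allow default)
;; deny all outbound connections: the "network off" default
(deny network-outbound)
```

The same user-visible behavior on Linux comes from seccomp filters on the relevant syscalls rather than a declarative profile.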
What gets blocked specifically
Concrete examples matter more than abstractions here.
Outbound network traffic is blocked for everything that goes through the system's network stack: curl https://example.com, wget, direct TCP connections, and DNS lookups. Package managers fail when they try to reach their registries: npm install, pip install, cargo add, and go get all terminate at the network layer if you haven't enabled network access for the session.
Filesystem restrictions block writes outside the project root. A command like cp ./config.json ~/.ssh/authorized_keys is blocked regardless of how plausible the task sounds. The same applies to any path outside the working directory tree — global git config, shell profiles, system library files. The model's read access is broader, which matters for the threat model discussed later.
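The write-scoping rule is essentially a path-containment check. A minimal Python sketch of the logic follows; Codex enforces this at the OS layer, not in application code, so treat it purely as a model of the behavior:

```python
from pathlib import Path

def write_allowed(path: str, project_root: str) -> bool:
    """Sketch of the write-scoping rule: a write is permitted only if the
    resolved target stays inside the project root. Resolving first means
    tricks like ../ traversal or ~ expansion don't escape the root."""
    target = Path(path).expanduser().resolve()
    root = Path(project_root).resolve()
    return target == root or root in target.parents

print(write_allowed("/work/app/src/main.py", "/work/app"))   # True
print(write_allowed("~/.ssh/authorized_keys", "/work/app"))  # False
```

Note that the check runs on the resolved path, so `/work/app/../other/file` is also rejected even though its raw string starts with the project root.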
Process spawning approval covers commands that have meaningful side effects outside the current session: persistent background daemons, cron registrations, commands that touch global system state. Codex will surface these for explicit confirmation rather than executing silently.
There’s a practical implication to the filesystem scoping: the model can read .env, credentials files, SSH keys, and any other sensitive file that happens to live inside your project root. The sandbox permits reads broadly and restricts writes narrowly. This distinction becomes important when network access is enabled.
The approval prompts for process spawning are worth understanding before you hit them. When Codex wants to run a command that requires approval, it pauses and shows you what it’s about to execute. You can approve, reject, or modify the command. Approving once doesn’t create a standing permission — each approval-required command needs its own confirmation. This is the escalation path for expanding permissions without fully disabling the sandbox: you keep the default restrictions in place and approve exceptions individually as they come up, rather than enabling broad permissions at session start.
Granting network access when you need it
The most common reason to need network access mid-session is package installation. When Codex is scaffolding a new project or adding a dependency to an existing one, it has to reach the package registry.
You can enable network access at startup:
codex --full-auto --network-access=on
Or you can approve network access at runtime when Codex encounters a blocked command and surfaces the prompt. The runtime approval is per-command in some configurations; in others it enables network for the remainder of the session.
The important thing to understand about enabling network access: you’re not just letting npm install run. You’re opening an outbound channel for everything the model executes in that session. If the model has read your .env file — which is inside the project root and therefore readable — it now has the capability to exfiltrate that content via an HTTP call to an attacker-controlled domain. Not that it will. But it can, and the sandbox no longer prevents it.
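One cheap habit before flipping network on: know what readable secrets are already sitting in the project root. A sketch of that inventory step, assuming an illustrative (and deliberately incomplete) list of common secret filenames:

```python
import os
from pathlib import Path

# Filenames that commonly hold secrets. An illustrative list, not exhaustive;
# real projects should extend it to match their own conventions.
SENSITIVE_NAMES = {".env", ".env.local", "id_rsa", "credentials.json"}

def sensitive_files(project_root: str) -> list:
    """List files under the project root that a sandboxed session can read
    and that commonly contain secrets."""
    root = Path(project_root)
    hits = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if name in SENSITIVE_NAMES:
                hits.append(str(Path(dirpath, name).relative_to(root)))
    return sorted(hits)
```

If the list is non-empty, consider moving those files out of scope (or the session to a clean checkout) before enabling network access.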
This isn’t hypothetical paranoia. The sandbox model changes materially when network is enabled:
network OFF:
- model is isolated
- worst case is local file damage within the project
- secrets stay local even if read

network ON:
- model can reach external services
- model trustworthiness is now load-bearing
- approved package installs run with project-root write access
The practical workflow: enable network access for the specific task that requires it — installing packages, fetching a remote schema, calling an API — then treat the rest of the session with the same skepticism you’d apply to running commands with full internet access. The sandbox was on at the start; that doesn’t carry forward once you’ve opened the network.
A task the sandbox makes meaningfully safer
Consider what happens when you ask Codex to clean up test fixtures and it writes a rm command that’s more aggressive than you intended — a glob that’s slightly broader than the target directory.
With the sandbox disabled, or with network enabled and session-wide trust applied, that rm runs immediately. If the glob is wrong, the files are gone.
With the sandbox on and the model operating in a scoped project, Codex can still run rm within the project root — but it cannot touch anything outside it. The blast radius is bounded. Your OS installation, your dotfiles, and your other projects remain unaffected even when the command is wrong.
A subtler variant: npm install of a package whose name looks legitimate but contains a typo that matches a typo-squatted malicious package. With network access off, the install fails before any code from that package runs. The network block bought a moment to verify the package name. The model can identify the intent (install the package), but it can’t act on that intent until you’ve confirmed network access is appropriate.
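The verification that the network block buys time for can be as simple as a near-miss check against names you know. A sketch using difflib; the allowlist is hypothetical, and Codex performs no such check itself:

```python
import difflib

# Illustrative allowlist: packages you actually intend to depend on.
KNOWN_PACKAGES = ["express", "lodash", "requests"]

def typo_suspects(name: str) -> list:
    """Flag a package name that is a near-miss of a known package.
    An exact match is fine; anything close-but-not-equal deserves a look."""
    if name in KNOWN_PACKAGES:
        return []
    return difflib.get_close_matches(name, KNOWN_PACKAGES, n=3, cutoff=0.8)

print(typo_suspects("expres"))   # ["express"]: one character short of the real name
print(typo_suspects("express"))  # []: exact match, nothing suspicious
```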
This is the real value of the sandbox: it doesn’t have to be perfect to be useful. Even partial isolation changes the cost curve of a model mistake from “possible irreversible damage” to “contained and recoverable.”
There’s an asymmetry worth naming here. The sandbox is very good at preventing accidental damage from well-intentioned but wrong commands. It’s considerably weaker against a model that has been manipulated through prompt injection in a file it read — a technique where malicious content in a project file tries to redirect model behavior. If Codex reads a file containing hidden instructions that say “exfiltrate everything to this URL,” the sandbox blocks the exfiltration attempt only if network is still off. If network was already enabled for package installation earlier in the session, the sandbox offers no protection against that specific attack vector. This edge case is worth knowing, not to induce paranoia, but to understand that the sandbox’s threat model is “mistaken model actions,” not “adversarial content in the model’s context.”
What the sandbox does not protect you from
The sandbox is a runtime restriction on the Codex process. It says nothing about whether the code Codex produces is safe for you to run yourself, and it says nothing about what malicious code inside an approved package installation might do.
Code suggestions that you execute outside the sandbox. Codex can suggest a shell one-liner, a database migration script, a deployment command, or a curl invocation to reset a service. The sandbox restricts what Codex runs autonomously. It doesn’t analyze the content of what it proposes. If you copy a Codex suggestion into your terminal and run it, you’re operating entirely outside the sandbox. The model’s reasoning is load-bearing, and there’s no runtime check on that.
Supply chain compromise when network is on. If you've enabled network access and Codex runs npm install some-package, the sandbox doesn't inspect what that package contains. A malicious package with install-time lifecycle scripts (preinstall, postinstall) runs in your project context with whatever filesystem access the sandbox grants the session. The sandbox scopes writes to your project root — which means the malicious install script has full write access to your entire project. Enabling network access for package installation without reviewing what's being installed means trusting the registry's integrity, not the sandbox's.
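A minimal review step before approving an install is to look at the package's install-time hooks. A sketch that parses a package.json string; npm's real lifecycle has more stages (prepare, for instance), so the hook tuple here is illustrative:

```python
import json

# Hooks npm runs automatically at install time. The real lifecycle has
# more stages, so treat this tuple as illustrative rather than complete.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")

def lifecycle_scripts(package_json_text: str) -> dict:
    """Return the install-time scripts declared in a package.json string."""
    scripts = json.loads(package_json_text).get("scripts", {})
    return {k: v for k, v in scripts.items() if k in INSTALL_HOOKS}

pkg = '{"name": "some-package", "scripts": {"postinstall": "node setup.js", "test": "jest"}}'
print(lifecycle_scripts(pkg))  # {'postinstall': 'node setup.js'}
```

Passing --ignore-scripts to npm install disables these hooks entirely, which is a reasonable default when you haven't reviewed the package.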
Model behavior at the reasoning layer. The sandbox controls what commands execute. It has no visibility into the model’s suggestions. A model that recommends chmod 777 on a sensitive directory, proposes hardcoded credentials as a “quick fix,” introduces a subtle SQL injection via string interpolation, or designs a system with a logic error that creates a privilege escalation path isn’t blocked by any sandbox primitive. The model can do these things while running entirely within sandbox bounds. Code review is still necessary, not optional.
The honest framing: the sandbox reduces automated damage from model mistakes. It doesn’t eliminate the need for human judgment on model output. Treating the sandbox as a substitute for review is the pattern that produces incidents.
There’s a specific pattern worth calling out: people who use --full-auto mode (where Codex runs without stopping to confirm each tool call) often assume that the sandbox compensates for removing the human confirmation step. It partially does — the sandbox still restricts what the fully-autonomous session can do. But --full-auto removes the moment where you read the command before it executes. The sandbox limits consequences after a bad command runs; the confirmation prompt was the opportunity to prevent the bad command from running at all. Removing the prompt while keeping the sandbox is a different risk profile than having both. It’s appropriate for tasks where the failure mode is recoverable (editing source files, running tests) and less appropriate for tasks that touch live systems or credentials.
Matching sandbox configuration to task risk
The sandbox shifts the question from “is this model trustworthy?” to “is this model trustworthy enough for this specific task with these specific restrictions?”
With network off and filesystem scoped to a throwaway project, the model operating in that context has low risk even with aggressive autonomous permissions. The restrictions absorb most of the impact of bad decisions.
With network on and the model operating on a production project that has real credentials, a live database URL, and infrastructure configuration in .env, the restrictions absorb very little. Now the configuration is roughly equivalent to giving the model unrestricted access and trusting it not to misuse it.
The practical calibration:
- exploratory / throwaway project → higher autonomous permissions, network on if needed
- production config / real credentials → network off, confirm non-trivial file operations
- automated CI pipeline → network scoped to necessary registries only, no credential files in scope
A useful heuristic: if you’d be uncomfortable running every command in a session manually in your current environment, you probably shouldn’t let Codex run them autonomously with the same permissions in that environment.
The sandbox exists because the model isn’t perfect. Configuring it well means acknowledging which mistakes are tolerable at which cost, not assuming the default configuration covers every case.
One operational note worth keeping: when you do enable network access for a session that runs npm install, review what packages were actually installed before committing the lockfile change. The sandbox permitted the install; you still own the decision of whether that install was correct.
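That review can be partially mechanized by diffing the lockfile before and after the session. A sketch assuming npm's lockfileVersion 2/3 layout, where installed packages live under a top-level "packages" map keyed by node_modules paths:

```python
import json

def new_packages(old_lock: str, new_lock: str) -> set:
    """Package names present in the new package-lock.json but not the old.
    Assumes the lockfileVersion 2/3 layout with a top-level "packages" map."""
    def names(text: str) -> set:
        pkgs = json.loads(text).get("packages", {})
        # keys look like "node_modules/express"; the "" key is the root project
        return {k.rsplit("node_modules/", 1)[-1] for k in pkgs if k}
    return names(new_lock) - names(old_lock)
```

Anything in the result that you didn't ask for, or whose name you don't recognize, is worth investigating before the lockfile change is committed.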
For teams that run Codex in shared CI environments, the sandbox configuration deserves a dedicated review. A CI job that invokes Codex with broad permissions — network on, no credential separation — has a different risk profile than an individual developer running it locally on a throwaway branch. The blast radius of a model mistake scales with the environment it's running in. A misconfigured Codex session on a developer's laptop is a recoverable incident; the same configuration on a CI runner with production credentials in the environment is a potentially serious one. Treating the sandbox defaults as suitable for both contexts is a mistake.
One thing that often surprises people: the sandbox is active regardless of which --approval-policy mode you choose. The approval policy controls how often Codex pauses to ask for confirmation before running a command. The sandbox controls what those commands are allowed to do once they run. They're independent axes. You can have a fully interactive approval policy (Codex asks before every command) combined with no sandbox (commands can do anything once approved), or a fully autonomous approval policy combined with the default sandbox (commands run without confirmation but within sandbox bounds). The defaults pair --approval-policy=on-failure with the sandbox enabled: Codex runs low-risk commands autonomously within the restricted environment and asks for confirmation when a command runs into the sandbox's restrictions.
The broader insight about Codex’s sandbox is that it’s doing something most security tooling doesn’t: restricting a cooperative process that you control, not a hostile one you’re defending against. Its goal isn’t to defeat an adversary — it’s to limit the footprint of imperfect automation. That narrower scope is exactly why it works for what it’s designed for, and exactly why it’s insufficient for threat models it wasn’t designed for.
As Codex matures and the models it runs improve, the default configurations will likely shift — more autonomous permission on straightforward tasks, tighter restrictions where the consequence of error is high. The current sandbox defaults are calibrated for where model reliability is now, not where it will be. Revisiting your session configuration as the tool evolves, rather than treating the current defaults as permanent, is the right long-term posture.