Cline task history: replaying past tasks without doing them by hand
Published 2026-05-11 by Owner
Every task Cline completes is written to disk. The conversation thread, the tool calls, the file diffs, the model responses — all of it, persisted in the extension’s storage directory. Most people ignore this and treat each session as ephemeral. That is a mistake: the history is one of the more useful things Cline offers, and it requires zero extra configuration to use.
Where task history lives
Cline stores task history under VS Code’s global extension storage. The exact path varies by OS:
- macOS: ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/tasks/
- Linux: ~/.config/Code/User/globalStorage/saoudrizwan.claude-dev/tasks/
- Windows: %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev\tasks\
Each task gets a UUID-named directory. Inside you will find api_conversation_history.json (the raw message array Cline sent to the model) and ui_messages.json (what was rendered in the Cline panel). The api_conversation_history.json file is the ground truth of what the model actually saw — system prompt, file contents that were read, tool results, and the full back-and-forth.
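A quick way to get the shape of a task without opening the JSON in an editor is to count messages per role. A rough sketch, assuming each message object carries a "role" field in the message-array format described above:

```shell
# rough count of messages per role in a task's raw conversation
# (assumes each message object has a "role" field; works on compact
#  or pretty-printed JSON)
grep -o '"role": *"[a-z]*"' api_conversation_history.json | sort | uniq -c
```

A long, assistant-heavy conversation usually means many tool-call rounds; a short one means the model got there directly.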
You can also reach history through the Cline UI itself. The clock icon in the Cline panel header opens the task list sorted by recency. Clicking any entry re-opens it in the panel exactly as it was when it finished. Re-opening is read-only archaeology; to replay the task against current code you need to start a new task, which is covered below.
The storage directory grows without bound. A moderately active week of Cline sessions produces around 20-40MB; a month of heavy use is easily 200-400MB. Nothing prunes it automatically, so the directory accumulates silently. This matters for the privacy section at the end.
A quick way to inspect what is in there without opening every file:
# list task directories sorted by modification date, most recent first
ls -lt ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/tasks/ | head -20
The UUID folder names are not human-readable, but the modification timestamps are enough to orient to the right timeframe.
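To see how much disk the history is actually using, and which tasks dominate, du answers in two lines (macOS path shown; substitute the Linux or Windows path as appropriate):

```shell
TASKS=~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/tasks
# total size of the history
du -sh "$TASKS"
# per-task sizes in KB, largest first
du -sk "$TASKS"/*/ | sort -rn | head -10
```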
The replay flow
Re-opening a past task in the history panel is not replay — it is archaeology. The task is frozen. To replay, the workflow is:
- Open the task history panel and locate the task to replay.
- Read the original prompt and the plan or approach the model took.
- Check out a new branch or confirm the current state is the right starting point.
- Open a new Cline task and paste the original prompt, adjusted for current context.
The adjustment step is where the value is. A replay is not a mechanical copy-paste of the original prompt. It is a prompt that captures what the original task was trying to accomplish, stripped of any state that no longer applies. Example: if the original task was “add pagination to the user list using offset/limit parameters” and the codebase now uses cursor-based pagination throughout, the replay prompt is “add pagination to the user list using the cursor-based approach already established in the codebase.”
The history gives a starting point that would otherwise require reconstructing from git log messages and mental recall. That reconstruction is always worse than having the actual conversation. The original task captured the edge cases noticed mid-session, the decisions made when two approaches were viable, and the scope corrections applied when the first attempt went wide. None of that survives in a commit message.
For tasks that repeat across projects — setting up a logging pattern, adding a standard error boundary, scaffolding a new route type, wiring up a feature flag check — keeping a personal library of past task prompts is more useful than any snippet library. The prompts already encode the decisions, the edge cases, and the approach that worked. Snippets encode only the output.
One practical addition: after a replay task completes successfully, add a short comment to the prompt noting the project and date it was used. Storing these in a notes/cline-replays.md file means the library is searchable and not buried in the tasks directory UUID maze.
A useful prompt format for the library:
# [short description]
# Used: [project], [date]
# Model: [model that produced the reference output]
# Notes: [anything that needed adjustment from the original]
[prompt text]
The “Notes” field is the part people skip and regret. Three months later the reason for a particular phrasing choice is invisible without it.
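One payoff of a consistent header format: listing every entry with its metadata is a single grep (assuming the library lives at the notes/cline-replays.md path suggested above):

```shell
# list every replay entry and its metadata lines, with line numbers
grep -n '^# ' notes/cline-replays.md
```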
Using task history as a model smoke test
This is the highest-leverage use of replay that most people miss.
When a new model version drops — a new Claude Sonnet, a new DeepSeek release, a bump to whatever Cline defaults to — the natural question is: is this better or worse for my actual work? Public benchmarks answer this for code synthesis in the abstract. Task history answers it for a specific codebase and specific tasks.
The process:
- Pick three to five past tasks that were representative: one small and self-contained, one medium-complexity with multi-file coordination, one that required the model to understand an established pattern in the codebase.
- Run each as a new task with the new model, using the same prompts.
- Compare outputs: did the model reach the same solution? Did it read fewer files unnecessarily? Did it make wrong assumptions the old model did not?
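"Using the same prompts" means recovering each original prompt verbatim, and it can be pulled straight out of the task directory rather than retyped. A sketch that assumes the first element of the message array is the initial user message; some versions store content as a list of blocks rather than a string, which the script handles:

```shell
# print the original prompt from a past task's conversation log
python3 - api_conversation_history.json <<'EOF'
import json, sys

msgs = json.load(open(sys.argv[1]))
content = msgs[0].get("content")
# content may be a plain string or a list of content blocks
if isinstance(content, list):
    content = "\n".join(b.get("text", "") for b in content)
print(content)
EOF
```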
This is not a rigorous benchmark. It is a five-minute gut-check that catches regressions and surprises that no public benchmark would surface, because the public benchmarks do not know the codebase.
A concrete example: running the same “extract this service layer from the controller” task on two models. Model A produces a clean extraction with proper dependency injection and matching test structure. Model B produces the extraction but inlines constants that should have come from the config module — a pattern that the project established months earlier and that every other file follows. Model B is wrong for this project specifically. Not because it is a worse model in the abstract, but because it does not pick up on the config convention. The only way to know this is to run the task.
For teams, coordinating smoke tests before switching a shared Cline configuration to a new default model prevents the scenario where half the team upgrades and half does not, and output quality diverges without anyone being sure why.
The smoke test is also useful when Cline itself ships a new version that changes default behavior — context window handling, tool call formatting, how it decides to read files before acting. A version bump to the extension can produce different outputs on the same model and same prompt. Running the smoke tests after an extension update catches these changes before they affect real work.
Keep the smoke-test prompts accessible — a notes/cline-smoke-tests.md alongside the replays file. The raw history directory is not organized for browsing and the UUID folders do not tell you which prompt is inside.
The diff-against-last-run pattern
When iterating on a prompt — adjusting scope, clarifying a constraint, changing the output format — the question is always “what actually changed in the output?” Diffing two runs surfaces this without reading both outputs in full.
After run one, copy the final assistant message (the summary Cline writes at task completion) or the list of files modified into a scratch file. After run two, do the same. Then diff the two scratch files.
# capture files Cline touched in the first run
git diff --name-only HEAD~1..HEAD > /tmp/run1-files.txt
# rewind to the starting point, adjust the prompt, re-run the task
git reset --hard HEAD~1
# then capture the second run the same way
git diff --name-only HEAD~1..HEAD > /tmp/run2-files.txt
diff /tmp/run1-files.txt /tmp/run2-files.txt
This pattern is most useful when tuning Plan mode prompts. The plan output is plain text; two plan outputs are trivially diffable. Run the plan with the original prompt, copy it out, adjust the prompt, run again, diff. The diff shows exactly what the model picked up on — or ignored — from the change. A prompt adjustment that produces zero plan diff is a signal that the added instruction is not being weighted.
For Act mode, the diff is the git diff itself. Two runs on the same starting branch (after reverting between them) produce two sets of file changes. The diff between those two diffs — a diff of diffs — shows how the prompt change affected execution. This is more work but catches cases where a prompt change fixes one thing and silently affects another.
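Sketched with git, assuming each run lands as a single commit on top of the shared starting point:

```shell
# run one: capture the full diff, then rewind to the starting point
git diff HEAD~1..HEAD > /tmp/run1.diff
git reset --hard HEAD~1
# ...adjust the prompt, re-run the task, let it commit...
# run two: capture again, then diff the diffs
git diff HEAD~1..HEAD > /tmp/run2.diff
diff /tmp/run1.diff /tmp/run2.diff
```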
The diff pattern also helps when a task produces a different result than expected from a previous run with the same prompt. This happens when the codebase changed between runs. Diffing the two api_conversation_history.json files from the two task directories shows exactly what context Cline read differently — which files changed, what the model saw in them, and where the two execution paths diverged. The history makes this diagnosable; without it, the divergence looks like random model behavior.
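The raw files are often one long line of JSON, so pretty-print both before diffing to get a line-oriented comparison. TASK1 and TASK2 below are placeholders for the two UUID task directories, and process substitution requires bash:

```shell
TASK1=/path/to/tasks/first-uuid    # placeholder: first task directory
TASK2=/path/to/tasks/second-uuid   # placeholder: second task directory
# compare what the model saw across the two runs
diff <(python3 -m json.tool "$TASK1/api_conversation_history.json") \
     <(python3 -m json.tool "$TASK2/api_conversation_history.json") | head -40
```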
The privacy tradeoff
All of this is possible because Cline writes everything to disk. That is also the problem.
The api_conversation_history.json files contain prompts verbatim and all file contents that Cline read during the task. If a prompt included a database schema, an API key pasted in to test something quickly, internal architecture details, employee names, customer data visible in a fixture file, or a description of a system that is not supposed to be discussed externally — that content is sitting in plaintext in the tasks directory.
Cline itself does not exfiltrate this data. The history is local. But several things are true simultaneously:
Backups. The tasks directory is not excluded from Time Machine, iCloud Drive, Google Drive, Dropbox, or any other backup or sync service that covers the home directory. Full Cline history is in all of those backups, subject to their own security posture and retention policies. “It’s local” does not mean “it’s private” on a machine that backs up to the cloud.
VS Code sync. Settings Sync covers extension state selectively: task history files are generally not synced, but sync configuration can be customized. Verify the sync settings if this matters in the environment being used.
Shared machines. The tasks directory is readable by any process running as the same OS user. On a shared development machine or a managed workstation where IT can read the filesystem, Cline history is not meaningfully private.
Crashes. When the VS Code process is killed or crashes mid-task, partially written task JSON may be left in an incomplete state. This is mostly harmless but worth knowing if tasks disappear from the history unexpectedly.
What to purge:
- Any task where credentials were pasted in, even briefly, even to test.
- Any task involving infrastructure layout, internal system architecture, or security-sensitive configuration.
- Any task on a client project where confidentiality applies.
- Any task where file contents were read that contained PII — user records, employee data, test fixtures with real names or emails.
Purging is manual. Open the tasks directory, identify the UUID-named folders by the directory modification timestamp (that timestamp is when the task completed), and delete the relevant ones in full. There is no selective-field redaction — the whole task folder goes because the conversation and the file reads are interleaved through the JSON.
# open the tasks directory in Finder to review by modification date
open ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/tasks/
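For age-based cleanup from the command line, find can list candidates before anything is deleted. List first, review, and only then delete:

```shell
TASKS=~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/tasks
# list task folders untouched for 90+ days -- review this output first
find "$TASKS" -mindepth 1 -maxdepth 1 -type d -mtime +90 -print
# once reviewed, re-run with: -exec rm -rf {} +   in place of -print
```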
A practical policy that holds up: anything that would be a problem if it appeared in a leaked file should not appear in a Cline task prompt, full stop. The history persistence makes this more consequential than it would be for a tool that discards sessions immediately. The tradeoff for accepting this constraint is access to all the replay and smoke-test capabilities described above. For most tasks on a non-shared machine with sensible backup hygiene, the tradeoff is worth it. For tasks involving sensitive data, the answer is to not put that data in the prompt in the first place, not to disable history after the fact.
The teams dimension: because history is per-machine and local, there is no shared history server to secure, which is mostly good. The downside is that task history cannot be shared across machines or team members, so the collaborative replay use case requires manually exporting and sharing prompt files. That friction is probably fine given the privacy implications of sharing full conversation histories across a team.