A few weeks ago I wired Cline to our staging Postgres and Snowflake databases via MCP servers, then used the setup to debug a real data quality issue. The investigation took 90 minutes; my honest estimate without the setup was 4-6 hours. The pattern has since become my default for similar investigations.
This is the case study — what I configured, how the investigation actually went, and where MCP helped vs. where I still had to do work the AI couldn’t.
The setup
Cline 3.6 with Claude 3.5 Sonnet, plus two MCP servers configured:
{
  "mcpServers": {
    "staging-postgres": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://reader@staging-db.internal/analytics?sslmode=require"
      ]
    },
    "snowflake": {
      "command": "python",
      "args": [
        "-m",
        "mcp_server_snowflake"
      ],
      "env": {
        "SNOWFLAKE_ACCOUNT": "...",
        "SNOWFLAKE_USER": "claude_reader",
        "SNOWFLAKE_PRIVATE_KEY_FILE": "/path/to/key.p8",
        "SNOWFLAKE_WAREHOUSE": "ANALYTICS_DEV",
        "SNOWFLAKE_DATABASE": "PROD_REPLICA",
        "SNOWFLAKE_SCHEMA": "ANALYTICS"
      }
    }
  }
}
Both connections are read-only. Postgres uses a reader role with SELECT only. Snowflake uses a service account scoped to the analytics schema with SELECT on the relevant tables.
This is a deliberate design: Cline can query, Cline cannot mutate. The “agent reads systems, human writes systems” boundary is the safety belt.
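The read-side/write-side boundary is easy to sketch locally. A minimal illustration using Python's sqlite3 read-only URI mode (a stand-in only — the real setup enforces this with a Postgres reader role and Snowflake grants, not sqlite):

```python
import sqlite3

# A human-owned, writable connection seeds the data.
with sqlite3.connect("/tmp/demo_revenue.db") as admin:
    admin.execute("CREATE TABLE IF NOT EXISTS charges (id TEXT, amount REAL)")
    admin.execute("DELETE FROM charges")
    admin.execute("INSERT INTO charges VALUES ('ch_1', 89.50)")

# The "agent" connection opens the same database read-only (mode=ro).
agent = sqlite3.connect("file:/tmp/demo_revenue.db?mode=ro", uri=True)
print(agent.execute("SELECT count(*) FROM charges").fetchone()[0])  # reads work

try:
    agent.execute("DELETE FROM charges")  # writes are rejected by the driver
except sqlite3.OperationalError as e:
    print("write blocked:", e)
```

The point is that the restriction lives below the agent: even a badly generated query cannot mutate anything.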
The bug
The bug report from the data team:
Daily revenue numbers in the analytics dashboard are showing about 3% lower than the equivalent numbers in the source-of-truth Stripe data. The discrepancy started about 4 weeks ago. We can’t find any obvious change in our pipeline that would explain it.
The pipeline:
- Stripe webhooks → Postgres (raw events)
- Postgres → dbt models → Snowflake (analytics tables)
- Snowflake → dashboard (revenue queries)
The discrepancy could be anywhere along that path.
The investigation
Step 1: Verify the discrepancy
Asked Cline to confirm the bug:
Compare daily revenue from Stripe events in our staging Postgres against the
revenue table in Snowflake for the past 30 days. Report:
- Total revenue in Stripe events
- Total revenue in Snowflake analytics
- Daily breakdown of the discrepancy
Cline ran two queries (one against each database via MCP) and produced a comparison:
Date | Stripe Total | Snowflake Total | Diff $ | Diff %
2026-02-15 | $156,840.42 | $152,180.18 | -$4,660 | -2.97%
2026-02-16 | $172,103.55 | $166,891.20 | -$5,212 | -3.03%
... (28 more rows showing similar pattern)
Confirmed: roughly 3% discrepancy, consistent across days, started ~4 weeks ago.
This is the kind of step where MCP shines. Without it, I’d have written two SQL queries by hand, exported results, and joined them in a spreadsheet. With MCP, the comparison is a single Cline turn and the result is right there in the conversation.
Time so far: 5 minutes.
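The comparison itself is simple arithmetic once both result sets are in hand: per day, diff = analytics − source and diff % = diff / source. A sketch with the two days from the table above hard-coded (the real values came from the Postgres and Snowflake queries Cline ran):

```python
# Invented stand-ins for the two MCP query results, keyed by date.
stripe = {"2026-02-15": 156840.42, "2026-02-16": 172103.55}
snowflake = {"2026-02-15": 152180.18, "2026-02-16": 166891.20}

report = {}
for day in sorted(stripe):
    diff = snowflake[day] - stripe[day]
    pct = 100 * diff / stripe[day]
    report[day] = (diff, pct)
    print(f"{day}  diff ${diff:,.2f}  ({pct:.2f}%)")
```

Running this reproduces the roughly −3% figure on both days, which is what made the "consistent across days" conclusion easy to trust.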
Step 2: Identify what’s missing
Asked Cline to find which transactions are in Stripe but not in Snowflake:
Look at Stripe events from 2026-02-15. Find any successful payment events
(type='payment.succeeded') whose Stripe charge ID does NOT appear in
Snowflake's revenue table for that day. Report a sample of 20 missing
transactions.
Cline produced a list of 47 missing transactions for that day. The pattern jumped out immediately: every missing transaction had a connect_account field set, meaning these were marketplace transactions where Stripe Connect was involved.
Sample of missing transactions:
ch_3PaAAA - amount $89.50, connect_account=acct_1MmA...
ch_3PaBBB - amount $124.30, connect_account=acct_1MmB...
ch_3PaCCC - amount $52.70, connect_account=acct_1MmC...
... (44 more, all with connect_account set)
Time so far: 15 minutes. We have a hypothesis: the dbt model isn’t ingesting Connect transactions correctly.
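Step 2 is an anti-join: charge IDs present in the source that are absent from analytics. The same logic in miniature (the IDs and connect_account values here are invented for illustration):

```python
# Source-of-truth successful charges: id -> connect_account (None = direct charge).
stripe_charges = {
    "ch_3PaAAA": "acct_1MmA",
    "ch_3PaBBB": "acct_1MmB",
    "ch_3PaDDD": None,  # a direct, non-Connect charge
}
# Charge IDs that made it into the analytics revenue table.
analytics_ids = {"ch_3PaDDD"}

# Anti-join: in the source but not in analytics.
missing = {cid: acct for cid, acct in stripe_charges.items() if cid not in analytics_ids}
print(missing)

# The pattern that pointed at Connect: every missing charge has connect_account set.
assert all(acct is not None for acct in missing.values())
```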
Step 3: Check the dbt model
Asked Cline:
@models/revenue.sql
Look at this dbt model and identify any logic that would exclude transactions
with connect_account set. Compare the WHERE clauses to the raw Stripe events
table.
Cline analyzed the model and found:
Line 23 of revenue.sql:
WHERE charge_status = 'succeeded'
AND application_fee_amount IS NULL
The application_fee_amount IS NULL clause excludes Connect transactions, which always have an application fee.
This is exactly the bug. But when did it get introduced? The dbt model was supposedly unchanged.
Step 4: Git archaeology
Asked Cline:
Use git blame on dbt/models/revenue.sql line 23. When was the
application_fee_amount IS NULL filter added, and what was the commit message?
Cline ran git blame and reported:
Commit: 8a3f9b2
Author: [redacted]
Date: 4 weeks ago
Message: revenue: exclude test transactions
The line was added 4 weeks ago in a commit explicitly aimed at filtering
out test transactions. The committer's intent was likely "exclude
$0 application fees that come from test charges" but the filter is
broader than that.
We have the cause: a commit four weeks ago added a filter intended to exclude test transactions, but the filter accidentally excluded all Connect transactions because the author didn’t realize Connect transactions always have application fees.
Time so far: 30 minutes.
Step 5: Verify the fix would work
Before fixing, I wanted to verify that removing the filter (or replacing it with a more targeted one) would produce the right numbers. Asked Cline:
Run this hypothetical: what would the revenue.sql model produce if the
filter on line 23 were changed to:
WHERE charge_status = 'succeeded'
AND (application_fee_amount IS NULL OR application_fee_amount > 0)
Specifically: re-run the comparison from step 1 against the past 30 days,
using this new filter logic.
Cline ran the corrected query against Snowflake’s source data:
Date | Stripe Total | Hypothetical Total | Diff
2026-02-15 | $156,840.42 | $156,832.81 | -$7.61
2026-02-16 | $172,103.55 | $172,099.20 | -$4.35
... (28 more rows showing $5-15 daily diffs)
The remaining diffs were small (matching what you’d expect from refunds and timing differences) — within normal pipeline tolerances. The fix would resolve the 3% discrepancy.
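The verification is the same per-day comparison as step 1 with the candidate predicate applied row by row. A toy version of the before/after filter logic (the rows are invented; a zero application_fee_amount stands in for the test charges the original commit meant to exclude):

```python
# (charge_id, amount, application_fee_amount) — invented sample rows.
rows = [
    ("ch_direct",  100.0, None),  # normal direct charge, no application fee
    ("ch_connect",  89.5, 2.9),   # Connect charge: always carries a fee
    ("ch_test",      1.0, 0.0),   # test charge the commit meant to exclude
]

def old_filter(fee):  # buggy: fee IS NULL also drops every Connect charge
    return fee is None

def new_filter(fee):  # candidate: keep NULL fees and positive fees
    return fee is None or fee > 0

old_total = sum(amt for _, amt, fee in rows if old_filter(fee))
new_total = sum(amt for _, amt, fee in rows if new_filter(fee))
print(old_total, new_total)  # the Connect charge comes back under the new filter
```

On this toy data the old filter counts only the direct charge; the new one restores the Connect charge while still dropping the zero-fee test charge, which is the shape of the reconciliation in the table above.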
Step 6: Implement the fix
I wrote the dbt PR myself. Cline was useful for debugging but the actual fix went through normal code review channels: PR opened, reviewed by a teammate, merged, dbt run, dashboard refresh, verification.
The fix was:
-- Before:
WHERE charge_status = 'succeeded'
AND application_fee_amount IS NULL
-- After:
WHERE charge_status = 'succeeded'
AND (livemode = true OR test_mode = false)
The right filter for “exclude test transactions” keys off the mode flags (livemode/test_mode), not off application_fee_amount. Two-line change, one PR, deployed within an hour.
Total investigation time: about 90 minutes from “look at this bug” to “fix is in production.” Without MCP-enabled Cline, I’d have spent significant time:
- Writing comparison queries by hand (probably 20-30 minutes)
- Exporting and comparing results in Excel (15-30 minutes)
- Manually checking dbt models (15 minutes)
- Running git blame myself (5 minutes)
- Iterating verification queries (30-60 minutes)
Roughly 4-6 hours, vs. 90 minutes. The MCP setup compressed almost every step of the investigation.
What MCP enabled
Specific things that wouldn’t have been possible without MCP integration:
Cross-database comparisons in one turn. Asking Cline to query both Postgres and Snowflake and compare results in a single answer. Without MCP, this is multiple manual steps.
Iterative query refinement. When the first query didn’t quite answer the question, asking “now group by connect_account” was instant. Without MCP, each iteration is a context switch to my SQL client.
Natural-language to SQL with awareness of actual schema. Cline knew the table structure because it was hooked up live. It didn’t hallucinate column names. The queries it generated ran on the first try because they used the real schema.
git operations alongside SQL. “Run git blame” and “now query the database” in the same conversation. The integration of code investigation with data investigation is what made the workflow feel natural rather than fragmented.
What MCP didn’t help with
The fix itself wasn’t an AI task. Writing the corrected dbt model required:
- Knowing what we wanted (exclude test, not exclude Connect) — the AI couldn’t know our intent
- Picking the right column to filter on — required reading dbt source docs
- Understanding the production deploy process — required knowing our pipeline ops
For these, Cline was at most a sounding board. The actual decisions were mine.
The other thing MCP didn’t help with: the post-fix communication. Telling the data team what happened, writing a brief incident note, deciding whether to backfill historical numbers. These are human conversations that happen outside the agent.
What this changed for me
Three things shifted in my workflow after this investigation:
MCP servers are now part of my project setup. For any project with non-trivial database use, I configure read-only MCP access on day 1 of joining. The setup time pays back the first time I need to do an investigation.
I treat data investigations differently. What used to be “I’ll spend the afternoon on this” is now “I’ll spend an hour and see how far I get with Cline.” Often the answer is “all the way to a fix.” Sometimes it’s “got the diagnosis, the fix needs more thought.” Either way, the time-to-clarity is much faster.
I stopped using BI tools for ad-hoc investigations. Tools like Metabase or Hex are great for dashboards. For one-off investigations, Cline + MCP is faster because there’s no UI friction — you describe what you want, the queries appear, the results are in front of you.
What I’d warn others about
Read-only is the safe default. Configure your MCP databases for read-only access. The cost of writing the wrong DELETE through an agent is too high. There’s no reason for the agent to need write access; if you need to mutate data, you can do it yourself.
Don’t connect to production. Use staging or read replicas. Even with read-only credentials, query load on production from an unbounded agent investigation is a real risk.
Verify the schema before trusting query output. Cline’s queries are correct against the schema it knows about. If the schema is changing or the agent’s understanding is stale, queries can be syntactically right and semantically wrong.
Watch the token costs. Each MCP-mediated query call counts against context. A long investigation with many query iterations can run up costs. Worth it for hard bugs; not worth it for trivial questions you could answer with a simple SQL client.
The bigger picture
The investigation pattern I went through — observe → hypothesize → verify → fix — isn’t new. What’s new is that the verification steps, traditionally the slowest part, can now be conversational rather than manual.
The 90 minutes vs. 4-6 hours figure is striking but not the most important thing. More important: the cognitive load was lower throughout. I wasn’t switching between SQL client, terminal, code editor, and chat. The investigation felt like one continuous conversation.
For data-heavy investigations specifically, I think MCP-enabled agents are one of the most underrated productivity wins in the AI tooling space. The marketing focuses on autonomous code generation. The actually-useful pattern is “agent that can query my systems,” and it’s been hiding in plain sight.