Tinker AI

Outcome

Pipeline shipped in 2.5 weeks vs. an estimated 5; Cline contributed ~45% of the code; the structured nature of Airflow DAGs is a good fit for agent loops


I built a customer data pipeline using Apache Airflow with Cline as my main AI assistant. The pipeline pulls customer events from Postgres, transforms them via dbt, lands them in Snowflake, and generates reports. About 2,500 lines of Python, ~20 DAGs, plus dbt models.

Cline was a strong fit for this work. The structured nature of Airflow DAGs and dbt models — well-defined inputs, outputs, and patterns — gave Cline clear targets to hit. The autonomous loop worked well because each task was bounded.

The setup

Tools and stack:

  • Airflow 2.9 deployed on MWAA (managed Airflow on AWS)
  • Postgres source database (read replica for the pipeline)
  • Snowflake warehouse
  • dbt Cloud for transformations
  • Cline 3.5 with Claude 3.5 Sonnet
  • Standard Python tooling: black, ruff, mypy, pytest

The repo structure:

.
├── dags/
│   ├── customer_events_dag.py
│   ├── billing_dag.py
│   └── ... (about 20 DAGs)
├── plugins/
│   ├── operators/
│   │   └── postgres_to_snowflake.py
│   └── hooks/
├── dbt/
│   ├── models/
│   └── tests/
└── tests/
    └── dags/

I had a single previously-built DAG as a reference (customer_events_dag.py). The pattern was clear; the rest was repetition.

.clinerules

The rules file I wrote at the start:

This is an Airflow 2.9 / dbt project. Conventions:

DAG patterns:
- Each DAG is a separate file in dags/
- Use the @dag decorator and @task decorator (TaskFlow API)
- Standard schedule: daily at 02:00 UTC
- Standard retries: 2 with 5-minute delay
- Standard tags: ['domain', 'tier'] where domain is one of {customer, billing, ops, analytics} and tier is one of {bronze, silver, gold}

Task patterns:
- Each task does one thing; if a task has more than ~30 lines, split it
- All tasks have docstrings explaining inputs and outputs
- Use the existing PostgresToSnowflakeOperator from plugins/ for source-to-warehouse moves
- Do not write inline SQL in tasks; SQL goes in dbt models

dbt patterns:
- Models in dbt/models/{domain}/{tier}/ matching the DAG tags
- Each model has a schema.yml with column tests
- Source declarations in sources.yml; never reference tables directly in SQL
- Use ref() for cross-model dependencies

Testing:
- Each DAG has a structural test in tests/dags/test_<dag_name>.py
- Structural tests verify: DAG loads, expected task structure, schedule
- Logic tests use Airflow's TaskInstance to mock; live tests run weekly

Avoid:
- Writing custom operators when an Airflow built-in or our existing custom one works
- Inline SQL in tasks (use dbt)
- Hardcoded table names (use the connection's database/schema config)
- Cross-DAG dependencies via TriggerDagRunOperator (we use SLA + sensors)

This was about 50 lines of rules. The cost was 30 minutes of writing. The payback was massive — Cline produced consistent code from day one.

Week 1: scaffolding the DAGs

I started by listing all 20 DAGs I needed: their source tables, their target tables, their domains, their tiers. I gave the list to Cline and asked it to scaffold each DAG following the reference pattern.
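The list took the shape of one small spec per DAG, roughly like the sketch below. The table and DAG names here are invented for illustration; the real list covered all 20 DAGs.

```python
# Hypothetical spec entries handed to Cline as the scaffolding input.
DAG_SPECS = [
    {"dag": "billing_invoices", "source": "billing.invoices",
     "target": "raw.billing_invoices", "domain": "billing", "tier": "bronze"},
    {"dag": "customer_signups", "source": "app.signups",
     "target": "raw.customer_signups", "domain": "customer", "tier": "bronze"},
]
```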

Cline produced 20 DAGs in about 4 hours of session time. Each DAG followed the conventions:

  • Right structure
  • Right tags
  • Right schedule
  • Used the existing operators

About 80% of the scaffolds were merge-ready as-is. The other 20% needed adjustment for domain-specific quirks.

This was the fastest part of the project. Cline excels at “produce N files following pattern X.” Airflow DAGs fit this perfectly.

Week 2: dbt models

dbt models followed a similar pattern. Cline produced:

  • The base SQL for each transformation
  • The schema.yml with column tests
  • Source references

For dbt specifically, Cline had a useful capability: it could generate column tests based on inferred semantics (“this column looks like it should be non-null and unique”). About 60% of generated tests were keepers. The rest I removed or adjusted.
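For reference, a generated schema.yml with that kind of column test might look like the following; the model and column names are illustrative, not from the project.

```yaml
version: 2

models:
  - name: stg_customer_events
    columns:
      - name: event_id
        tests:
          - not_null
          - unique
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id
```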

Cost so far: about $35 in API tokens. Productivity: estimated 60% time saving compared to manual implementation.

Week 3: the harder DAGs

Some DAGs had logic that didn’t fit the standard pattern:

  • One DAG depends on five upstream DAGs and triggers based on all of them
  • One DAG has incremental loading with complex high-water-mark logic
  • One DAG has reverse-ETL behavior, pushing computed values back to operational systems

For these, Cline’s first attempts were less successful. The patterns were less clear; the model couldn’t pattern-match as effectively.

I shifted strategy for these:

  1. Ask Cline to plan the DAG (Plan mode)
  2. Review and adjust the plan with Cline
  3. Generate the implementation in pieces
  4. Test each piece before moving on

This was more handholding than the simple DAGs but produced correct results. Each hard DAG took 90-120 minutes including the plan; manual implementation would have been 3-4 hours.

Cline’s specific failure modes

Some Cline-specific issues I hit:

Forgetting to use the existing operator. I had a custom PostgresToSnowflakeOperator. Cline sometimes used a generic PostgresOperator + SnowflakeOperator pair instead. Adding “use the existing PostgresToSnowflakeOperator from plugins/” to the prompt fixed it, but only case by case.

Wrong schedule conventions. Despite the rule, Cline occasionally generated schedules that didn’t match (cron strings instead of timedelta, wrong start dates). I had to correct these roughly 20% of the time.

Hard-coded table names. Despite the rule, Cline sometimes generated hardcoded database.schema.table references. Strict review caught these, but they were a recurring annoyance.

Test pattern drift. The structural tests Cline generated were good initially. As the project grew, Cline would sometimes generate slightly different structural test patterns that diverged from the original. I’d notice in review and realign them with the original pattern.

These are the kinds of things a sharper .clinerules file would catch. The rules I wrote at the start covered most cases but not all. I’d update them throughout the project.

What worked very well

A few patterns that worked particularly well for ETL development:

The “scaffold many similar things” pattern. Cline is excellent at “produce 20 DAGs each following this pattern but with different sources.” This is the bulk of ETL work.

The “generate tests based on the DAG” pattern. Once the DAG was right, Cline could generate matching tests easily.
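In this project the structural tests loaded DAGs through Airflow itself, per the rules above. As a dependency-free sketch of the same idea, conventions can even be checked at the source level with ast; this helper is illustrative, not the project’s actual test code.

```python
import ast

# A minimal TaskFlow DAG source to check (invented for this example).
DAG_SOURCE = '''
from airflow.decorators import dag, task

@dag(schedule=None, tags=["customer", "bronze"])
def customer_events():
    @task
    def extract():
        ...
    extract()
'''


def dag_decorator_tags(source: str):
    """Return the tags passed to a @dag(...) decorator, or None if absent."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for dec in node.decorator_list:
                if (isinstance(dec, ast.Call)
                        and isinstance(dec.func, ast.Name)
                        and dec.func.id == "dag"):
                    for kw in dec.keywords:
                        if kw.arg == "tags":
                            return ast.literal_eval(kw.value)
    return None
```

A test like `dag_decorator_tags(src) == ["customer", "bronze"]` catches tag drift without importing the DAG, which is useful when the convention itself is what you want to pin down.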

The “run dbt and explain failures” pattern. dbt errors are often opaque. Cline could read the dbt log, identify the issue, and propose a fix. Saved real time on debugging.

The “convert Airflow 1.x patterns to TaskFlow API” pattern. The team had some old Airflow 1.x DAGs they wanted modernized. Cline did this competently — most of the conversion was mechanical, and Cline handled it.

What didn’t work

A few things to flag:

Performance optimization. The pipeline had some slow tasks. Cline’s suggestions for optimization were often theoretical (apply parallelism here, partition the data there) without grounding in the actual bottleneck. Profiling and fixing remained manual work.

Connection management. Airflow’s connection model is its own thing. Cline’s suggestions for connection setup were sometimes generic Python patterns rather than Airflow’s BaseHook patterns. I had to nudge it toward the hook-based approach.

MWAA-specific quirks. AWS’s managed Airflow has specific limitations (no shell access on workers, requirements.txt size limits, etc.). Cline didn’t know about these; I caught issues in review.

Cross-DAG architecture. Decisions about whether to put logic in one big DAG vs. split into smaller DAGs are architectural. Cline could implement either; deciding which was right was on me.

The token cost

Total Cline API spend: $61 over 2.5 weeks.

For comparison:

  • My freelance rate: $X/hour
  • Time saved: ~80 hours
  • Value: $80X

The $61 is essentially nothing compared to the time saved. This is consistent with most AI tooling cost analysis: the API costs are tiny compared to the productivity gains.

Recommendation

For teams building ETL pipelines or similar structured data engineering work, AI tooling is a strong fit. Specifically:

  • Cline (or Aider) for autonomous DAG generation and modification
  • The architect/editor split (Cline 3.5+) for the harder DAGs
  • A solid .clinerules file from day one
  • One reference implementation that the model can pattern-match against

The productivity gain is real and measurable. ETL has the right shape — clear inputs, clear outputs, patterns that compose. AI agents handle this work well.

The areas where AI is less helpful (architecture decisions, performance optimization, infrastructure-specific quirks) are the same areas where AI is less helpful in any context. They’re engineering tasks, not pattern-matching tasks. Plan for human attention there.

Closing observation

The project was estimated at 5 weeks; it took 2.5. The time saved came from AI tooling, but it also came from the project being a good fit for AI tooling. Not every project is. ETL with consistent patterns is one of the best fits I’ve worked with.

If you’re picking projects for AI experimentation, structured pattern-heavy work like ETL is a high-leverage place to start. The wins are visible; the risks are bounded; the team’s confidence in AI tooling builds quickly.