I worked on a small embedded firmware project using aider as my main AI assistant. The project: control firmware for an industrial sensor, running on an STM32 microcontroller. Bare-metal C, no RTOS. About 4000 lines of code over six weeks.
The honest summary: AI tools are a poor fit for this kind of work. Some specific tasks benefited; the overall workflow didn’t change much. Here’s the breakdown.
What makes embedded different
Embedded firmware development has constraints that AI agents don’t handle well:
No standard runtime. There’s no Node.js, no Python interpreter, no JVM. The “stdlib” is whatever the chip vendor provides plus what you write. Standard libraries the model knows are mostly irrelevant.
Tight resource constraints. 64KB of RAM, 256KB of flash. Allocation patterns matter. The model’s defaults (allocate freely, don’t worry about it) are wrong here.
Real-time requirements. Some code must execute within microsecond-level deadlines. The model doesn’t reason about timing.
Hardware register access. Memory-mapped I/O via volatile pointers. The patterns are specific to each chip family. The model knows some chip families well and others poorly.
Limited tooling for the AI loop. The model can’t run the firmware. It can’t see actual hardware behavior. Test feedback comes from “I flashed it and it works” or “I flashed it and it doesn’t” — neither of which is informative for the agent.
Vendor-specific ecosystem. STM32CubeIDE, ChibiOS, Zephyr, NuttX — these are real ecosystems but small relative to mainstream development. Training data is thin.
Where aider helped
Despite the constraints, three categories of work benefited:
Boilerplate around peripheral initialization. Setting up GPIOs, configuring timers, enabling clocks. The patterns are repetitive within a chip family. Aider could produce reasonable scaffolding once it had a reference implementation.
Comments and documentation. Adding doxygen-style comments to functions, writing module-level docs. Aider was competent here.
Test fixtures. I had a host-side test harness that exercised the firmware logic separately from the hardware. Aider was reasonable at writing tests against this harness.
Driver porting. I needed to port a small library from one chip family to another. Aider could mechanically translate register accesses, leaving me to verify correctness.
Where aider failed
Several categories where aider produced wrong or unusable suggestions:
Interrupt handlers. Code in ISRs has constraints (no allocation, no blocking, minimal time). Aider’s suggestions ignored these constraints. About 80% of ISR-related suggestions were wrong.
Memory management. I’d ask for help with a buffer allocation strategy; aider would suggest dynamic allocation. On a system with 64KB of RAM and no malloc, this is a non-starter. Required heavy correction every time.
Register configurations. When the model tried to write register accesses, the bits were often wrong. The chip’s reference manual is the source of truth; the model’s memory of the manual is approximate.
Timing-sensitive code. Bit-banging protocols, signal generation with specific timing — the model’s suggestions were structurally fine but timing-wrong.
Power management. Sleep modes, wake-up sources, peripheral clocking during sleep. The model’s defaults were “always on,” which is wrong for battery-powered devices.
The agent loop problem
Aider’s autonomous loop assumes feedback from running code. On embedded firmware, “running the code” means flashing the chip and observing behavior on real hardware. This isn’t something aider can do.
For the agent loop to work, you’d need:
- A simulator that the model can run
- Or a hardware-in-the-loop test rig
- Or extensive host-side testable code
I had option 3 (host-side tests) for the application logic. For hardware-specific code, none of these were available. The agent loop just couldn’t proceed.
This is a structural limitation. Even with a much smarter model, the agent loop can’t progress without feedback. Embedded development’s feedback loop runs through hardware that the agent can’t access.
What I ended up doing
After the first week of frustration, I shifted strategies:
- Manual implementation for hardware-specific code. ISRs, register accesses, timing-critical code — these I wrote by hand. Aider stayed out.
- Aider for application logic on the host-testable side. State machines, data parsing, communication protocols — these I could test on the host, so the agent loop worked.
- Aider for documentation and tests. These are language tasks, not domain tasks. Aider was fine.
- Aider for asking questions, not writing code. “What’s the typical pattern for X on STM32?” got useful answers when used as research, even when I couldn’t trust the produced code.
This split worked. The 15% of code aider contributed was useful. The other 85% was mine.
Productivity numbers
Time spent: 6 weeks (estimated 5; ran over by a week due to a hardware bug, not AI-related).
Aider cost: ~$22 in API tokens.
Estimated time without aider: 6.5 weeks. So aider saved about half a week.
Compared to my TypeScript projects where aider saves 30-40%, this is meaningfully smaller. The constraints of embedded work compress the productivity gain.
What other embedded engineers have told me
I’ve talked to several other embedded engineers about AI tooling. The pattern:
- Most have tried it
- Most have ended up where I did: useful for some narrow tasks, useless for the bulk of the work
- Few have had transformative experiences
- The notable exception: engineers working on application-layer code in C++ for embedded Linux systems (less constrained, more standard) have better luck
The pattern fits. The more your “embedded” work resembles standard development (Linux kernel, application code on rich operating systems, etc.), the more AI tools help. The more your work is bare-metal, register-level, real-time, the less they help.
What might change this
A few hypothetical developments that would shift the picture:
Better simulators integrated with AI tools. If aider could drive QEMU or Renode for testing, the agent loop could close on application logic. The technology exists; the integration doesn’t.
Models trained on chip reference manuals. If a vendor (or open-source community) trained a model specifically on a chip family’s documentation, register-level code generation could become reliable. None exist yet that I know of.
Better hardware-aware reasoning. Models that understand timing constraints, power constraints, memory constraints. Current models don’t reason about these well.
None of these look near-term. The gap will likely persist for a while.
What I’d recommend for embedded engineers
For folks doing similar work:
Don’t expect AI to write your firmware. It mostly can’t, and what it does write needs heavy review.
Use AI for the parts that resemble normal development. Application logic, host-side tools, tests, docs.
Use AI as a research aid. “What’s the typical pattern for SPI on this chip family?” — aider can suggest. You verify against the reference manual.
Don’t trust register-level code from the model. Always verify against the documentation.
Use AI for code review on your own changes. “Here’s the ISR I wrote; does it have any issues?” — aider can sometimes catch mistakes you missed, even if it can’t write the code in the first place.
The honest summary
Embedded firmware is one of the niches where AI tooling hasn’t yet delivered. The gap is structural. For now, the engineering judgment, the manual reading, the careful integration with hardware — these remain firmly human work.
This will probably change over the next several years. For now, embedded engineers should calibrate expectations: AI is a small productivity gain on specific tasks, not a transformation of the workflow.
For students and engineers entering the field, this is actually good news. Embedded remains a domain where the human skill matters most. The barrier to entry is real; the AI shortcuts that exist for web development don’t exist here. The work is still about understanding hardware, protocols, and physics — not just typing speed.