A client had ~200 bash scripts accumulated over a decade of operational work. The scripts handled deploys, backups, monitoring, log rotation, the usual operational concerns. Many were written in 2015-style bash (no shellcheck, inconsistent quoting, ad-hoc error handling). I modernized them using aider over three weeks.
The interesting finding: bash is one of those niches where AI tools are particularly effective despite the niche-ness. The patterns are well-known; the variance per script is low; the linter catches the model’s mistakes.
The starting state
The scripts:
- Mix of shell varieties: mostly bash, with dash and ksh references in a few
- Inconsistent style: some with set -e, some without
- Variable quoting: about 40% had unquoted variables (a bug source)
- Error handling: mostly missing
- Tests: almost none
- Documentation: variable; some had comments, most didn’t
A typical script:
#!/bin/bash
DEST=$1
DAYS=$2
find /var/log -mtime +$DAYS -name "*.log" -exec gzip {} \;
mv /var/log/*.gz $DEST
echo "done"
Subtle issues: unquoted $1 and $2, no error checking on the find/mv, no validation that DEST exists, no --help, etc. Run it with a directory that has spaces in the name; bad things happen.
The modernization plan
For each script:
- Run shellcheck; capture issues
- Add proper shebang and set -euo pipefail
- Quote all variables
- Add input validation
- Add error handling
- Add a brief usage block
- Write a basic test using bats
Roughly 20-30 minutes of work per script if done manually. For 200 scripts, that’s 70-100 hours.
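To make the target concrete, here's roughly what the log-rotation script above looks like after that treatment. A minimal sketch in the spirit of my hand-written template; the error messages and usage text are illustrative, not the client's actual conventions:
#!/usr/bin/env bash
set -euo pipefail

usage() {
    echo "Usage: $(basename "$0") DEST DAYS" >&2
    echo "Compress /var/log/*.log files older than DAYS days and move the archives to DEST." >&2
}

if [[ $# -ne 2 ]]; then
    usage
    exit 1
fi

DEST="$1"
DAYS="$2"

# Validate inputs before touching anything on disk.
if [[ ! -d "$DEST" ]]; then
    echo "error: destination '$DEST' is not a directory" >&2
    exit 1
fi
if ! [[ "$DAYS" =~ ^[0-9]+$ ]]; then
    echo "error: DAYS must be a non-negative integer, got '$DAYS'" >&2
    exit 1
fi

# Same behavior as the original: gzip old logs in place, then move the archives.
find /var/log -name "*.log" -mtime +"$DAYS" -exec gzip {} \;

# Guard the glob so set -e doesn't kill the script when no archives were produced.
shopt -s nullglob
archives=(/var/log/*.gz)
if (( ${#archives[@]} > 0 )); then
    mv -- "${archives[@]}" "$DEST"
fi
echo "done"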
The aider workflow
The pattern that emerged:
# In a terminal:
shellcheck script.sh > script.sh.lint
# In aider:
> /add script.sh
> /add script.sh.lint
> /add EXAMPLES/modernized_template.sh # a reference I'd written by hand
> modernize this script following the template. Address all the
> shellcheck issues. Add proper input validation, error handling,
> and a usage block. Preserve the script's actual behavior — don't
> add new features.
Aider produced a modernized version. I’d review for:
- Same behavior (no new features, no behavior changes)
- Shellcheck-clean
- Style consistency with the template
- Reasonable error handling
About 8 minutes per script with aider. Across 200 scripts: ~25 hours instead of 70-100. Real productivity gain.
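Most of the per-script overhead was mechanical setup, so I pre-generated the lint files for a whole batch before opening aider. A trivial sketch, assuming all the scripts sit in one scripts/ directory:
for script in scripts/*.sh; do
    # shellcheck exits non-zero whenever it finds issues; don't let that stop the loop.
    shellcheck "$script" > "$script.lint" || true
done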
What aider got right
For most scripts, aider produced output that:
- Passed shellcheck
- Followed the template I’d shown
- Quoted variables consistently
- Added reasonable input validation
- Wrote useful usage blocks
About 80% of scripts were merge-ready after one aider iteration plus a 2-minute review.
What aider got wrong
The 20% that needed iteration:
Behavior changes that looked plausible. Aider sometimes “improved” scripts in ways that changed behavior. For example, a script that ran in the foreground became one that backgrounded itself. The new version was “better” in some sense but broke whatever was calling the script expecting foreground behavior.
The fix: explicit “do not change behavior” in the prompt.
Loss of working idioms. Some scripts had unusual but working patterns specific to the operational context. Aider sometimes “cleaned them up” into more standard patterns that didn’t work in the actual environment.
The fix: explicit “preserve unusual constructs” with a list of things to look for.
Input validation that was too permissive. Aider sometimes added input validation that allowed invalid inputs through. For example, validating “must be a number” without checking range.
The fix: I’d add specific bounds in the prompt for scripts that needed them.
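A check with explicit bounds looks something like this (the variable name and the 1-365 range are illustrative, not from any particular client script):
# Reject retention values outside a sane range, not just non-numbers.
if ! [[ "$DAYS" =~ ^[0-9]+$ ]] || (( DAYS < 1 || DAYS > 365 )); then
    echo "error: DAYS must be an integer between 1 and 365, got '$DAYS'" >&2
    exit 1
fi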
Tests that didn’t exercise actual behavior. The bats tests aider generated were sometimes structurally OK but tested obvious cases instead of the cases that mattered. I’d refine after the fact.
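When I did refine them, the useful tests were the ones that hit the validation paths and the failure modes that actually bite. A small bats sketch against a hypothetical rotate_logs.sh (the script name and the specific checks are illustrative):
#!/usr/bin/env bats

@test "fails and prints usage when called without arguments" {
    run ./rotate_logs.sh
    [ "$status" -ne 0 ]
    [[ "$output" == *"Usage:"* ]]
}

@test "rejects a non-numeric DAYS argument" {
    run ./rotate_logs.sh /tmp "seven"
    [ "$status" -ne 0 ]
}

@test "rejects a destination that is not a directory" {
    run ./rotate_logs.sh /nonexistent/path 7
    [ "$status" -ne 0 ]
}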
These were manageable. The pattern of “iterate when needed” added time but kept the overall productivity gain meaningful.
A specific success
A script that handled rotating PostgreSQL backups had years of organic complexity. Reading it took me an hour. Aider’s modernization addressed all shellcheck issues, added proper error handling, kept the unusual but working logic intact.
The original was 180 lines, partially documented, sometimes inconsistent. The modernized version was 240 lines, fully documented, consistent style. Same behavior. Fewer ways to break.
This is the kind of work that’s easy to put off because it’s tedious. AI tools make it tractable. The scripts that nobody wanted to touch get modernized; the codebase becomes safer to maintain.
What I’d recommend
For teams with collections of legacy scripts (bash, Python utilities, PowerShell, etc.) that need modernization:
Build a template first. Modernize one script by hand carefully. This is the template aider follows for the rest.
Use the linter. Whatever linter exists for your language (shellcheck for bash, ruff for Python, etc.). Run before and after; verify cleaner output.
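One quick way to verify the “cleaner output” part is to count findings before and after. A rough sketch, assuming before/ holds the originals and after/ the modernized copies:
for script in before/*.sh; do
    name="$(basename "$script")"
    before_count="$(shellcheck -f gcc "$script" | wc -l)"
    after_count="$(shellcheck -f gcc "after/$name" | wc -l)"
    printf '%s: %s -> %s findings\n' "$name" "$before_count" "$after_count"
done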
Be explicit about preserving behavior. The default for AI tools is “improve things.” For modernization, the goal is “preserve behavior, improve quality.” The prompt has to be explicit.
Test critical scripts manually. For the highest-stakes scripts, run them in a sandbox after modernization. Verify behavior. Don’t trust AI-modernized scripts in production without verification.
Batch by similarity. Scripts that follow similar patterns can be modernized in similar ways. Group them; refine the prompt for each group.
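A rough first pass at grouping can be as simple as bucketing scripts by the commands they rely on; the patterns below are illustrative, so adjust them to whatever your scripts actually call:
mkdir -p groups
for script in scripts/*.sh; do
    if grep -qE 'pg_dump|pg_restore' "$script"; then
        echo "$script" >> groups/backups.txt
    elif grep -qE 'rsync|scp' "$script"; then
        echo "$script" >> groups/deploys.txt
    else
        echo "$script" >> groups/misc.txt
    fi
done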
The cost
Aider API spend: $19 across 3 weeks of intermittent work.
Time invested: ~30 hours.
Compared to the manual estimate of 70-100 hours, that's a saving of roughly 40-70 hours. At consultant rates, that's substantial.
Why bash worked well
Reflecting on why this domain fit AI tools so well:
Bash patterns are well-documented. The shellcheck team has thoroughly documented common issues. The model has strong training on these patterns.
Per-script variance is low. Once you’ve seen 20 bash scripts, you’ve seen the patterns. AI scales the patterns efficiently.
Linters catch most mistakes. shellcheck catches roughly 95% of the issues AI tools introduce. The feedback loop closes.
The work is mechanical. Modernization isn’t creative; it’s pattern application. AI excels at pattern application.
Stakes are bounded. Scripts are usually single-purpose; bugs are usually obvious; recovery is usually fast. The blast radius of mistakes is contained.
These properties make bash modernization an unusually good AI tool fit. Other niches with similar properties (small Python scripts, PowerShell modules, simple SQL transformations) likely have similar productivity profiles.
What this taught me
The big lesson: AI tooling productivity isn’t always proportional to language popularity. Bash isn’t a “popular” coding language, but it works well with AI tools. The factors that matter are linter quality, pattern consistency, and stakes, not how much TypeScript/Python content the model has seen.
For teams with operational tooling (bash, Makefiles, Dockerfiles, Kubernetes YAML, Terraform), AI tooling is potentially as productive as for application code. The marketing emphasizes application development; the actual productivity may be more uniform.
If you’ve been avoiding AI tools for operational work because “it’s not a real language,” reconsider. The productivity gain may be similar to, or higher than, what you see on application code.
Closing
The 200-script modernization is one of the more concrete productivity wins I’ve had with AI tools. The work was tedious enough that nobody wanted to do it; AI made it tractable; now it’s done. The compound benefit of less buggy operational tooling will pay off for years.
For teams sitting on similar piles of legacy operational scripts: this kind of project is a good candidate. The investment to learn the workflow is small; the payoff is substantial.