
Outcome

Copilot reduced typing on routine maintenance by ~30%; useless for complex Perl-specific patterns; the team still doesn't trust it for changes affecting CGI compatibility


A consulting client maintains a Perl codebase for a niche industry. The system: about 80k lines of Perl, mostly mod_perl with some CGI scripts, dating back to 2014. The team is small (3 engineers), mostly senior. Maintenance and incremental feature work, no greenfield.

I spent six weeks helping them adopt Copilot. The findings were more nuanced than I expected.

The codebase

The state of the code:

  • mod_perl 2.x running on Apache 2.4
  • Custom ORM built before DBIx::Class was popular
  • Testing in Test::More with some Test2::V0 sprinkled in
  • Very few comments, lots of “you’ll know it when you see it” patterns
  • Some files date to 2012; some patterns from 2008 still present

Perl is a niche language. Copilot’s training data has plenty of Perl, but Perl 5.10-era idioms dominate over modern (5.36+) patterns. For a codebase that mixes both, the results are uneven.

Setup

I configured the team’s Copilot installation:

  • VS Code with the Copilot extension on each engineer’s machine
  • Workspace settings to disable Copilot in CGI scripts (more on this below)
  • Copilot Chat available

I had each engineer use Copilot for two weeks, then we’d discuss patterns. The team was AI-curious but not AI-evangelist; they wanted honest assessments.

Where Copilot won

Four categories of work where Copilot consistently helped:

Boilerplate around DBI calls. Connecting, preparing statements, executing, fetching, error handling. The patterns are well-trained in Copilot’s data. Engineers reported saving 30-40% of typing on this kind of code.
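
For a sense of what this looks like, here is the shape of the DBI boilerplate in question; the DSN, table, and variable names are placeholders, not the client's schema:

use DBI;

# Placeholder credentials and query parameter, purely for illustration.
my ( $user, $password, $status ) = ( 'app_user', 'secret', 'active' );

# Connect with RaiseError so failures become exceptions instead of silent undefs.
my $dbh = DBI->connect(
    'dbi:mysql:database=app;host=localhost',
    $user, $password,
    { RaiseError => 1, AutoCommit => 1 },
) or die "connect failed: $DBI::errstr";

# Prepare, execute, fetch, clean up: the repetitive sequence Copilot
# autocompletes reliably once it sees the first line or two.
my $sth = $dbh->prepare('SELECT id, name FROM widgets WHERE status = ?');
$sth->execute($status);
while ( my $row = $sth->fetchrow_hashref ) {
    print "$row->{id}: $row->{name}\n";
}
$sth->finish;
$dbh->disconnect;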

Test scaffolding. Their tests use Test::More with some Test2::V0. Copilot picks up the pattern from existing tests and generates plausible test cases. Engineers reported saving 30-50% of typing on tests.
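
The scaffolding in question is roughly this shape (the module and function under test are hypothetical stand-ins); once a file has a few such cases, Copilot proposes new ones in the same pattern:

use strict;
use warnings;
use Test::More;

# Hypothetical module standing in for the client's real code.
use My::Util qw(normalize_sku);

is( normalize_sku('abc-123'),   'ABC-123', 'lowercase input is upcased' );
is( normalize_sku(' ABC-123 '), 'ABC-123', 'surrounding whitespace is stripped' );
is( normalize_sku(undef),       undef,     'undef passes through unchanged' );

done_testing();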

Documentation. Writing POD (Plain Old Documentation) comments for functions. Copilot’s suggestions for POD were generally good — proper structure, sensible content.
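
For readers who haven't written POD, the generated documentation follows the standard layout, roughly like this (same hypothetical function as above):

=head2 normalize_sku

  my $sku = normalize_sku($raw_sku);

Uppercases the SKU and strips surrounding whitespace. Returns undef when
given undef.

=cut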

Translating from prose to code. When an engineer described what they wanted in a comment, Copilot’s first attempt at the implementation was often close. Especially for small utility functions.
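
The workflow is: the engineer writes the comment, Copilot proposes the body. A hypothetical example of the kind of utility where the first attempt tends to land close:

# Given a hashref of query params, return only the keys in @allowed,
# dropping empty-string values.  (The comment is what the engineer writes;
# a body along these lines is what Copilot typically proposes.)
sub filter_params {
    my ( $params, @allowed ) = @_;
    my %out;
    for my $key (@allowed) {
        next unless defined $params->{$key};
        next if $params->{$key} eq '';
        $out{$key} = $params->{$key};
    }
    return \%out;
}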

The pattern: tasks where the structure is well-known but the typing is tedious. Copilot is competent autocomplete for these.

Where Copilot lost

Several categories where Copilot was unhelpful or counterproductive:

Complex regular expressions. Perl’s regex is famously powerful and famously dense. Copilot’s suggestions for non-trivial regexes were often subtly wrong — the right idea but missing a flag, an escape, or a quantifier behavior.
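
A contrived example of the failure mode, where the missing piece is a single modifier:

my $text = qq{"multi\nline title" rest of record};

# Copilot-style suggestion: plausible, but without /s the . will not match
# the embedded newline, so the quoted prefix is never stripped.
#   $text =~ s/^"(.*)"\s*//;

# Corrected version: /s lets . match across newlines.
$text =~ s/^"(.*)"\s*//s;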

Mod_perl-specific patterns. mod_perl’s request lifecycle, push handlers, custom request objects — these are niche enough that Copilot’s training data is thin. Suggestions in this area were frequently wrong.
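
For readers who haven't touched mod_perl, this is the shape of a minimal mod_perl 2 response handler; it's this lifecycle (handler phases, Apache2::* request objects, Apache2::Const return codes) where the suggestions went wrong most often:

package My::App::Handler;
use strict;
use warnings;

use Apache2::RequestRec ();              # accessors on the request object
use Apache2::RequestIO  ();              # $r->print
use Apache2::Const -compile => qw(OK);

sub handler {
    my $r = shift;                       # the Apache2::RequestRec object
    $r->content_type('text/plain');
    $r->print("hello from mod_perl\n");
    return Apache2::Const::OK;           # tell Apache this phase succeeded
}

1;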

Cross-file inheritance and composition. The codebase has classes that inherit from base classes spread across multiple files. Copilot doesn’t trace the inheritance chain reliably; it sometimes suggested method names that didn’t exist in the parent class.

Custom ORM. The codebase has its own ORM. Copilot doesn’t know it. Most ORM-related suggestions were generic DBIx::Class patterns that didn’t match. Engineers had to either reject or heavily modify these suggestions.
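
To make the mismatch concrete, a hypothetical illustration (neither line is the client's actual code, and the custom-ORM call is invented purely to show the shape of the problem):

# What Copilot tends to suggest: the standard DBIx::Class idiom,
# assuming $schema is a connected DBIx::Class::Schema.
my $from_copilot = $schema->resultset('User')->find($user_id);

# What a codebase with its own ORM actually expects: something closer to
# this invented call, which Copilot has no way of knowing about.
my $from_codebase = My::ORM::User->load( id => $user_id );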

CGI compatibility. Some scripts run as both mod_perl handlers and standalone CGI. The compatibility shim is non-obvious. Copilot’s suggestions for code in these scripts often broke one mode or the other.
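
The usual trick for dual-mode scripts is to branch on the environment mod_perl sets up; a simplified sketch of the idea (not the client's actual shim):

# mod_perl sets $ENV{MOD_PERL} (e.g. "mod_perl/2.0.12") for code it runs,
# so dual-mode scripts branch on it.
if ( $ENV{MOD_PERL} ) {
    # Under Apache: no exit(), no assuming a fresh interpreter per request,
    # output goes through the request object.
}
else {
    # Plain CGI: one process per request, print headers directly,
    # exit() is safe.
}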

The pattern: tasks where the codebase’s specific context dominates. Copilot doesn’t know your custom code; suggestions don’t fit.

The CGI exclusion

After the second week, we disabled Copilot for files in the cgi/ directory. The reason: too many subtle compatibility issues. Engineers reported spending more time fixing Copilot’s suggestions in these files than they would have spent writing the code.

The exclusion was a workspace setting:

{
  "github.copilot.enable": {
    "*": true,
    "perl": true,
    "cgi": false
  }
}

This is a useful pattern. AI assistance is good in some contexts and counterproductive in others. The ability to scope it pays off in legacy codebases where the boundary is meaningful.

What the engineers said

After six weeks, the team’s qualitative feedback:

Engineer A (most senior): “About 30% faster on the routine stuff. Useless on the gnarly stuff. I’d keep using it but I’d never use it without reviewing.”

Engineer B (mid-career): “I notice it more when it’s wrong than when it’s right. The wrong suggestions are confidently wrong, which is annoying. But the speedup on tests is real.”

Engineer C (junior on the team): “It’s helping me ramp up on the codebase faster because Copilot Chat can explain what code does. The actual completion suggestions I’m more cautious about — I don’t always know if they fit our patterns.”

The senior’s quote captures it. Copilot is a force multiplier on familiar work. It doesn’t help with unfamiliar work, and “unfamiliar” includes “this codebase’s specific patterns.”

Productivity numbers

I tracked PR counts, PR sizes, review times, and production bug reports for the team across the six weeks. Compared to the prior six weeks (no Copilot):

  • PRs merged: 38 (Copilot weeks) vs 35 (no Copilot weeks). Marginal increase.
  • Average PR size: 320 lines vs 270 lines. PRs are slightly bigger.
  • Average review time: 2.4 days vs 2.6 days. Slight improvement.
  • Bugs reported in production: 4 vs 3. No meaningful change.

The numbers suggest a small productivity bump without measurable quality impact: 38 vs. 35 PRs is roughly 9% more merged, and 2.4 vs. 2.6 days is roughly 8% faster review, so somewhere in the 5-15% range once noise is accounted for. This is consistent with the engineer feedback.

The honest reading: Copilot is useful but not transformative for this team. The productivity gain is real but small.

What didn’t translate from other contexts

A few things I’d seen work well in other AI tooling adoptions that didn’t apply here:

Codebase-aware features. Copilot Enterprise has codebase indexing, but the team is on Copilot Business. Without indexing, Copilot’s suggestions are based on the immediate file context only. The senior engineer noted this was a real limitation.

.copilotrules or equivalent. Copilot doesn’t yet have a project-level rules file analogous to .cursorrules. The closest is custom instructions, which apply per repository on Enterprise. The team couldn’t easily encode “we use our custom ORM, not DBIx::Class” as a project-wide rule.

Per-language excellence. Copilot is uneven across languages. The Perl experience is rougher than the team’s prior experience with Copilot on JavaScript projects.

What I’d recommend for similar projects

For teams maintaining legacy Perl (or other niche-language) codebases:

Try Copilot but expect modest gains. Don’t expect transformative productivity. 10-20% on the tasks it fits is realistic.

Scope it explicitly. Disable Copilot in directories where the codebase’s specifics dominate. Enable it in directories with more standard patterns.

Don’t trust completions for complex regex or domain-specific code. Review carefully. The model is confident; you have to be skeptical.

Use chat more than completions. The chat panel is useful for “explain this code” or “what would the typical pattern for X be?” The autocomplete is more useful for routine boilerplate.

Keep humans in the loop on safety-critical changes. For payment processing, security checks, or anything where wrong code is dangerous, manual implementation is faster than reviewing Copilot’s suggestions for subtle errors.

The honest summary

Copilot on legacy Perl: helpful but limited. The team is keeping it because the modest productivity gain justifies the modest cost. They’re not switching their workflow around it.

For teams considering AI tooling on legacy codebases in niche languages, the honest expectation is: it works, somewhat, with caveats. The marketing suggests broader value than this team experienced.

This isn’t a Copilot-specific failure. Cline, Cursor, and others would have similar patterns on this kind of codebase. The constraints are about the codebase’s niche-ness more than the tool’s capabilities.

For greenfield work in mainstream languages, AI tooling delivers more dramatically. For legacy work in niche languages, the gain is real but smaller. Calibrate accordingly.