
Training a junior engineer in a workflow that already includes AI

Published 2026-05-11 by Owner

A junior engineer hired today has almost certainly been using AI coding tools for at least a year before joining. They arrive with Copilot completions as muscle memory, a habit of pasting error messages into a chat window, and a reasonable ability to get something working on a first attempt. In many ways the entry bar for shipping has dropped. The skills gap is not what it used to be.

But there is a different gap now, and it is harder to see until the project hits turbulence.

The hidden risk

The failure mode does not announce itself. The junior ships working code, PRs look reasonable, velocity is fine. What a senior reviewer starts noticing after two or three months is subtler: the junior cannot read code they did not write without reaching for AI to explain it. They cannot debug a failure without asking the model first. They accept an abstraction the model suggested without being able to say why it is the right one.

This matters because the skills AI hides are precisely the skills required when AI fails. AI is bad at legacy codebases. It hallucinates method signatures in internal APIs. It produces plausible-looking explanations for failures that are factually wrong. A developer who has never debugged without AI assistance will not recognize when the model is confidently misleading them. They will cargo-cult the suggestion and be surprised when it does not work.

Three specific capacities get underdeveloped in AI-first junior workflows:

Reading unfamiliar code without assistance. Paste any file into an AI chat and get a summary in seconds. Useful, but summary consumption is not the same skill as reading. Reading means building a mental model of data flow, recognizing where invariants are maintained or violated, noticing where a method’s name lies about what it does. Those capacities come from slow, unassisted traversal of real code. An AI summary skips the traversal entirely.

Debugging by reasoning. AI-assisted debugging follows a pattern: copy the error, paste it in, accept the suggested fix, move on. What does not happen in that loop: forming a hypothesis about why the error occurs, checking whether the hypothesis is consistent with what other code is doing, narrowing the failure to a specific line by reasoning rather than by dice-rolling. The junior who has never built that hypothesis-and-check habit will be helpless when copy-paste debugging stops working — and it stops working on the bugs that actually matter.

Recognizing a bad abstraction. AI generates structurally plausible code: a function with a clear name, consistent argument types, a docstring. It also generates bad abstractions — ones that are too general, too specific, or that encode an assumption that will become wrong in six months. Evaluating an abstraction requires a prior sense of what good ones look like and what bad ones have cost. That prior is built through experience with consequences, not through reading model output that has no skin in the game.

There is a version of this that shows up concretely in code review. A senior asks, “why is this extracted as a separate function?” and the junior says, “the AI suggested it.” That is not an answer. It is evidence that the junior evaluated neither the suggestion nor the existing code structure. The AI made a call; the junior published it.

The opportunity that also exists

The flip side is real. AI tools provide something that was previously expensive to arrange: fast, non-judgmental feedback on small mistakes.

A junior who asks an AI why a TypeScript type error is showing up gets an answer in three seconds that might have taken a senior fifteen minutes to explain — assuming the senior was free, available, and in the mood to explain without making the junior feel bad about asking. At scale, over a week, this accelerates certain kinds of learning substantially. The junior who uses AI well will encounter more distinct error patterns in a week than one who has to queue for human review on each one.
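To make the shape of that interaction concrete, here is a hypothetical example of the kind of type error in question (the names and code are invented for illustration, not taken from any particular codebase):

```typescript
interface User {
  id: string;
  email: string;
}

function sendEmail(user: User): void {
  console.log(`Sending to ${user.email}`);
}

function notify(users: User[], id: string): void {
  // Array.prototype.find returns `User | undefined`, so passing the result
  // straight to sendEmail fails with an error like: "Argument of type
  // 'User | undefined' is not assignable to parameter of type 'User'."
  const match = users.find((u) => u.id === id);

  if (match !== undefined) {
    sendEmail(match); // narrowing the union is what the compiler was asking for
  }
}
```

The error message is cryptic to someone who has not yet internalized that find can come back empty. The three-second explanation from the model is the sentence in the comment; the fifteen-minute human version is the same sentence plus the patience to deliver it kindly.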

AI is also a useful sparring partner for design questions at a cadence a senior cannot match. “What are the tradeoffs between storing this as a flat list versus a nested tree?” produces a useful response. The junior who asks this question, reads the answer, forms a view, and then defends that view in a code review is doing real thinking. The one who just picks whichever option the model recommended is outsourcing the thinking along with the work.
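A sketch of the two options behind that question, using an invented comment-thread shape purely for illustration:

```typescript
// Invented example: two ways to store the same threaded-comment data.

// Flat list: every comment points at its parent. Simple to store, paginate,
// and update one record at a time; rendering a thread means reassembling
// the hierarchy from the parentId links.
interface FlatComment {
  id: string;
  parentId: string | null;
  body: string;
}
type FlatThread = FlatComment[];

// Nested tree: the hierarchy is explicit, so rendering is a straight
// traversal; finding or updating one comment means walking the tree,
// and very deep threads make partial loading awkward.
interface TreeComment {
  id: string;
  body: string;
  replies: TreeComment[];
}
type NestedThread = TreeComment[];
```

Either answer can be right. The point of the exercise is that the junior reads the tradeoffs, picks one, and can defend the pick.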

The opportunity: use AI to increase the volume of learning interactions, not to decrease the cognitive work per interaction. The goal is more reps, not fewer. Every time AI handles something the junior could have worked through, that is a missed rep.

There is also an exposure benefit that gets underrated. An AI-assisted junior will see more patterns — more idiomatic library usage, more standard error handling shapes, more examples of how a common problem gets structured — than a pre-AI junior on the same tasks. Pattern exposure is valuable. The catch is that exposure without comprehension does not accumulate into intuition. Seeing patterns and being able to explain them are different skills.

A three-month curriculum

The structure that has worked: start AI-off for the skills AI hides, then gradually reintroduce AI with added accountability requirements, then move to full AI workflow with diagnostic reviews.

Month one: AI off for specific exercises.

Not AI-off entirely — that is unrealistic and unhelpful. AI off for a specific set of assigned exercises, run alongside normal AI-assisted work:

Reading assignments in the existing codebase. Pick three non-trivial files per week. The junior reads them without AI assistance and writes a two-paragraph summary: what does this module do, and what would break if it were removed? The constraint forces slow traversal. Review the summaries in 1:1s, not to grade them but to understand where the junior’s mental model diverged from the actual behavior.

Debugging exercises. Take real past bugs from the issue tracker — ones with a known root cause. Give the junior the failing state and ask them to find the root cause using only the code, the logs, and a debugger. No AI. The point is not to be cruel. It is to build the hypothesis-and-check loop before the habit of asking-first takes hold permanently. A junior who has located five bugs by reasoning has an internal procedure for the sixth one. A junior who has pasted five bugs into chat does not.

Abstraction reviews. Pick three functions from the codebase that have interesting design histories — ones that were changed, refactored, or extracted from something larger. Ask the junior to explain why the current shape exists. Then explain the actual history. The gap between their answer and the real answer is a curriculum item. If they guessed right, ask why. If they guessed wrong, the conversation about why is worth more than another reading assignment.

Month one is not about withholding tools. It is about establishing that these skills exist and that the junior has them, before the AI workflow resumes at full intensity. It also gives the mentor a clear picture of where the baseline is, which month two depends on.

Month two: AI on, but require articulation.

AI tools are back in full. One additional requirement: in every PR description, the junior writes a “model contributions” section. What did AI suggest? What did they accept, and why? What did they modify or reject, and why?

This is not surveillance. It is about forcing the processing step that AI-first workflows skip. A completion accepted without evaluation is a coin flip — it might be right, but no learning happened either way. A completion that the developer has to explain in writing is one they have looked at twice.

A concrete example: the junior accepted an AI suggestion to extract a helper function for a validation rule. In the model contributions section, they wrote:

The model suggested this extraction and I accepted it because it removes duplication across two call sites. I modified the function signature — the model used a generic data parameter, I changed it to the specific type because the function only ever runs in one context and the generic version would have made future callers think it was more reusable than it is.

That is exactly the right level of reasoning. The refactoring is defensible on its own terms. The type change was an improvement on the AI output, not a passive acceptance of it. The junior can explain both the acceptance and the modification. This is what the articulation requirement produces when it works.
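Rendered as code, the difference the junior is describing looks roughly like this; the names and types are invented, since the actual helper is not shown here:

```typescript
// Invented names; the real helper and its call sites are not shown in the post.

// What the model suggested: a generic parameter that advertises a reusability
// the function does not actually have.
function isValidEntry(data: Record<string, unknown>): boolean {
  return typeof data.email === "string" && data.email.includes("@");
}

// What the junior committed: the specific type the two call sites actually
// pass, so a future reader does not mistake the helper for a general-purpose
// validator.
interface SignupForm {
  email: string;
  acceptedTerms: boolean;
}

function isValidSignup(form: SignupForm): boolean {
  return form.email.includes("@") && form.acceptedTerms;
}
```

The narrower signature is also the one the compiler can help with: the generic version accepts anything and pushes all the checking into the function body.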

Not every entry will look like that. Early in month two, many will say something like “accepted AI suggestion for error handling pattern — seemed standard.” That is fine as a starting point. The conversations in 1:1s about those thin entries are where the actual curriculum happens.

Month three: full AI workflow with deep reviews.

By month three, the AI workflow runs without the articulation requirement on every PR. That format has served its purpose of building a habit; the habit should now be internalized. The depth of scrutiny shifts instead to code review.

Pick two or three PRs per week for a detailed review that specifically targets the skills from month one: can the junior read the code they are changing without AI assistance, explain the existing behavior before their change, and predict what would break if a specific function were removed? These reviews are probing questions in the PR comments, not corrections. “What does this function return when the list is empty — walk me through it.” “Why does this abstraction exist at this level rather than one level up?”
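An invented example of the kind of function those questions are aimed at, where the empty-list answer is not obvious from the signature:

```typescript
// Invented example: what does this return when `orders` is empty?
function maxOrderTotal(orders: { total: number }[]): number {
  return orders
    .map((o) => o.total)
    .reduce((max, t) => (t > max ? t : max));
}

// Answer: it does not return at all. reduce() with no initial value throws
// a TypeError ("Reduce of empty array with no initial value") on an empty
// array. Predicting that without running the code or asking a model is the
// capacity the review is checking for.
```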

The reviews are diagnostic, not adversarial. The goal is to confirm that the AI workflow has not displaced the underlying reasoning capacity — that the junior is faster because of AI, not dependent on it for things they ought to be able to do alone.

Skills AI specifically obscures in legacy codebases

Legacy work deserves special treatment because AI is particularly unreliable there and juniors are particularly likely to encounter it.

AI tools are trained on public code. Internal legacy codebases — the ones with custom ORM wrappers, internal utility libraries, decade-old abstractions that predate modern patterns — are not well represented in that training data. A model asked about an internal method will produce a confident explanation based on a similar-looking public API. The explanation will often be wrong in ways that are hard to detect without already understanding the system.

The junior who has strong unassisted reading skills will notice when the model’s explanation does not match what the code actually does. The one who relies on model explanations as a substitute for reading will not. They will take the confident wrong explanation as ground truth and debug from a false premise. Legacy bugs found by chasing AI-generated false leads are expensive bugs.

For legacy work specifically, three rules that help:

  • Before using AI to explain a legacy module, read the module first. Then use AI to check against the reading. The model’s explanation becomes a second opinion on a reading the junior has already formed, not a replacement for forming one.
  • When the model and the code disagree, the code wins. Always. The model does not know the internal codebase. The code is the ground truth.
  • Legacy debugging is especially poor terrain for AI assistance because the failure modes are usually in interactions between components that did not exist when the model was trained. Teach the junior to recognize when they are in that territory.

The check-in cadence

One standing question in weekly 1:1s, throughout all three months and beyond:

“What did you accept from AI this week that you could not have reproduced without it? And for one of those: could you reproduce it now?”

This question does two things. First, it tracks whether the articulation habit has stuck — a junior who says “I don’t know, I just accepted whatever it produced” is telling you something important. Second, the reproduction question separates durable learning from temporary output. AI interaction that produces code without producing understanding is not accelerating development; it is borrowing against future confusion.

A good answer: “I could not have written the regex it produced, but I spent time with the regex syntax docs afterward and now I could write something equivalent.” The learning happened after the AI interaction, not instead of it. That is the pattern worth reinforcing.

A concerning answer: “I accepted a lot of the error handling patterns it suggested and honestly I am not sure why they work.” Not because the patterns are necessarily wrong, but because error handling is exactly the kind of code that needs to be understood, not cargo-culted. The error handling the junior wrote will encounter real errors, and they need to be able to reason about what happens.

The check-in also surfaces repetition. If the same category of code keeps appearing in the “could not reproduce” list over multiple weeks — always regex, always async error handling, always database query construction — that is a gap worth addressing directly. Add an explicit exercise: write three of these by hand, without AI, and then explain them. Then return to AI-assisted production. The goal is not to make writing them from scratch the permanent mode; it is to make the AI output legible when it appears.

What this is not

This is not an argument that AI tools are bad for junior development. The evidence runs the other way. A junior with good foundations who uses AI well learns faster than a junior of the same ability without AI, because the volume of feedback interactions is higher and the iteration cycle is shorter.

The argument is narrower: the curriculum has to account for what the tools hide. The skills AI assistance makes unnecessary to practice are exactly the ones that become critical when the assistance fails or misleads. A training program that ignores this produces developers who are fast until they hit the edge cases of the AI workflow — the legacy system, the internal API, the bug the model confidently explains wrong — and then surprisingly helpless.

Building the foundations first is not a tax on AI adoption. It is what makes AI adoption durable.