Avoiding agentic drift in large codebases


coding agents usually do their best to preserve the existing contract.

to avoid breaking changes, they add normalization layers, duplicate ownership paths, and other small compatibility fixes.

That helps in the moment, but it does not age well.

As the codebase grows, our goal is a pluggable, scalable architecture, not a linear codebase held together by patches. this behavior adds too much noise, makes the system messy, makes testing harder, and over time causes everything to drift…

codebase drift means the intended architecture slowly loses its shape because small fixes, duplicate paths, and compatibility layers accumulate over time.
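to make that concrete, here is a minimal sketch of how a normalization layer grows next to a single contract. the `User` shape and the alternative field names (`userId`, `user_id`, `mail`, `emailAddress`) are made up for illustration, they are not from any real codebase.

```typescript
// Hypothetical contract, for illustration only.
interface User {
  id: string;
  email: string;
}

// Drifted version: every "compatibility fix" adds another accepted shape.
// Each branch is small, but together they hide what the contract really is.
function normalizeUserDrifted(input: Record<string, unknown>): User {
  const id = input.id ?? input.userId ?? input.user_id ?? "";
  const email = input.email ?? input.mail ?? input.emailAddress ?? "";
  return { id: String(id), email: String(email).toLowerCase() };
}

// Single-contract version: one accepted shape, everything else fails loudly.
function parseUser(input: Record<string, unknown>): User {
  const { id, email } = input;
  if (typeof id !== "string" || typeof email !== "string") {
    throw new Error("invalid user: expected { id: string, email: string }");
  }
  return { id, email };
}
```

the drifted function never fails, which is exactly the problem. callers can keep sending old shapes forever and nobody notices.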

and even when you hard cut old contracts or refactor them out, you still end up with guards to make sure they never come back, or with empty stubs left behind.

the good thing is software devs can use their knowledge and apply the skills they built over years. the bad thing is this is where the real work starts.

skill issue!

probably, but… look, there is a difference between one-shotting a simple webapp out into the wild and doing domain work on a codebase that should survive when winter is coming.

what we’re talking about is keeping layers intact, and if you refactor, it should be done cleanly without introducing more guards and code.

however, here are some patterns i gathered over time to tackle these issues.

set the rails first


congrats, you got promoted to team lead overnight.

once you use coding agents regularly, you are basically working in a team now.

and like any team, it needs rules, boundaries, ownership, checks, and some shared idea of what good work looks like. it does not matter if the other contributor is human or agent. if those things are weak, the codebase starts to drift.

the better and stricter these rules are defined, the easier it gets later. because less ambiguity means less room for random paths, duplicate ownership, and patchwork fixes.

every time i start a new codebase i set up the same boring fundamentals first.

house rules

define the rules, the checks, the boundaries, and the expected shape of the project. for agents this becomes even more essential, because if the codebase does not communicate its constraints clearly, the agent will still move forward, just often on the wrong path.

linter, formatter, typechecker (oxlint, eslint, rustfmt, …)
test setup (vitest, nextest, …)
secrets + guards (gitleaks, lefthook, …)
release process (ci, tags, changelog, …)
for dead code hygiene i can recommend knip

if it’s about frontend tooling, i recommend taking a look at vite plus, which gives you the full package.

check the map

if you work on a language or stack that is new to you then research it first. open gpt and let it collect best practices and coding rules for the version you’re actually using. then cut that summary down and turn it into rules for your AGENTS.md. (we talk about AGENTS.md below)

map the domains

folder structure is the third thing i would plan early. if you do not give the agent a clear layout it will start to put things where they fit in the moment and not where they belong long term.

split by domain early and make ownership visible in the tree. this makes it much easier to tell what belongs to core, what belongs to cli, what belongs to web, and where tests should live.

and while we’re at cli i would recommend building a small cli in most codebases. agents understand clis very well and it gives them a clear contract to work with when they need to run the important paths of your system.
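a minimal sketch of what such a cli contract could look like. the command names (`version`, `echo`) and the `runCli` function are hypothetical. a real cli would call into your core package instead of returning fixed strings.

```typescript
type CliResult = { exitCode: number; output: string };

// Hypothetical commands, for illustration only.
const commands: Record<string, (args: string[]) => CliResult> = {
  version: () => ({ exitCode: 0, output: "0.0.0" }),
  echo: (args) => ({ exitCode: 0, output: args.join(" ") }),
};

// Pure dispatcher: takes the arguments, returns an exit code and output.
// Keeping it free of side effects makes it easy to test.
function runCli(argv: string[]): CliResult {
  const [name, ...rest] = argv;
  const command = name === undefined ? undefined : commands[name];
  if (command === undefined) {
    const known = Object.keys(commands).join(", ");
    return { exitCode: 2, output: `unknown command. available: ${known}` };
  }
  return command(rest);
}
```

in node you would wire this up with `runCli(process.argv.slice(2))`, print the output and exit with the returned code. the useful part for the agent is that unknown commands list what is available.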

domain matters because it gives every part of the codebase a clear home. without that, agents start to duplicate logic, blur ownership, and push features into whatever folder feels convenient in the moment.

for a modern typescript setup something like this is already enough:

apps/
  web/
    src/
    test/
packages/
  cli/
    src/
    test/
  core/
    src/
    test/
  shared/
    src/
    test/

it does not need to be perfect on day one. it just needs to be clear enough that the agent does not start building new rooms in random corners of the house.
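one way to keep that layout honest is a small import boundary check. this is a minimal sketch, assuming you already have a list of import edges from some other tool. the dependency rules in `allowedImports` are my assumption for the tree above, not a fixed standard.

```typescript
// Which domain may import from which. Assumed rules, adjust to your repo.
const allowedImports: Record<string, string[]> = {
  "apps/web": ["packages/core", "packages/shared"],
  "packages/cli": ["packages/core", "packages/shared"],
  "packages/core": ["packages/shared"],
  "packages/shared": [],
};

// Returns the owning domain for a repo-relative path, e.g. "packages/core".
function domainOf(path: string): string | undefined {
  return Object.keys(allowedImports).find((domain) => path.startsWith(domain + "/"));
}

interface ImportEdge {
  from: string; // file that contains the import
  to: string; // file that is imported
}

function findBoundaryViolations(edges: ImportEdge[]): ImportEdge[] {
  return edges.filter(({ from, to }) => {
    const source = domainOf(from);
    const target = domainOf(to);
    if (source === undefined || target === undefined) return false; // outside the map
    if (source === target) return false; // imports inside a domain are fine
    return !(allowedImports[source] ?? []).includes(target);
  });
}
```

with these rules, core importing from web is flagged, while web importing from core passes.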

a small task tracking system

there are dozens of memory and task based solutions out there. you can try them for yourself, but often it’s enough to have a small plan -> code -> review loop that is tracked in a json file, an md file or a small sqlite database.

what matters is not building some huge project management layer for your agents. most of these systems get bloated fast and coding agents are already very good at context gathering. but even a very small task tracker makes sense because it reduces drift while the agent works.

a simple task system i built for my projects is codex-planr. it just tracks my tasks in a json file and has three skills (plan, fix and review).
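to show how small such a tracker can be, here is a minimal sketch of a plan -> code -> review loop stored as json. this is not the codex-planr format. the `Task` shape, the status names and all function names are assumptions for illustration.

```typescript
type TaskStatus = "planned" | "in_progress" | "in_review" | "done";

interface Task {
  id: number;
  title: string;
  status: TaskStatus;
}

// plan -> code -> review: each status has exactly one next step.
const nextStatus: Record<TaskStatus, TaskStatus | null> = {
  planned: "in_progress",
  in_progress: "in_review",
  in_review: "done",
  done: null,
};

function addTask(tasks: Task[], title: string): Task[] {
  const id = tasks.reduce((max, task) => Math.max(max, task.id), 0) + 1;
  return [...tasks, { id, title, status: "planned" }];
}

// Moves one task a single step forward. Unknown ids leave the list unchanged.
function advanceTask(tasks: Task[], id: number): Task[] {
  return tasks.map((task) => {
    if (task.id !== id) return task;
    const next = nextStatus[task.status];
    if (next === null) throw new Error(`task ${id} is already done`);
    return { ...task, status: next };
  });
}

// The whole "database" is one JSON string that can live in a file.
const serialize = (tasks: Task[]): string => JSON.stringify(tasks, null, 2);
const deserialize = (json: string): Task[] => JSON.parse(json) as Task[];
```

because a task can only move one step at a time, the agent cannot jump from planned to done without passing review.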

working on the same codebase with multiple agents

how can people work with 6 agents at the same time? there are cases where it works well and cases where it doesn’t. a lot of it also depends on my mental state. yes, there are days where i can easily drive 4 agents in parallel, and there are days where i spend hours in one session and abandon the others or pick them up later.

the hard part is avoiding overwritten changes and even more drift. if you know your codebase and your domains, it gets much easier. you can let one agent work on an isolated backend feature while you shape some frontend details yourself. in that case you are delegating parts of your thinking, not your ownership.

worktrees or same branch. official codex workflows recommend worktrees when you really want independent parallel tasks, and that makes sense. they are the safe option when you want separation. i just do not reach for that by default. most of the time i avoid parallel branches and worktrees unless the split is worth it. if i do need it, i usually use a separate checkout on disk that i call codex-sandbox.

the reason is simple. imagine building a car from scratch. if you have 10 copies of that car, you need to move between each clone, add some parts, and later merge everything back into one car. i would rather build on one car and rebuild a small part when i mess up.

to summarize, parallel branching and merging is just no fun in agentic coding. that does not mean i never use branches, i just avoid parallel branches and worktrees unless they are really necessary.

send out scouts - subagents

subagents get some hate out there and honestly that criticism is fair. they can burn through tokens fast, and if you already need to supervise 6 agents by hand, letting one agent orchestrate even more sounds a bit insane. still, i like them, especially to explore my codebase for relevant context. that is what i use them for, for example to find duplicate ownership or unnecessary fallbacks.

you can tell codex or cursor something like:

spawn n=5 subagents explore and $find-duplicate-ownership

long term these subagents are getting more efficient. you can already feel that shift in codex, where stronger models hand off exploration work to smaller models like gpt-5.4 mini.

if you want to explore subagents more directly, i can recommend codex swarms by am.will.

coding agents already spend most of their time in the terminal anyway. they run commands, read logs, patch files, chain scripts, and move through the project that way.

i don’t want to go too deep into the mcp vs cli debate here. i ended up in the same place back when i built browser-echo. for many tasks the agent just needs to run something, read the logs, see the result, and try again. the CLI is still the fastest and most direct place for that. mcp gets more interesting once you want a cleaner protocol around approved tools, authenticated sessions, or a handoff from your live browser into the agent. that is where the extra layer starts to pay off. i still would not start there by default.

the more important part is giving the agent eyes. if it can only read the codebase it will guess too much. the goal is that the agent is able to see what you see. let it see the running app, read browser logs, inspect state, click through flows and take screenshots. that is why these tools matter so much now. dev-browser, peekaboo or the recent Chrome DevTools MCP update all push in the same direction. let the agent work with the live app and not only with static files.

My goal is always to give the agent access to the same view and context that I have.

AGENTS.md - tell the agent how you work

when cursor released rules i don’t know how many hours i burned writing and testing them so my coding agents would pick up at least some part of my style. this was before we wrapped everything in words like harness and context engineering. i learned a lot from it, but i also wasted a lot of time because those files were ignored most of the time.

Today’s models are much better at instruction following (especially GPT/Codex models).

Keep AGENTS.md short and generic. put in the things that should hold on every run, how to build, how to test, what must not be touched, what done means, and maybe a few repo rules the agent keeps getting wrong.

short and accurate beats long and vague, and if the agent makes the same mistake twice, that is usually the moment to write the rule down.

you can find one such AGENTS.md in my codex-1up repo.

game of skills

AGENTS.md tells the agent how i work in general. skills are where i store the more specific patterns i want it to reuse.

a skill is basically a reusable playbook for one kind of job. usually it is a SKILL.md file, sometimes with a few scripts or references around it. not just a long prompt, but a stored workflow with some opinion in it. it tells the agent how to approach that kind of task without starting from zero every time.

they replace a lot of repeated prompting and make the agent much more consistent across runs. personally i use them more like slash commands. i call them directly with $code-review or $find-duplicate-ownership. but most coding agents predict when a skill fits and pull it in at the right moment.

$architecture ownership

in a larger codebase you usually have a canonical path for a specific feature. most of these codebases are split by domain, for example core/sdk, api, daemon, database, ui …

i built this skill to declare the canonical owner of a feature. once that owner is clear, codex/claude stops creating second homes for the same feature and stops placing it somewhere it was never meant to live.

in the worst case you end up with the same types and features in the frontend and the backend, which drift apart easily after the first refactor.
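the skill itself is a written playbook, not code, but the idea behind it can be sketched as a tiny registry. this is a minimal sketch under my own assumptions. the feature names, the paths and `checkOwnership` are hypothetical and not part of the actual skill.

```typescript
// Hypothetical registry: feature name -> the one directory that owns it.
const canonicalOwners: Record<string, string> = {
  auth: "packages/core/src/auth",
  "task-types": "packages/shared/src/tasks",
};

interface OwnershipCheck {
  ok: boolean;
  message: string;
}

// Answers one question: is this file allowed to be a home for this feature?
function checkOwnership(feature: string, filePath: string): OwnershipCheck {
  const owner = canonicalOwners[feature];
  if (owner === undefined) {
    return { ok: false, message: `no canonical owner declared for "${feature}"` };
  }
  if (filePath === owner || filePath.startsWith(owner + "/")) {
    return { ok: true, message: `"${feature}" stays in ${owner}` };
  }
  return { ok: false, message: `"${feature}" belongs in ${owner}, not ${filePath}` };
}
```

a feature without a declared owner fails the check too. that forces the decision before a second home can appear.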

$root-cause finder

If water is dripping from a ceiling, a hotfix is placing a bucket underneath it. the floor stays dry for now, but the real problem is still a broken pipe hidden behind the wall.

the root cause finder is useful if you want to avoid fixing only the narrow path where the bug appears. it pushes the agent to trace the first wrong layer instead of patching the edges again and again.

coding agents like to apply a quick fix that makes the immediate issue disappear, but the real cause is frequently deeper in the architecture. If that deeper layer is not addressed, the problem usually comes back in another form. That’s why architecture ownership (see above) matters in the first place.

$find-duplicate-ownership

when there’s no single source of truth.

this skill is useful when no single source of truth exists and ownership has become duplicated or redundant. duplicate ownership is where drift hides best. on the surface it looks like one feature, but under it two different places already think they are in charge.

I run it from time to time with subagents to spot overlapping ownership, find duplicate owners, and see where responsibility is unclear. That makes it easier to assign clear ownership again.
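a crude mechanical version of the same idea is to look for exported names that are declared in more than one domain. this is only a sketch under my own assumptions, it is not how the skill works internally. the `ExportedSymbol` list would come from some other tool, and `firstTwoSegments` assumes the first two path segments name the domain.

```typescript
interface ExportedSymbol {
  name: string; // exported type or function name
  file: string; // repo-relative path of the declaring file
}

// Groups exported names by the domain that declares them and returns
// every name that has more than one owning domain.
function findDuplicateOwners(
  symbols: ExportedSymbol[],
  domainOf: (file: string) => string,
): Map<string, string[]> {
  const ownersByName = new Map<string, Set<string>>();
  for (const { name, file } of symbols) {
    const owners = ownersByName.get(name) ?? new Set<string>();
    owners.add(domainOf(file));
    ownersByName.set(name, owners);
  }
  const duplicates = new Map<string, string[]>();
  ownersByName.forEach((owners, name) => {
    if (owners.size > 1) duplicates.set(name, Array.from(owners).sort());
  });
  return duplicates;
}

// Assumed convention: "packages/core/src/x.ts" -> "packages/core".
const firstTwoSegments = (file: string): string =>
  file.split("/").slice(0, 2).join("/");
```

matching names are only a hint. two domains can share a name for good reasons, so the result is a list of places to look at, not a verdict.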

$hard-cut

this is the brother of find duplicate ownership. it hard cuts existing dual ownership and tells the coding agent that you want to remove the old path instead of keeping both alive. as long as old and new paths stay in the codebase, the agent will keep patching both and drift keeps growing.

you can find the skills in instructa/agent-skills under skills/.

keep the human in the loop - manual review

in 2026, models and coding harnesses are much better. codex can sit on harder tasks for a long time, cursor is comfortable with ambitious multi-file work and test loops, and claude code is clearly moving in the same direction too, even if most runs are still short in practice.

and that is exactly why the loop gets stronger. once the first run looks good, you throw another review pass at the diff. and then another one. and another one. the agent will always find one more thing it can add, refactor, or “make better”.

manual review is still the best review. not one hour later when the diff is already spread across the repo and you barely remember why a file changed. review it early.

and no, manual review does not mean reading every line again or taking the keyboard back to rewrite the code yourself. usually it means scanning the diff for strange patterns. too much normalization, too many fallbacks, hardcoded values, weak typing, or logic growing in the wrong place. once i see that, i pull the agent back in and ask better questions. “what is the right long term architecture here?” “how can we make this more type safe?” the point is simple. someone still has to lead.
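part of that scan can be pre-filtered mechanically. here is a minimal sketch that flags added lines in a unified diff. the patterns are crude heuristics i assume for illustration, and `scanDiff` is a hypothetical helper. it does not replace the review, it only points at lines worth a second look.

```typescript
interface Finding {
  line: number; // 1-based line number inside the diff text
  pattern: string;
  text: string;
}

// Crude, assumed heuristics. False positives are expected.
const suspiciousPatterns: { pattern: string; regex: RegExp }[] = [
  { pattern: "weak typing", regex: /\bas any\b|:\s*any\b/ },
  { pattern: "fallback", regex: /\?\?|\|\|\s*(""|''|\[\]|\{\}|0\b)/ },
  { pattern: "normalization", regex: /\b(normalize|legacy|compat)\w*/i },
];

function scanDiff(diff: string): Finding[] {
  const findings: Finding[] = [];
  diff.split("\n").forEach((text, index) => {
    // Only added lines matter; skip the "+++ b/file" header.
    if (!text.startsWith("+") || text.startsWith("+++")) return;
    for (const { pattern, regex } of suspiciousPatterns) {
      if (regex.test(text)) {
        findings.push({ line: index + 1, pattern, text: text.slice(1).trim() });
      }
    }
  });
  return findings;
}
```

if one diff lights up with fallbacks and normalization hits, that is usually the moment to stop the loop and ask the architecture question.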

and trust me, there is almost nothing more tempting than using the agent like a slot machine in a casino, starting one run after another and waiting for the next payoff. the dopamine is real, and once you fall into that loop it is hard to get out again.

‘but i’m running ralph loops all day and i’m crushing it.’

well yes, maybe. but as i said in the beginning, i’m not talking about a linear codebase that one-shots an app into existence. i’m talking about a scalable, maintainable codebase that should stay in your hands, not drift into the agent’s hands.

today’s models already have most of what we need to produce very good code, and the harness around them is getting better fast. the harder part now is deciding what the loop is allowed to do, when enough is enough, reviewing early, and keeping the shape of the system intact.

that is how you avoid agentic drift in a large codebase.