Code Is No Longer the Bottleneck

Code is no longer the bottleneck — understanding is.

This post continues the LLM-as-infrastructure series. The previous post covered the plumbing – running a local coding agent on consumer hardware. Before that, AI Capacity Is Getting Full covered why every spin of the AI roulette has a price – which is about to become very relevant to design and architecture. This one is about what you do before you let any agent touch a keyboard.

My wife decided to build her first AI application. About five minutes into the planning discussion – me trying to explain how using AI actually works – she had clearly had enough of me:

Wife: “I just want to get to the code.”

Me: “You don’t start at the end. You start from the beginning: design. Then critique of the design. Then security and testing – all before the code. With AI, code is the end.”

She looked at me like I had just declared that dinner must be fully planned, peer-reviewed and threat-modeled before anyone gets to eat. Fair. But designing before building is part of my job description – and AI just made it part of everyone’s.

Why the instinct exists

For the entire history of software, code was the bottleneck. The expensive part: typing it, debugging it, getting it to compile at two in the morning. Every tutorial starts at hello world. Every gratifying moment in a developer’s early life is the moment something finally runs. Of course she wants to get to the code. The code is where the dopamine is.

That instinct was correct for thirty years. It is now exactly backwards.

What changed

A coding agent emits a thousand lines of plausible code in the time it takes to fetch coffee. Code is no longer scarce. What is scarce is knowing precisely what you want – because the model fills every gap you leave with a plausible default.

Plausible is the dangerous word in that sentence. Unstated requirements don’t fail loudly; they get silently replaced by whatever was statistically common in the training data. Don’t specify authentication, and you get whatever auth pattern the model felt like that day. Don’t specify input validation, and you often get none – wrapped in confident, beautifully formatted code that reads like it was reviewed by someone who cares.

The constraints have to exist before the code, because by the time the code exists, every missing constraint is a dice roll.

For decades the cardinal sin was premature optimization – the root of all evil, Donald Knuth said, back when code was expensive and you were tempted to polish it too early. The cardinal sin of the agent era is premature code. Design, architecture and requirements come first; the code can wait. Waiting has never been cheaper.

One clarification before the discipline scares anyone off: this is not “define everything up front, then build”. You are not writing a 200-page spec and freezing it. Every document here gets revised dozens of times, and the loop reruns the whole sequence – vision, design, critique, code, test – in minutes. What comes first is not everything; it is the constraints – and even those stay liquid right up until implementation. The order is fixed. The contents are not.

Start with the vision

So where do you actually start? Not with a design document in the classical sense. You start with a vision: a short, plain-language statement of what the application is, who it serves, what “done” looks like, and – the part beginners always skip – what it must never do.

The vision is not ceremony. In agent-driven development it is the anchor artifact: the fixed point that every downstream decision, every generated line and every test result gets measured against. The agents will iterate hundreds of times. The vision is the thing that does not move while they do.

Not move while they iterate, that is. Between iterations the vision is as revisable as everything else in the repo – but only by commit, never by an agent’s improvisation mid-loop. Frozen inside the loop, versioned between loops. (And yes – an adversarial review of this very post would have caught that the previous paragraph said “does not move” without the qualifier. The gate works on prose too 😉)
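"Frozen inside the loop" can be enforced mechanically rather than by good intentions. What follows is a minimal sketch, not my actual tooling: it fingerprints the vision file when a loop starts and reports whether anything edited it mid-loop. The class name `FrozenVision` and the helper `fingerprint` are hypothetical names chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class FrozenVision:
    """Hypothetical guard: remembers the vision's hash at loop start."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.expected = fingerprint(path)

    def unchanged(self) -> bool:
        """True while the file still matches the hash taken at loop start."""
        return fingerprint(self.path) == self.expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vision = Path(tmp) / "VISION.md"
        vision.write_text("Must never store plaintext passwords.\n", encoding="utf-8")
        guard = FrozenVision(vision)
        print(guard.unchanged())  # True
        # Simulate an agent "improving" the vision mid-loop.
        vision.write_text("Passwords may be stored as needed.\n", encoding="utf-8")
        print(guard.unchanged())  # False
```

Between loops you change the file by commit and create a fresh guard; inside a loop, a `False` from `unchanged()` is a reason to stop and look.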

But the vision is only the first artifact. Here is the part that makes the whole thing work in practice.

Every phase is a markdown file

My workflow runs each design phase as its own document, in strict order, with an adversarial review gate between every step. Adversarial review used to cost a senior colleague’s afternoon; now it is nearly free, so every artifact gets attacked before the next one is allowed to exist.

1. Design first, in a design tool. Sketch the UI, then capture the result as a file: the design itself plus a design language for the mock UI parts – colors, components, interaction patterns. The agents will need all of it later, so write it down now.
2. Adversarial research of the design. Hand it to the model and tell it to attack: usability gaps, inconsistencies, the flows nobody thought through, the objections an experienced designer would raise.
3. HLD – high-level design. Software implementation starts here, on paper: components, data flows, boundaries, the security model. Critiqued before anything below it exists.
4. LLD – low-level design. Only after the HLD survives review. Contracts, schemas and interfaces per component. Then critique again.
5. The test plan. TESTPLAN.md: acceptance criteria and specs – what “done” has to prove, the security must-nevers included. Written here, as design, so the build plan has verifiable targets to aim at and the loop has something real to test against. Critiqued like everything else.
6. The build plan. A document that turns the surviving design into an ordered implementation plan for the agents.
7. Implementation. Only now – and only once the model has proven it understands the spec (the sections below). Not a moment earlier.

Every one of those artifacts is a markdown file, and they all live in a git repository – a design repo, alongside or ahead of the code:

design/
├── VISION.md           # what, who, "done", what it must never do
├── DESIGN.md           # UI design + design language for the mock
├── HLD.md              # components, data flows, boundaries, security
├── LLD.md              # contracts, schemas, interfaces
├── TESTPLAN.md         # acceptance criteria / specs
└── BUILDPLAN.md        # ordered implementation plan

Markdown in git is not bureaucracy. It is three problems solved at once. First, the model is stateless – it remembers nothing between sessions – so these documents are its memory. Splitting them by phase is the point: each agent re-reads only the documents its task needs, not the whole design on every run, which keeps the context lean and the token bill down. Second, markdown is diffable, so every change to the vision is a reviewable commit and the git history becomes your decision log. Third, and most importantly: everything stays revisable and reversible right up until implementation begins. Changing a sentence in HLD.md costs (next to) nothing. Changing the architecture after ten thousand lines of generated code costs a rewrite.
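To make "each agent re-reads only the documents its task needs" concrete, here is a small sketch of a per-role routing table. The role names, the exact document lists and the function `build_context` are my assumptions for illustration, not a fixed part of the workflow; adjust the table to whatever your agents actually need.

```python
from pathlib import Path

# Hypothetical routing table: which design documents each agent role loads.
DOCS_BY_ROLE = {
    "test_designer": ["VISION.md", "HLD.md", "LLD.md", "TESTPLAN.md"],
    "coder": ["VISION.md", "HLD.md", "LLD.md", "BUILDPLAN.md"],
    "test_agent": ["VISION.md", "TESTPLAN.md"],
}


def build_context(design_dir: Path, role: str) -> str:
    """Concatenate only the documents this role needs, each under its file name."""
    parts = []
    for name in DOCS_BY_ROLE[role]:
        text = (design_dir / name).read_text(encoding="utf-8")
        parts.append(f"## {name}\n{text}")
    return "\n\n".join(parts)
```

Calling `build_context(Path("design"), "coder")` would hand the coding agent four files instead of six. The saving per run is small; it adds up because the loop repeats.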

The markdown files – whether they live in the design repo or next to the code – are the durable memory. Agent-side memory is a useful layer on top: Claude Code has memory plugins, and Claude Desktop can keep cloud memory turned on, both holding the working context across sessions so you are not re-explaining yourself every morning between critique rounds. But treat that memory as cache, never as the source of truth. The repo is canonical; when the two disagree, the markdown wins.

Make the model prove it understands

Here is the step almost everyone skips, and the one that separates a working agent pipeline from an expensive random-code generator: before any development loop starts, the AI has to demonstrate that it understands the vision and the requirements.

Not “has the documents in context” – understands them. The check is the same one you’d run on a new team member or an external contractor:

Have it restate the vision and requirements in its own words. If the restatement drifts from what you meant, the code will drift further.
Have it list every ambiguity and open question it can find. A model that asks no questions hasn’t understood the problem; it has just started guessing earlier.
Resolve those questions, fold the answers back into the markdown files, commit, and repeat until the restatement matches your intent.

Only when the model can play the requirements back to you correctly does the loop get to start. Everything before this gate is cheap words. Everything after it is generated code accumulating on top of whatever understanding – or misunderstanding – you locked in here.
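The gate above can be written down as a rule, which helps when you are tempted to wave it through. This is a sketch under my own assumptions: the `Playback` record and `loop_may_start` are hypothetical names, and "matches intent" is a human judgment recorded as a flag, not something the model decides about itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Playback:
    """One round of the model playing the requirements back (hypothetical record)."""
    matches_intent: bool                      # judged by the human reviewer
    questions: list[str] = field(default_factory=list)
    answered: set[str] = field(default_factory=set)

    def unresolved(self) -> list[str]:
        """Questions raised in this round that nobody has answered yet."""
        return [q for q in self.questions if q not in self.answered]


def loop_may_start(rounds: list[Playback]) -> bool:
    """Open the gate only if the latest restatement matches intent, the model
    asked at least one question along the way, and nothing is left open."""
    if not rounds:
        return False
    asked_anything = any(r.questions for r in rounds)
    nothing_open = all(not r.unresolved() for r in rounds)
    return rounds[-1].matches_intent and asked_anything and nothing_open
```

A single round with a perfect restatement and zero questions returns `False` here, which is deliberate: it encodes the suspicion that a model with no questions has started guessing.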

One caveat, because this gate is softer than it looks: a model can restate your vision fluently and still hold the wrong model of the problem. Treat the restatement as triage, not proof. The actual proof comes one step later, when the requirements get compiled into executable tests – understanding that survives compilation into assertions is verified; understanding that survives only paraphrase is rhetoric.
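Here is what "compiled into assertions" might look like for a single must-never. It is a sketch only: `parse_quantity` is a hypothetical component, and the rule (whole numbers from 1 to 100) is an invented requirement standing in for whatever your VISION.md actually forbids.

```python
import unittest


def parse_quantity(raw: str) -> int:
    """Hypothetical component: accepts only ASCII whole numbers from 1 to 100."""
    if not (raw.isascii() and raw.isdecimal()):
        raise ValueError(f"not a whole number: {raw!r}")
    value = int(raw)
    if not 1 <= value <= 100:
        raise ValueError(f"out of range: {value}")
    return value


class MustNeverAcceptUnvalidatedInput(unittest.TestCase):
    """The must-never 'all input is validated', written as executable checks."""

    def test_rejects_malformed_or_out_of_range_input(self):
        for raw in ["", "-1", "1e3", "0", "101", "1; DROP TABLE orders"]:
            with self.subTest(raw=raw):
                with self.assertRaises(ValueError):
                    parse_quantity(raw)

    def test_accepts_valid_input(self):
        self.assertEqual(parse_quantity("42"), 42)


if __name__ == "__main__":
    unittest.main()
```

The paraphrase "input gets validated" can hide a wrong mental model. A list of inputs that must raise cannot.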

The loop: code agents and test agents against the vision

Now – finally – the code. But you still don’t write it, and you don’t review it line by line either. The development loop looks like this:

1. Tests first, derived from VISION.md and TESTPLAN.md. The acceptance criteria in TESTPLAN.md become executable tests – written for agents to run, not for humans to admire. The security must-nevers from VISION.md live here too: “all input is validated” is a test, not a comment.
2. The test designer agent audits coverage. A separate agent whose only job is to verify the tests actually cover the relevant parts: every requirement in VISION.md, every boundary and security rule in HLD.md, every contract in LLD.md, every must-never. Green tests that cover nothing are the amateur mistake of agent development – the suite passes, proves nothing, and the loop happily converges on garbage.
3. The coding agent executes. It implements against BUILDPLAN.md, with VISION.md, HLD.md and LLD.md sitting in its context the whole time.
4. The test agent verifies. A separate agent runs the tests, reports failures, and – critically – checks the result against VISION.md, not just against the test suite. Passing tests while drifting from intent is the failure mode to watch for.
5. Independent review – from outside the model. Every agent above is an LLM, and an LLM grading an LLM shares its blind spots. Before a block counts as done, someone who did not write it signs off: a human, or at minimum a separately owned, separately prompted review pass with its own context. This is the check for the failure the others structurally can’t catch – the block that passes every test and still solved the wrong problem. It costs real time and money, and that is the point: it is the one line item the agent era makes more important, not less, because the cheaper code is to generate, the more of it arrives unread.
6. Document and update the markdown. Once the block is green, fold the outcome back into the design repo: check the finished items off in BUILDPLAN.md, record the test results, and note any decisions or deviations the build surfaced. The markdown stays the live record of what exists versus what is still only designed.
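The coverage audit has a deterministic core that does not need an LLM at all. The sketch below assumes a convention the post does not prescribe: requirements carry IDs such as `REQ-1` or `NEVER-1`, and each test declares which IDs it covers. The function name `audit_coverage` is hypothetical.

```python
from __future__ import annotations


def audit_coverage(
    requirements: set[str], tests: dict[str, set[str]]
) -> tuple[set[str], list[str]]:
    """Return (requirement IDs no test covers, tests that cover no known requirement)."""
    covered = set().union(*tests.values())
    uncovered = requirements - covered
    idle_tests = sorted(name for name, ids in tests.items() if not ids & requirements)
    return uncovered, idle_tests


if __name__ == "__main__":
    requirements = {"REQ-1", "REQ-2", "NEVER-1"}
    tests = {
        "test_login": {"REQ-1"},
        "test_smoke": set(),  # green, but proves nothing
    }
    uncovered, idle = audit_coverage(requirements, tests)
    print(sorted(uncovered))  # ['NEVER-1', 'REQ-2']
    print(idle)               # ['test_smoke']
```

An empty first result only says every ID is claimed by some test. Whether the test really exercises the requirement is still the test designer agent's, and ultimately the reviewer's, judgment.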

You run this per feature or block, not across the whole app in one pass. Within a block, steps 3 and 4 iterate – coding agent and test agent against each other, VISION.md as the referee – until the tests pass and the result still matches intent; step 5 is the independent sign-off, and step 6 records it before you move to the next block. Your job is arbitration: when the agents disagree, or when both agree on something VISION.md forbids, you intervene by amending the markdown and committing, not by hand-editing the code.
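The control flow of one block can be sketched in a few lines. The callables stand in for the agents and the reviewer; `run_block`, its parameter names and the `max_rounds` cap are assumptions of mine, since the post does not say how many rounds to allow before a human steps in.

```python
from typing import Callable


def run_block(
    implement: Callable[[], None],
    tests_pass: Callable[[], bool],
    matches_vision: Callable[[], bool],
    independent_signoff: Callable[[], bool],
    record: Callable[[], None],
    max_rounds: int = 5,
) -> str:
    """Run steps 3 to 6 for one block; return 'done' or 'escalate'."""
    for _ in range(max_rounds):
        implement()                            # step 3: coding agent
        if tests_pass() and matches_vision():  # step 4: test agent, both checks
            break
    else:
        # Rounds exhausted: the human amends the markdown, not the code.
        return "escalate"
    if not independent_signoff():              # step 5: reviewer outside the loop
        return "escalate"
    record()                                   # step 6: update the design repo
    return "done"
```

Note that "escalate" never means "edit the code by hand". It means the markdown was not precise enough, so the fix goes there.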

Be honest about what the intent check in step 4 is: another LLM making a judgment call, with the same plausibility failure modes as the coding agent it polices. It works exactly as well as the must-nevers in VISION.md are concretely worded – vague intent produces vague verification. The quality of step 4 was decided back when you wrote the vision. That is one more reason the vision is not ceremony.

Once the loop is running, the code stops being the hard part. The vision, the constraints and the tests have already decided most of what it looks like – the loop exists to grind out the rest. The wheel still spins; you have just removed enough reels that it can only land on variants of what you asked for.

One layer the agents do not replace: the deterministic floor. Linters, type checkers, security scanners, real CI – boring, rule-based tools that don’t hallucinate and don’t get persuaded. The test agent is a probabilistic reviewer sitting on top of them, never instead of them. Same principle as the rest of this series: the monitoring stack doesn’t go away because you hired a clever operator. When an agent and the linter disagree, the linter wins.
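One way to read "the linter wins" as a rule: an agent's approval can never override a failed deterministic check. The sketch below assumes the tool results have already been collected; `CheckResult` and `block_is_acceptable` are hypothetical names, and no real linter's command line is modelled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one deterministic tool run (linter, type checker, scanner)."""
    name: str
    passed: bool


def block_is_acceptable(deterministic: list[CheckResult], agent_approves: bool) -> bool:
    """The agent's opinion counts only once every deterministic check has passed."""
    floor_holds = all(check.passed for check in deterministic)
    return floor_holds and agent_approves


if __name__ == "__main__":
    checks = [CheckResult("linter", False), CheckResult("type checker", True)]
    print(block_is_acceptable(checks, agent_approves=True))  # False
```

The agent can still reject a block the tools accept, because the tools have no opinion about intent. What it cannot do is talk its way past them.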

Scope check: this is the fair start

A fair objection at this point – especially from the agile and DevOps crowd – is that nothing above ever touches a user. Where is the feedback loop? Where is production? The answer: this pipeline is not the whole lifecycle. It is the part that takes you from nothing to the first working mockup – a fair start, a coherent first version that holds together well enough to be put in front of reality. Once it ships, the outer loop takes over: users, telemetry, operations – and what they teach you flows back into VISION.md as commits. That half of the story might be worthy of a follow-up post – if I ever get to it.

So why this much discipline just to reach a mockup? Because going too fast is exactly how the code and the UI break apart. An AI implementing without proper design parameters is not engineering – it is a bigger slot machine with more tokens. You can pull the lever as many times as you like and admire how fast the reels spin; the output is still random with respect to what you actually wanted. And the pulls are not free. As I wrote in AI Capacity Is Getting Full, compute is getting scarce and tokens are getting costly – and every parameter you leave undefined adds another row of symbols to the machine, multiplying the combinations the agents have to roll through. You pay for every spin.

[Figure: a glowing green blueprint schematic wired into a dark slot machine, a hand inserting an AI token coin. Caption: Every undefined parameter is another reel. You pay for every spin.]

Every constraint you write removes a reel from the slot machine. The fewer reels left rolling, the fewer tokens you burn before the code pays out.

Hone the parameters before the slot machine starts rolling. That is not bureaucracy – it is cost control.
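The reel metaphor is plain multiplication, and a few lines of arithmetic show why one pinned decision matters. The parameter names and option counts below are made up for illustration; they are not measurements of any model.

```python
import math


def outcome_space(open_choices: dict[str, int]) -> int:
    """Number of distinct combinations the model can land on:
    the product of plausible options for every undefined parameter."""
    return math.prod(open_choices.values())


if __name__ == "__main__":
    # Illustrative counts only: how many plausible defaults exist per open decision.
    undefined = {"auth": 4, "validation": 3, "storage": 3, "error handling": 2}
    print(outcome_space(undefined))  # 72

    # Write the auth decision into HLD.md and that reel is gone.
    del undefined["auth"]
    print(outcome_space(undefined))  # 18
```

One sentence in a design document took the space from 72 combinations to 18. The real numbers are unknowable, but the shape of the curve is not.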

And the bill doesn’t stop when the reels do. The spin is only the first invoice; the second one recurs. Every thousand lines the machine pays out is a thousand lines someone has to run, monitor, patch and eventually understand – and AI doesn’t shrink that complexity, it relocates it, from the typing you no longer do to the operating you still will. Code no human wrote by hand is still code a human has to own. The same undefined parameters that pad your token bill pad this one too: looser constraints, more code, more surface, more to keep alive. Honing the vision is cost control on both axes – the spin you pay for now and the upkeep you pay every month after. The deep version of that second axis – infrastructure, operations, the tech debt of code no author ever read – sits outside the scope of this post; the point here is only that the discipline that cheapens generation is the same one that keeps ownership from ballooning.

AI writes the code; it doesn’t decide its shape – you do, in HLD.md and LLD.md. Bounded components behind clean interfaces are cheap on both axes: cheap to generate (one piece, one contract, not the whole system at once) and cheap to replace (delete a module, regenerate it against the same contract, nothing else notices). The monolith loses both ways. The skill that saves you in the agent era turns out to be the oldest one in the book: modularity, good class design – the seams you draw on purpose, before a line exists.
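"Regenerate it against the same contract, nothing else notices" looks roughly like this in code. Everything here is a hypothetical example: the `RateLimiter` contract, both implementations and `handle_request` are invented to show the seam, not taken from a real system.

```python
from __future__ import annotations

from typing import Protocol


class RateLimiter(Protocol):
    """The contract from LLD.md: the only thing callers are allowed to know."""

    def allow(self, client_id: str) -> bool: ...


class FixedBudgetLimiter:
    """First generated implementation: each client gets a fixed number of calls."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.used: dict[str, int] = {}

    def allow(self, client_id: str) -> bool:
        count = self.used.get(client_id, 0)
        if count >= self.budget:
            return False
        self.used[client_id] = count + 1
        return True


class AllowAllLimiter:
    """A replacement module: different internals, same contract."""

    def allow(self, client_id: str) -> bool:
        return True


def handle_request(limiter: RateLimiter, client_id: str) -> str:
    """Caller code depends on the contract only, so swapping modules is invisible here."""
    return "ok" if limiter.allow(client_id) else "rejected"
```

Deleting `FixedBudgetLimiter` and regenerating it costs one module's worth of tokens, because `handle_request` never learned what was behind the interface.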

A scoping rule before anyone copies the six-file layout into a bugfix branch: the pipeline scales with blast radius, not with enthusiasm. A full application gets all six files. A single feature might need only VISION.md and TESTPLAN.md. A two-line fix needs neither – the existing design repo is its context. Over-specifying has its own token bill: every page of markdown rides along in the agent’s context on every iteration. Remove reels, yes – but don’t bolt extra reels on first just to remove them ceremonially.

Where this fits – and where it doesn’t

The pipeline scales with blast radius, not enthusiasm – and blast radius isn’t only size, it’s kind. The same six files do not belong everywhere:

New product / enterprise system. Home turf. High blast radius, long life, multiple owners – the full pipeline earns every file.
Startup / MVP. Use the spine, skip the ceremony. VISION.md and TESTPLAN.md, maybe HLD.md. Speed is the product before product-market fit; a six-file gate this early is process for its own sake.
Legacy maintenance. The design repo already exists – in the code, the tests, the scars. Reverse-engineering a full VISION.md before touching anything is rarely realistic. Here the discipline is reading the constraints that exist, not authoring new ones.
Incident fix. Vision-first is actively wrong. Production is down; the loop is stop-the-bleeding first, reconstruct intent afterward in the postmortem. The only artifacts that matter in the moment are the fix and the test that proves it.

The rule underneath all four: the heavier the future you are committing to, the more of the pipeline you run. A throwaway gets none of it. A system three teams will maintain for five years gets all of it.

This is architect work

I recognize this pipeline because it is my day job. The architect’s real product isn’t code, and it isn’t a stack of documents either – it’s the decisions the developer or AI has to obey: the vision, the boundaries, the security model, the definition of done. Get those right and complete, make sure whoever builds it actually understands them, and the implementation mostly follows. Get them wrong and no amount of clean code saves you. That has always been the work – the typing is just the last mile.

What AI changes is who has to work this way. When the implementer is an agent that never pushes back and never asks for clarification unless forced, everyone building software inherits the architect’s job description. The typing has been automated. The judgment hasn’t.

It’s the same provisioning discipline this series keeps coming back to. You don’t rack a server and start installing packages to see what happens. You decide the role, the network position, the threat model and the backup policy – then you provision.

Who approves, and who owns the risk

A pipeline full of agents still needs a human name attached to three decisions, or it quietly becomes nobody’s fault:

Who approves the design. Someone signs off that VISION.md, HLD.md and the must-nevers are right before the loop spends a token. That is the architect’s call – accountable, not advisory.
Who reviews the output. Not “did the tests go green” – an LLM wrote and graded those. A named human owns the judgment that the result is fit to ship, with the independent review above feeding into it.
Who owns the risk. When the shipped thing breaks at 3am, the agent does not get paged. Decide before you build whose decision it was. Agents are accountable to no one; the moment you forget that, so is the failure.

The joke at the end of history

We spent two decades escaping Big Design Up Front. Agile won. Waterfall became a slur. And now the machines have dragged the design phase back from the dead – with one difference that changes everything: the iteration loop is minutes, not months. Vision, design, critique, HLD, LLD, test plan, build plan, code, test, repeat. The entire waterfall fits inside an afternoon, and you can run it as many times as you like.

It isn’t waterfall. It’s waterfall with the time axis crushed flat.

And note that only half of it rose from the grave – the front half, the design half, and only until the first mockup ships. From there on, the learning still belongs to agile. The resurrection is partial. So far.

My wife has not gotten to her code yet. She is still working on her design – but now with a much better clue of how to challenge it: which questions to fire at the model, where the gaps hide, and why every reel she removes now is tokens she will not burn later. The code will be just the last mile.

In the AI age, it always is.

One honest caveat to close on: everything above is architecture and theory – the workflow as designed, the clean version on the whiteboard. Reality is messier, and bending it to fit an actual codebase, team and deadline is a different job entirely. That one I do for a fee. 😉

This is my version of the workflow – the way I work. Yours might differ, and I’d genuinely like to see how it differs: submit your workflows or feedback in the comments.