This post continues the LLM-as-infrastructure series. The previous post covered the plumbing – running a local coding agent on consumer hardware. This one is about what you do before you let any agent touch a keyboard.
My wife decided to build her first AI application. About five minutes into the planning discussion – me trying to explain how using AI actually works – she had clearly had enough of me:
“I just want to get to the code.”
“You don’t start at the end. You start from the beginning: design. Then critique of the design. Then security and testing – all before the code. With AI, code is the end.”
She looked at me like I had just declared that dinner must be fully planned, peer-reviewed and threat-modeled before anyone gets dessert. Fair. But designing before building is literally my job description – and AI just made it everyone’s.
Why the instinct exists
For the entire history of software, code was the bottleneck. The expensive part: typing it, debugging it, getting it to compile at two in the morning. Every tutorial starts at hello world. Every gratifying moment in a developer’s early life is the moment something finally runs. Of course she wants to get to the code. The code is where the dopamine is.
That instinct was correct for thirty years. It is now exactly backwards.
For decades the cardinal sin was premature optimization – the root of all evil, Donald Knuth said, back when code was expensive and you were tempted to polish it too early. The cardinal sin of the agent era is premature code. Design, architecture and requirements come first; the code can wait. Waiting has never been cheaper.
What changed
A coding agent emits a thousand lines of plausible code in the time it takes to fetch coffee. Code is no longer scarce. What is scarce is knowing precisely what you want – because the model fills every gap you leave with a plausible default.
Plausible is the dangerous word in that sentence. Unstated requirements don’t fail loudly; they get silently replaced by whatever was statistically common in the training data. Don’t specify authentication, and you get whatever auth pattern the model felt like that day. Don’t specify input validation, and you often get none – wrapped in confident, beautifully formatted code that reads like it was reviewed by someone who cares.
The constraints have to exist before the code, because by the time the code exists, every missing constraint has already been decided for you.
Start with the vision
So where do you actually start? Not with a design document in the classical sense. You start with a vision: a short, plain-language statement of what the application is, who it serves, what “done” looks like, and – the part beginners always skip – what it must never do.
The vision is not ceremony. In agent-driven development it is the anchor artifact: the fixed point that every downstream decision, every generated line and every test result gets measured against. The agents will iterate hundreds of times. The vision is the thing that does not move while they do.
But the vision is only the first artifact. Here is the part that makes the whole thing work in practice.
Every phase is a markdown file
My workflow runs each design phase as its own document, in strict order, with an adversarial review gate between every step. Adversarial review used to cost a senior colleague’s afternoon; now it is nearly free, so every artifact gets attacked before the next one is allowed to exist.
- Design first, in a design tool. Sketch the UI, then capture the result as a file: the design itself plus a design language for the mock UI parts – colors, components, interaction patterns. The agents will need all of it later, so write it down now.
- Adversarial review of the design. Hand it to the model and tell it to attack: usability gaps, inconsistencies, the flows nobody thought through, the objections an experienced designer would raise.
- HLD – high-level design. Software implementation starts here, on paper: components, data flows, boundaries, the security model. Critiqued before anything below it exists.
- LLD – low-level design. Only after the HLD survives review. Contracts, schemas and interfaces per component. Then critique again.
- The build plan. A document that turns the surviving design into an ordered implementation plan for the agents.
- Implementation. Only now. Not a moment earlier.
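The strict ordering above can be sketched as a small gate check. This is a minimal sketch under my own assumptions, not a tool the workflow requires: the `Phase` record, its `written` and `approved` flags, and the `next_allowed_phase` helper are hypothetical names introduced for illustration, and how a review gets marked as passed is left open.

```python
from dataclasses import dataclass
from typing import Optional

# Phase order taken from the list above; the short names are illustrative.
PHASE_ORDER = ["DESIGN", "HLD", "LLD", "BUILDPLAN", "IMPLEMENTATION"]


@dataclass
class Phase:
    name: str
    written: bool = False   # the artifact exists
    approved: bool = False  # it survived its adversarial review


def next_allowed_phase(phases: dict[str, Phase]) -> Optional[str]:
    """Return the first phase that is not yet both written and approved.

    Nothing after that phase may start, which is the
    "review gate between every step" rule expressed as code.
    """
    for name in PHASE_ORDER:
        phase = phases.get(name, Phase(name))
        if not (phase.written and phase.approved):
            return name
    return None  # every phase has passed its gate
```

With a design that is approved and an HLD that is written but not yet reviewed, `next_allowed_phase` returns `"HLD"`: the LLD is not allowed to exist yet.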
Every one of those artifacts is a markdown file, and they all live in a git repository – a design repo, alongside or ahead of the code:
design/
├── VISION.md # what, who, "done", what it must never do
├── DESIGN.md # UI design + design language for the mock
├── HLD.md # components, data flows, boundaries, security
├── LLD.md # contracts, schemas, interfaces
├── TESTPLAN.md # acceptance criteria / specs
└── BUILDPLAN.md # ordered implementation plan
Markdown in git is not bureaucracy. It is three problems solved at once. First, the model is stateless – it remembers nothing between sessions – so these documents are its memory: every agent re-reads them on every run. Second, markdown is diffable, so every change to the vision is a reviewable commit and the git history becomes your decision log. Third, and most importantly: everything stays revisable and reversible right up until implementation begins. Changing a sentence in HLD.md costs nothing. Changing the architecture under ten thousand generated lines costs a rewrite.
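"Every agent re-reads them on every run" might look roughly like the sketch below. The file names come from the tree above; the read order, the `load_design_context` helper and the `## <file name>` separator are my assumptions for illustration, not part of any particular agent framework.

```python
from pathlib import Path

# File names follow the design/ tree above; the read order is an assumption.
DESIGN_FILES = [
    "VISION.md",
    "DESIGN.md",
    "HLD.md",
    "LLD.md",
    "TESTPLAN.md",
    "BUILDPLAN.md",
]


def load_design_context(design_dir: str) -> str:
    """Concatenate the design documents into one prompt preamble.

    Raises FileNotFoundError if any artifact is missing, so an agent
    never starts a run on a partial design.
    """
    root = Path(design_dir)
    missing = [name for name in DESIGN_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing design artifacts: {', '.join(missing)}")

    sections = []
    for name in DESIGN_FILES:
        text = (root / name).read_text(encoding="utf-8").strip()
        sections.append(f"## {name}\n\n{text}")
    return "\n\n".join(sections)
```

Failing loudly on a missing file is a deliberate choice in this sketch: a silent gap in the context is exactly the kind of hole the model would fill with a plausible default.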
The markdown files – whether they live in the design repo or next to the code – are the durable memory. But there is a complementary layer for the in-betweens: agent-side memory. Claude Code has memory plugins, and Claude Desktop can keep cloud memory turned on – both keep the working context alive while you are inside the design loop, across sessions, so you are not re-explaining yourself every morning between critique rounds. Use them, but treat them as cache, not as source of truth. The repo is canonical; the memory is convenience. When they disagree, the markdown wins.
Make the model prove it understands
Here is the step almost everyone skips, and the one that separates a working agent pipeline from an expensive random-code generator: before any development loop starts, the AI has to demonstrate that it understands the vision and the requirements.
Not “has the documents in context” – understands them. The check is the same one you’d run on a new team member or an external contractor:
- Have it restate the vision and requirements in its own words. If the restatement drifts from what you meant, the code will drift further.
- Have it list every ambiguity and open question it can find. A model that asks no questions hasn’t understood the problem; it has just started guessing earlier.
- Resolve those questions, fold the answers back into the markdown files, commit, and repeat until the restatement matches your intent.
Only when the model can play the requirements back to you correctly does the loop get to start. Everything before this gate is cheap words. Everything after it is generated code accumulating on top of whatever understanding – or misunderstanding – you locked in here.
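One round of that gate could be sketched like this. Everything here is an assumption for illustration: `ask_model` is a hypothetical stand-in for whatever model client you use (prompt in, reply out), the prompts are placeholders, and requiring zero open questions before the gate opens is my reading of "repeat until the restatement matches your intent".

```python
from typing import Callable

# Hypothetical stand-in for a model client: takes a prompt, returns the reply.
AskModel = Callable[[str], str]


def understanding_round(ask_model: AskModel, design_context: str) -> dict:
    """Run one round of the gate: a restatement plus a list of open questions."""
    restatement = ask_model(
        "Restate the vision and requirements below in your own words.\n\n"
        + design_context
    )
    raw_questions = ask_model(
        "List every ambiguity or open question in the documents below, "
        "one per line. Reply NONE if there are none.\n\n" + design_context
    )
    questions = [
        line.strip("-* \t")  # drop list markers and surrounding whitespace
        for line in raw_questions.splitlines()
        if line.strip() and line.strip().upper() != "NONE"
    ]
    return {"restatement": restatement, "questions": questions}


def gate_is_open(round_result: dict, human_approved_restatement: bool) -> bool:
    """The loop may start only when a human accepts the restatement
    and no open questions remain."""
    return human_approved_restatement and not round_result["questions"]
```

The human stays in the check on purpose: the code can collect the restatement, but only you can say whether it matches what you meant. An empty question list on the very first round is a warning sign, not a pass.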
The loop: code agents and test agents against the vision
Now – finally – the code. But you still don’t write it, and you don’t review it line by line either. The development loop looks like this:
- Tests first, derived from the vision. The acceptance criteria in TESTPLAN.md become executable tests – written for agents to run, not for humans to admire. The security constraints live here too: “all input is validated” is a test, not a comment.
- The test designer agent audits coverage. A separate agent whose only job is to verify that the tests actually cover the relevant parts: every requirement in the vision, every contract in the LLD, every must-never. Green tests that cover nothing are the amateur mistake of agent development – the suite passes, proves nothing, and the loop happily converges on garbage.
- The coding agent executes. It implements against the build plan, with the vision and constraints sitting in its context the whole time.
- The test agent verifies. A separate agent runs the tests, reports failures, and – critically – checks the result against the vision, not just against the test suite. Passing tests while drifting from intent is the failure mode to watch for.
- Loop. Code agent and test agent iterate against each other, with the vision as the referee. Your job is arbitration: when the agents disagree, or when both agree on something the vision forbids, you intervene – by amending the markdown and committing, not by hand-editing the code.
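The mechanical half of the coverage audit can be sketched as a set difference. The requirement IDs (`REQ-1`, `NEVER-1`) and the idea that each test declares which IDs it covers are hypothetical conventions I am assuming for illustration. The sketch only checks claimed coverage; whether a test really exercises the requirement is still the auditing agent's job.

```python
def uncovered_requirements(
    requirements: set[str], test_map: dict[str, set[str]]
) -> set[str]:
    """Return requirement IDs that no test claims to cover."""
    covered: set[str] = set()
    for claimed in test_map.values():
        covered |= claimed
    return requirements - covered


def unknown_references(
    requirements: set[str], test_map: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Return tests that cite IDs which do not exist in the design docs."""
    return {
        test: claimed - requirements
        for test, claimed in test_map.items()
        if claimed - requirements
    }
```

A non-empty result from either function is a reason to stop before the coding agent runs: the first means a requirement has no test, the second means a test is checking something the design never asked for.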
By the time generation starts, the code is almost a formality. The vision, the constraints and the tests have already decided what it looks like – which is the whole point.
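The loop itself, reduced to its control flow, might look like the sketch below. Both agents are hypothetical callables standing in for real agent invocations, and the `max_rounds` cap is my own assumption: the point is that a loop which does not converge gets handed back to you for arbitration instead of spinning.

```python
from typing import Callable, List

# Hypothetical agent interfaces, assumed for illustration:
# the code agent takes the previous round's findings and returns code;
# the test agent takes code and returns findings (test failures or
# drift from the vision). An empty list means "nothing to report".
CodeAgent = Callable[[List[str]], str]
TestAgent = Callable[[str], List[str]]


def run_loop(code_agent: CodeAgent, test_agent: TestAgent, max_rounds: int = 5) -> dict:
    """Iterate code agent against test agent until clean or out of rounds."""
    findings: List[str] = []
    for round_no in range(1, max_rounds + 1):
        code = code_agent(findings)
        findings = test_agent(code)
        if not findings:
            return {"status": "converged", "rounds": round_no, "code": code}
    # No convergence: a human amends the markdown, then the loop restarts.
    return {
        "status": "needs_arbitration",
        "rounds": max_rounds,
        "open_issues": findings,
    }
```

Note what the sketch does not contain: a step where anyone edits the generated code by hand. The only inputs are the agents and, through them, the documents.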
Scope check: this is the fair start
A fair objection at this point – especially from the agile and DevOps crowd – is that nothing above ever touches a user. Where is the feedback loop? Where is production? The answer: this pipeline is not the whole lifecycle. It is the part that takes you from nothing to the first working mockup – a fair start, a coherent first version that holds together well enough to be put in front of reality. Once it ships, the outer loop takes over: users, telemetry, operations – and what they teach you flows back into VISION.md as commits. That half of the story is a later post in this series.
So why this much discipline just to reach a mockup? Because going too fast is exactly how the code and the UI break apart. An AI implementing without proper design parameters is not engineering – it is a bigger slot machine with more tokens. You can pull the lever as many times as you like and admire how fast the reels spin; the output is still random with respect to what you actually wanted. And the pulls are not free. As I wrote in “AI Capacity Is Getting Full”, compute is getting scarce and tokens are getting costly – and every parameter you leave undefined adds another reel to the machine, multiplying the combinations the agents have to roll through. You pay for every spin.
Every constraint you write removes a reel from the slot machine. The fewer reels left rolling, the fewer tokens you burn before the code pays out.
Hone the parameters before the slot machine starts rolling. That is not bureaucracy – it is cost control.
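The slot machine arithmetic is plain multiplication. The numbers below are invented purely for illustration (nobody has measured how many plausible defaults a model picks from); what matters is how the product behaves when you pin decisions down.

```python
from math import prod

# Invented numbers: how many plausible defaults a model might choose
# from for each decision the design leaves open.
open_decisions = {
    "auth scheme": 4,
    "input validation": 3,
    "storage layer": 5,
    "error handling": 3,
}


def combinations(decisions: dict[str, int]) -> int:
    """Number of distinct outcomes if every open decision is left to chance."""
    return prod(decisions.values())


def after_specifying(decisions: dict[str, int], specified: set[str]) -> int:
    """Combinations left once the named decisions are written into the design."""
    remaining = {k: v for k, v in decisions.items() if k not in specified}
    return combinations(remaining)
```

With these made-up figures, `combinations(open_decisions)` is 180, and `after_specifying(open_decisions, {"auth scheme", "storage layer"})` is 9. Two sentences in the HLD removed a factor of twenty.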
This is architect work
I recognize this pipeline because it is my day job. As an architect, I don’t get judged on the lines I type; I get judged on whether the vision was right, whether the constraints were complete, and whether the people implementing it understood both before they started. Design, critique, security model, acceptance criteria – then implementation, by someone else, verified against the spec. That has always been the work.
What AI changes is who has to work this way. When the implementer is an agent that never pushes back and never asks for clarification unless forced, everyone building software inherits the architect’s job description. The typing has been automated. The judgment hasn’t.
It’s the same provisioning discipline this series keeps coming back to. You don’t rack a server and start installing packages to see what happens. You decide the role, the network position, the threat model and the backup policy – then you provision.
The joke at the end of history
We spent two decades escaping Big Design Up Front. Agile won. Waterfall became a slur. And now the machines have dragged the design phase back from the dead – with one difference that changes everything: the iteration loop is minutes, not months. Vision, design, critique, HLD, LLD, build plan, code, test, repeat. The entire waterfall fits inside an afternoon, and you can run it as many times as you like.
It isn’t waterfall. It’s waterfall with the time axis crushed flat.
And note that only half of it rose from the grave – the front half, the design half, and only until the first mockup ships. From there on, the learning still belongs to agile. The resurrection is partial. So far.
My wife has not gotten to her code yet. She is still working on her design – but now with a much better clue of how to challenge it: which questions to fire at the model, where the gaps hide, and why every reel she removes now is tokens she will not burn later. The code will be the easy part.
It always is, now.
That’s my version of the workflow – the way I work as an architect, transplanted onto agents. Yours might differ, and I’d genuinely like to see it: submit your workflows in the comments.
