The Agentic AI Playbook, Part 4 of 7: Spec First, Always

March 19, 2026

The fastest way to waste time with agentic AI is to tell it what to build without telling it what you mean.

“Build me a customer management screen.” The AI will produce something. It might look impressive. But it will be built on assumptions: the AI’s assumptions about your data model, your business rules, your UI patterns, your error handling strategy. You will spend more time correcting those assumptions than you saved by not writing the code yourself.

The fix is not a better prompt. The fix is a spec.

A spec is not a 40-page document. It is a clear, unambiguous description of what you are building — specific enough that a capable engineer could implement it without asking clarifying questions. One page of markdown, if the scope warrants it. But it must exist before implementation begins. Without it, you are not directing an engineer. You are handing one a vague goal and hoping they share your assumptions.

They do not. And neither does the AI.


The seven-step lifecycle

Most teams treat AI-assisted development as: write a prompt, review the output. That is not a workflow. That is improvisation with extra steps.

The lifecycle that consistently produces good results has seven steps, in order. They apply to any meaningful feature — greenfield or brownfield, backend or frontend, one service or several.

Requirements come first. Not “users should be able to manage their orders” — that is a direction, not a requirement. A requirement is specific enough to be tested. Something like: a logged-in customer can view a paginated list of their past orders, filtered by date range and status. The list endpoint returns 20 results per page by default. Each result includes the order ID, date, status, total amount, and item count. An empty state renders when no orders match the active filters.

Notice what that includes: the endpoint, the default, the fields, the edge case. Those are the decisions you need to make before the AI makes them for you.
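One way to pin those decisions down before any prompt is written is to capture them as a response contract. A minimal TypeScript sketch, where every name (`OrderSummary`, `OrderHistoryPage`, `emptyPage`) is illustrative rather than taken from any real codebase:

```typescript
// Illustrative contract for the order-history requirement above.
// All names and the status values are hypothetical; the point is that
// page size, fields, and the empty state are decided here, by you,
// rather than by the AI at build time.
interface OrderSummary {
  orderId: string;
  date: string; // ISO 8601 date of the order
  status: "pending" | "fulfilled" | "cancelled";
  totalAmount: number;
  itemCount: number;
}

interface OrderHistoryPage {
  results: OrderSummary[];
  page: number;        // 1-indexed
  pageSize: number;    // 20 by default, per the requirement
  totalResults: number;
}

const DEFAULT_PAGE_SIZE = 20;

// The empty state is part of the contract: no matching orders is a
// valid response, not an error.
function emptyPage(page: number = 1): OrderHistoryPage {
  return { results: [], page, pageSize: DEFAULT_PAGE_SIZE, totalResults: 0 };
}
```

Writing the contract down first means the later implementation prompts can reference it, instead of leaving the shape of the response to the model's defaults.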

Once requirements are defined, brainstorm with the AI before designing anything. With requirements in hand, use it as a thinking partner, not yet an implementer. Ask it to surface edge cases. What happens when a customer has 50,000 orders? What happens to an order mid-fulfilment? What existing patterns in the codebase should this follow? This is structured thinking before the design locks in. The AI is good at it. Let it ask you questions back.

From there, produce the spec. For a backend feature, this typically means the API contract, the data model, and the key business rules. For a frontend feature, it means the component structure and the interactions. Write it down. Make it concrete. Store it in the repo. The spec is what the AI will build to. Vagueness here becomes defects later.
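For the order history example, a one-page spec might be skeletoned like this. Everything below is a sketch built from the requirement stated earlier, not a mandated template; endpoint paths and field names are illustrative:

```markdown
# Spec: Order history

## API contract
- `GET /api/orders?page=1&status=fulfilled&from=2026-01-01&to=2026-03-01`
- 20 results per page by default; `page` is 1-indexed.
- Each result: order ID, date, status, total amount, item count.

## Data model
- Reuses the existing Order entity; no schema change expected.

## Business rules
- Only the logged-in customer's own orders are ever returned.
- An empty result set is a valid response, not an error; the UI
  renders the empty state when no orders match the active filters.
```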

Reviewing the spec is the most important step and the one most often skipped. Read it carefully before asking the AI to implement anything. Look for missing error cases. Look for field definitions that could be interpreted two ways. Look for patterns that conflict with your existing codebase. The devil’s advocate technique, covered in the next section, belongs here. The principle is simple: challenge the design before it becomes code.

With a reviewed spec, ask the AI to produce a sequenced build plan. The sequence matters. Entity and migration first, then repository, then service, then controller with tests, then frontend components, then end-to-end test. Each step should produce something testable before the next one begins. If a step cannot be tested, it is probably too large.
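Written out for the same example, the sequence above might look like this (step wording is illustrative):

```markdown
## Build plan: order history
1. Order entity and migration (testable: migration runs on a scratch DB)
2. Repository: paginated query by customer, date range, status
   (testable: repository tests)
3. Service: page-size default of 20, filter validation (unit tests)
4. Controller for `GET /api/orders`, with endpoint tests
5. Frontend: list component, filters, empty state
6. End-to-end test: log in, filter, paginate, hit the empty state
```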

Now — and only now — does the AI write implementation code. And it does so with the spec and plan as its guide, not its imagination. Reference both explicitly in every implementation prompt: “implement step one from the build plan, following the spec exactly.” That instruction matters. Without it, the AI reverts to its defaults. With it, it is executing a defined instruction against a defined contract.

Finally, verify against the spec. Not “does it compile” — does it satisfy every acceptance criterion? Ask the AI to flag any deviation. Ask it to explain any place where it made a judgement call that the spec did not cover. Deviations are not failures. They are information. A deviation means the spec had a gap. Fix the gap. Note the learning.
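The verification step can itself be a prompt. One possible wording, a sketch rather than a canonical form:

```markdown
Compare the implementation against the spec, criterion by criterion.
For each acceptance criterion, state: met, not met, or judgement call.
For every judgement call, explain what the spec did not cover and what
you decided instead.
```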


The devil’s advocate pattern

After the AI proposes a design, make it argue against itself. This should be a standing step in your workflow, not an occasional check.

The prompt is simple: find three problems with this approach and suggest how you would mitigate each one. The AI will surface risks it glossed over in its initial response. A query pattern that does not scale. A caching strategy with a stale-data risk. A component structure that makes a future requirement difficult.
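As a reusable prompt you can keep to hand, that might read (wording is a sketch):

```markdown
Play devil's advocate against the design you just proposed.
Find three concrete problems with this approach and, for each one,
suggest how you would mitigate it.
```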

You will not always act on what it finds. But you will make an informed decision rather than discovering the problem in production.

A practical example: an order notification system is designed and looks solid. The devil’s advocate prompt returns three concerns. The batch import scenario could spike requests past the email provider’s rate limit. If delivery fails, there is no dead letter queue, so events are silently lost. And the design assumes notifications mean email, which will require a refactor the moment SMS or push is added. Two of those three are worth addressing before implementation begins. Total cost: one prompt.

The pattern works because the AI optimises for producing a design that answers the question asked. The devil’s advocate prompt changes the question. Now it is looking for what it missed.


Trade-off analysis

Trade-off analysis is related to the devil’s advocate pattern, but different in purpose. Where the devil’s advocate looks for risks in the chosen approach, trade-off analysis ensures you saw the alternatives before choosing.

Add a standing rule to your CLAUDE.md: when proposing architectural decisions, always present at least two alternatives with explicit trade-offs. The format should be concrete. Option A gives this benefit but costs this. Option B gives this benefit but costs this. Then a recommendation with a reason.
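One way to phrase that standing rule in CLAUDE.md (the wording here is a sketch, not a canonical form):

```markdown
## Architectural decisions
When proposing an architectural decision, always present at least two
alternatives in this format:

- **Option A:** benefit / cost
- **Option B:** benefit / cost
- **Recommendation:** which option, and why

Do not start implementing until I have approved an option.
```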

This is how you catch the AI making a judgement call you disagree with before it writes 500 lines of code based on that call. The AI will make choices. The question is whether those choices are visible to you before they are implemented.


Specs use the ontology; they do not redefine it

If Part 3’s work was done correctly, your core concepts are already defined. Customer, Account, User, Identifier — their meanings, their relationships, their invariants. The spec for an order history feature references those concepts. It does not reinvent them. It does not introduce a local interpretation of what a Customer is.

This matters especially when multiple AI sessions are running in parallel on different parts of the same system. Each session has its own context window. The ontology is what keeps them aligned. Specs built on agreed concepts produce compatible artefacts. Specs that define concepts locally produce integration pain — and the AI will not warn you, because it cannot see across sessions.


When to skip it

The spec-first lifecycle is for tasks that involve design decisions. Not every task qualifies.

Skip the spec for mechanical changes: a rename across the codebase, a format fix, a dependency version bump, anything under about 20 lines that touches a single file and has exactly one reasonable implementation.

Use a spec when the task involves trade-offs, affects multiple files or layers, requires you to explain constraints or edge cases, or could reasonably be implemented in more than one way.

A practical test: if you find yourself writing a multi-paragraph prompt that explains constraints, edge cases, and expected behaviours, you need a spec. If the prompt is one sentence with a clear expected outcome, you probably do not.


Devil’s advocate

The overhead objection is real. Seven steps for something a developer used to just do sounds like ceremony.

For small features, the spec might be half a page and take fifteen minutes. The implementation prompt that follows it takes thirty seconds to write and produces output you can trust. Compare that to three cycles of prompt, review, correct, re-prompt because the AI misunderstood the scope — which is what happens without a spec.

The real question is not whether the spec takes time. It does. The question is whether the time spent on it is more or less than the time saved correcting work done without one.

There is a legitimate objection, though. The lifecycle assumes you know enough upfront to write good requirements. For genuinely exploratory work — investigating a new domain, prototyping before the requirements are clear — a rigid spec-first approach adds friction without proportionate value. In those cases, a lighter-weight spike makes sense: time-boxed, explicitly exploratory, with the expectation that the output is learning rather than production code. The spec comes after the spike, not before it.

For anything non-trivial and reasonably well-understood, the seven steps consistently produce better output than the alternative.


What to do now

Take a real feature from your current backlog — something you would normally assign to a developer.

Write the requirements as acceptance criteria. Be specific enough that a capable engineer could implement them without asking clarifying questions. Then brainstorm with the AI: what did you miss, what are the edge cases, what are the performance risks?

Write the spec. One page. API contract, data model, key business rules. Then ask the AI to argue against its own design before you approve it.

Implement from the spec. Verify the output against it afterwards, line by line. Note every deviation. Each deviation is either a gap in the spec or a gap in the AI’s instruction-following. Either way, it tells you something useful.

Run the full seven steps once. The second time is faster.


Next: Part 5 — How to Work With It Daily
