The Agentic AI Playbook, Part 6 of 7: Assemble a Full Engineering AI Agent Team

March 19, 2026

A single AI conversation works well for focused tasks: fix this bug, refactor this class, write these tests. When a feature spans multiple layers — backend, frontend, security, infrastructure, tests — a single conversation starts to fall apart.

Three things happen as scope grows. First, context saturation. By the time the AI has read your service layer, repository, entities, migration scripts, and OpenAPI spec, it has consumed a large part of its context window. Adding frontend components, security requirements, and end-to-end tests on top pushes it into territory where earlier content degrades. Quality drops silently before you notice.

Second, different disciplines need different focus. Designing a secure API requires deep attention to threat modelling and data exposure. Designing a UX flow requires attention to user journeys and edge states. Reviewing architecture for future scalability is a different mode entirely from writing the code that delivers it today. An agent trying to hold all of that simultaneously produces muddier output than one that stays in a single domain.

Third, parallelism. Most of the work on a real feature can happen concurrently once the design is locked. A single agent works in series by definition.

Multi-agent workflows solve all three. But the more important thing to understand is what they actually give you: a cross-functional team that you assemble, brief, and direct — one with specialists who most teams cannot afford to have in every room.


Two phases, two teams

The most effective multi-agent model runs in two distinct phases: design first, then implementation. The teams are different for each.

This mirrors how good engineering actually works. The people who design the system should not be simultaneously writing the code. The people reviewing the architecture for security risks should not also be the ones implementing the feature. Separation of concerns applies to teams, not just code.

The design team

Before a single line of implementation code is written, assemble a design team. Its job is to produce a specification so precise that the implementation team can build from it without making design decisions of their own.

The design team might include an architect agent to define the overall structure and integration patterns, a backend expert to validate the service and data model design, a frontend expert to define the component structure and API consumption patterns, an API designer to produce the contract with concrete request and response examples, and a UX agent to define the user journeys, edge states, and empty states that the frontend will need to handle.

Add a security agent to this phase, not the implementation phase. Security reviewed after the fact means retrofitting controls into a design that was not built for them. Security reviewed during design means the threat model informs the contract before the contract becomes code.

Then add two devil’s advocate agents. Their sole job is to find problems with everything the other agents produce. One reviews the architecture: where will this fail at scale, what has been assumed that should not be assumed, what will be painful to change in six months? The other reviews the plans: where is the spec ambiguous, where have requirements been interpreted rather than defined, what does the implementation team need that is not in the document?

The design team debates, challenges, and refines until the output is something every member can agree is buildable. You approve the final spec before implementation begins.

The implementation team

With a locked spec, the implementation team takes over. These agents build and validate. They do not make design decisions. If they encounter something the spec does not cover, they stop and flag it — they do not infer.

A backend team handles the service layer, data model, and API implementation. A frontend team handles the component tree, state management, and API integration. A dedicated API spec writer maintains the OpenAPI contract as the living source of truth, updating it if implementation reveals a discrepancy with the draft spec. QA agents write and run tests at each milestone — unit tests during implementation, integration tests at each milestone boundary, end-to-end tests in real browsers against real services.

Add a pentesting agent to the implementation team. As each endpoint and authentication flow comes online, this agent probes it: injection attempts, access control boundary testing, token handling, session behaviour. Security testing done at the point of implementation is dramatically cheaper than security testing done before a production release.
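The kind of probing described above can be sketched as a pure function. A minimal, illustrative sketch: the payload list is a tiny sample of what a real pentesting agent would try, and `handler` stands in for a hypothetical endpoint's input-validation layer.

```python
# Illustrative probe payloads a pentesting agent might try against a new
# endpoint. The list and the handler convention here are hypothetical,
# not Claude Code functionality.
INJECTION_PAYLOADS = [
    "' OR '1'='1",                 # classic SQL injection
    "<script>alert(1)</script>",   # reflected XSS
    "../../etc/passwd",            # path traversal
]

def probe(handler, payloads=INJECTION_PAYLOADS) -> list[str]:
    """Return the payloads the handler accepted instead of rejecting."""
    accepted = []
    for p in payloads:
        try:
            handler(p)             # a safe handler raises on hostile input
            accepted.append(p)     # no exception: the control did not hold
        except ValueError:
            pass                   # rejected: the control held
    return accepted
```

A strict handler should yield an empty list; any payload that comes back is a finding to fix at the milestone where it appeared, not at release time.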

A DevOps agent handles infrastructure concerns: containerisation, environment configuration, CI pipeline integration, deployment scripts. And a review agent runs after every milestone, checking implementation against the spec, flagging deviations, and verifying that each acceptance criterion is met before the team moves on.


What this actually gives you

Stepping back from the mechanics: what you have just assembled is a cross-functional software development team. Architect, backend engineers, frontend engineers, API designer, UX designer, security reviewer, penetration tester, QA engineers, DevOps engineer, and a dedicated devil’s advocate function — all working on your feature, simultaneously.

Most teams do not have all of these people available for any given feature. Security reviews happen at the end of a sprint, if they happen at all. Architecture review is a calendar invite that gets rescheduled. Penetration testing is a quarterly engagement with an external firm. UX input is captured in a Jira ticket from a conversation that happened two weeks ago.

The multi-agent model makes all of those disciplines present in the room, working on the feature, in real time. That is not a marginal productivity improvement. It is a structural change in how software gets designed and built.


The spec is still the glue

With more agents, the stakes on the spec get higher, not lower.

Agents do not share conversation history. They share files. The spec directory is the only thing keeping ten specialists aligned. The API contract, data model, UI component interfaces, acceptance criteria, security requirements, and infrastructure constraints all live there. Every agent reads the relevant files before starting work. Every agent has a standing rule: if implementation would deviate from the spec, stop and flag it. Do not silently adapt.
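The standing rule above can be enforced mechanically. A minimal sketch, assuming a hypothetical spec directory layout (the file names are illustrative, not a prescribed convention): each agent runs a pre-flight check that the spec files it depends on actually exist before it starts work.

```python
from pathlib import Path

# Hypothetical spec directory contents; names are illustrative.
REQUIRED_SPEC_FILES = [
    "api-contract.yaml",       # OpenAPI contract, owned by the API spec writer
    "data-model.md",           # entities and relationships
    "ui-components.md",        # component interfaces and states
    "acceptance-criteria.md",  # per-milestone criteria
    "security.md",             # threat model and required controls
    "infrastructure.md",       # deployment and environment constraints
]

def preflight(spec_dir: str) -> list[str]:
    """Return the spec files missing from the directory, so an agent
    can stop and flag the gap instead of starting work without them."""
    root = Path(spec_dir)
    return [name for name in REQUIRED_SPEC_FILES if not (root / name).exists()]
```

An empty return means the agent may proceed; anything else is a stop-and-flag condition, exactly as the standing rule demands.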

This is why everything in the earlier parts of this series matters more in multi-agent work, not less. The ontology from Part 3 defines what concepts mean across the entire system. The spec from Part 4 defines what each layer must produce. When ten agents are working simultaneously from those artefacts, the gaps in the preparation become visible immediately. A vague concept does not propagate through one agent’s session — it propagates through ten, each resolving the ambiguity with a different inference, producing inconsistencies that are distributed across the entire codebase before anyone reviews anything.

When the spec changes, every agent must re-sync. If one agent discovers mid-implementation that the spec has a gap and quietly fixes it locally, the others are still building against the original. The cost of stopping to update the spec is minutes. The cost of silent divergence across a ten-agent team is a day of debugging integration failures.


Milestone-based validation

The implementation team should never run unchecked. Define milestones before implementation begins, each one a small, verifiable integration checkpoint. The backend team implements the first endpoint; the QA and pentesting agents verify and probe it; the review agent checks it against the spec. Commit. Then continue.

Stop, verify, commit. Then continue.
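The stop-verify-commit loop can be sketched as a simple gate. This is an illustrative shape, not a Claude Code API: `checks` would wrap the QA suite, the pentest probes, and the spec review, and `commit` would shell out to `git commit` to create the rollback point.

```python
from typing import Callable

def milestone_gate(checks: list[Callable[[], bool]],
                   commit: Callable[[], None]) -> bool:
    """Stop, verify, commit: run every check; commit only when all pass."""
    if all(check() for check in checks):
        commit()      # this commit becomes the rollback point for the next milestone
        return True
    return False      # stop: the team fixes the failure before moving on
```

The important property is that a failing check prevents the commit entirely, so every committed milestone is a state the team verified and can safely return to.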

The review agent is critical at each checkpoint. It is not the same as the devil’s advocate agents from the design phase. The design devil’s advocates challenge plans before they become code. The review agent checks code against the plan that was approved. They answer different questions: “should we build this?” versus “did we build what we said we would?”

Milestone commits are also rollback points. If milestone four fails, you return to the milestone three commit. You do not debug backwards through uncommitted changes across ten concurrent workstreams.


How it works mechanically

Claude Code’s agent teams feature handles the coordination. You describe the team structure you want to the lead agent in plain language — describe the design team composition, the roles, and the spec file they should work from — and it spawns teammates within the same session. Teammates share a task list, can communicate directly with each other, and report back to the lead. You remain in the lead session, directing the team and approving outputs at each stage.

Claude-flow provides an alternative orchestration layer for teams that want more programmatic control — scripted dispatch, automated milestone triggers, and structured handoffs between the design and implementation phases.

Each agent gets the same project context: your CLAUDE.md, your ontology document, and the shared spec. File ownership is defined in the spec. The backend team owns the service layer. The frontend team owns the component tree. The API spec writer owns the contract file. The DevOps agent owns the infrastructure configuration. No agent touches another’s territory without explicit permission. If a shared file needs updating, one designated agent makes the change and the rest re-read it.
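The territory rule above is checkable. A minimal sketch, assuming a hypothetical ownership map (the agent names and path prefixes are illustrative): given the files an agent changed, report anything outside its territory.

```python
# Hypothetical ownership map; the path prefixes are illustrative.
OWNERSHIP = {
    "backend":  ["src/service/", "src/repository/"],
    "frontend": ["web/src/components/", "web/src/state/"],
    "api-spec": ["spec/api-contract.yaml"],
    "devops":   ["deploy/", ".github/workflows/"],
}

def violations(agent: str, changed_files: list[str]) -> list[str]:
    """Return the files the agent touched outside its own territory."""
    allowed = OWNERSHIP.get(agent, [])
    return [f for f in changed_files
            if not any(f.startswith(prefix) for prefix in allowed)]
```

Run against each agent's diff at a milestone boundary, a non-empty result means an agent crossed into another's territory without the explicit permission the rule requires.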


The failure modes worth knowing

Most multi-agent problems are predictable. The most common is spec ambiguity reaching the implementation team. If the design team produced a spec with prose descriptions rather than concrete examples, the implementation team will guess. Two agents guessing differently about the same field produce incompatible implementations that both compile. The fix is concrete examples in every spec: write the exact JSON response body, not a description of what it contains.
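A concrete example in the spec can double as a machine-checkable fixture. A minimal sketch, with a hypothetical endpoint and field names: the spec carries the exact response body, and a shape check catches the incompatible-guesses failure mode before integration.

```python
# Hypothetical spec fixture: the exact response body for GET /orders/{id},
# written out in full rather than described in prose.
EXAMPLE_RESPONSE = {
    "id": "ord_1042",
    "status": "shipped",                           # enum: pending | shipped | delivered
    "total": {"amount": 1999, "currency": "EUR"},  # minor units, not floats
    "items": [
        {"sku": "SKU-001", "quantity": 2},
    ],
}

def matches_shape(example, actual) -> bool:
    """True if `actual` has exactly the keys and value types of `example`."""
    if isinstance(example, dict):
        return (isinstance(actual, dict)
                and example.keys() == actual.keys()
                and all(matches_shape(example[k], actual[k]) for k in example))
    if isinstance(example, list):
        if not isinstance(actual, list):
            return False
        return all(matches_shape(example[0], item) for item in actual) if example else True
    return type(example) is type(actual)
```

If both the backend and frontend agents validate against the same fixture, a guess that diverges from the example fails loudly at the milestone instead of compiling silently into two incompatible implementations.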

The second is a mid-flight spec change that only reaches some agents. The rule is absolute: if any agent identifies that the spec needs to change, all agents stop until the spec is updated, re-reviewed, and approved. No exceptions.

The third is the design phase being cut short under delivery pressure. Skipping or rushing the design team means the implementation team encounters design questions mid-build. They will make decisions — they have to — and those decisions will be made by agents optimising for completion rather than by specialists optimising for correctness. The time saved by rushing the design phase is rarely more than the time lost correcting implementation decisions that should never have been implementation decisions.


Devil’s advocate

The model assumes you have the preparation in place to brief ten agents coherently. You need a well-defined ontology, a clear set of requirements, and enough understanding of the domain to evaluate what the design team produces. A team of specialist agents amplifies your preparation the same way it amplifies your ambiguity — at scale.

There is also a real cost dimension. Ten active agent sessions consume significantly more tokens than one. The design team phase in particular, with agents debating and challenging each other, can be expensive relative to a single architect working through the same questions. For a small feature with well-understood requirements and low integration risk, a single well-briefed agent may be the right tool. The cross-functional team model earns its cost on complex features where the design mistakes it prevents are more expensive than the tokens it consumes.

And the devil’s advocate function needs to be taken seriously, not treated as a formality. If the review agents are too agreeable, they add noise without value. Brief them explicitly to find problems: “Your job is to identify what is wrong with this design, not to validate it.” The design phase is only as strong as the quality of its challenge.


What to do

Take your next significant feature — something with enough scope that a single agent would struggle with the full breadth of it.

Write the requirements and acceptance criteria before assembling any team. Be specific enough that a designer could work from them without asking for clarification.

Then describe your design team to the lead agent. Each role is a sentence: “spawn an architect, a backend expert, a frontend expert, an API designer, a UX agent, a security agent, and two devil’s advocates; their job is to produce a spec for the feature in requirements.md.” The lead handles the rest. Review what they produce. Challenge anything that looks like a gap.

When the spec is approved, describe the implementation team the same way: a sentence naming the agents and their territories, a sentence pointing them at the spec, a sentence defining the milestone structure. Let them build.

You are no longer the engineer writing every line. You are the person who assembled the team, defined the brief, and is responsible for the outcome. That is a different job. It is, if anything, closer to the work that actually shapes a system.


Next: Part 7 — What Goes Wrong
