Series Navigation:
This is Part 3 of a 5-part series on Agentic AI Architecture.
- Part 1: Communication and AI
- Part 2: AI-First Delivery Model
- Part 3: Inside an AI-First Pod (you are here)
- Part 4: Architecture Roles
- Part 5: Talent & Culture
You don’t believe it until you watch it happen.
Four people. One requirement. Four hours later, it’s in production.
Not a prototype. Not “mostly done pending review.” Actually live. Customers using it. Support team trained. Documentation published.
The first time you see it, you assume they cut corners. So you check:
- Test coverage? 94%, including edge cases.
- Error handling? Comprehensive.
- Observability? Metrics, logs, traces, alerts all configured.
- Security review? Passed automated policy checks.
They didn’t cut corners. They just removed the waiting.
Here’s exactly how it works, including the parts where it goes wrong.
Monday morning, 9:00 AM. The product lead drops a requirement into the pod’s channel:
“Customer success is getting hammered. Customers want to pause subscriptions for vacation or financial reasons, then auto-resume. They’re calling support instead because we don’t have this feature. Need: Pause for up to 90 days, auto-resume with all settings intact, customer can cancel the pause early if they want.”
By 1:00 PM the same day, the feature is live in production.
Who’s actually in the pod
As we saw in Part 2, pods eliminate handoffs by containing every role needed to deliver end-to-end. Here’s who actually makes up a pod:
Four people, occasionally five:
1. Solution Architect (called Tech Lead in some orgs)
This person owns the technical narrative. They understand domain modeling, can spot where abstractions will leak, and know what good system design looks like. They don’t draw architecture diagrams in isolation – they work with AI to model domains, explore options, and validate constraints in real-time.
Key skill: Translating messy business reality into clean domain models that AI can work with.
What they’re not: Someone who dictates implementation details or draws boxes-and-arrows that developers follow blindly. They set constraints and patterns; the AI and engineers handle implementation.
2. Product/Domain Lead (Product Manager/Owner)
The person who understands what customers actually need and why. They shape priorities, clarify requirements, and make rapid trade-off decisions. They don’t write detailed specifications – they articulate intent as structured prompts that AI can turn into working prototypes within minutes.
Key skill: Clear, unambiguous articulation of outcomes and acceptance criteria.
What they’re not: Someone who writes user stories for other people to implement later. They work with the pod in real-time, steering as the AI generates options.
3. Engineers (usually two, sometimes one)
AI-accelerated developers who work with the agent to generate, review, and refine code. They’re not writing boilerplate or CRUD operations manually – the AI does that. They’re applying judgment: Is this implementation correct? Does it handle edge cases? Will this design cause problems in six months?
Key skill: Code review at speed. Treating AI like a very fast junior developer who’s occasionally confidently wrong.
What they’re not: Typists who turn specifications into code. They’re decision-makers who use AI to explore options rapidly, then choose the right one.
4. Quality Strategist (not traditional QA)
Someone who thinks about risk, failure modes, and non-functional requirements. They don’t write test cases manually – they define test strategies and risk profiles that AI translates into comprehensive test suites. They’re the team’s adversarial thinker: “What could go wrong? What are we not considering?”
Key skill: Risk modeling and systems thinking applied to quality.
What they’re not: Manual testers or test case writers. They define what needs testing; AI generates the tests.
Optional 5th: Pod Lead
In larger organisations or more complex domains, someone may handle sequencing, dependency management, and impediment removal. This isn’t a project manager role – more like a facilitator who keeps the pod flowing. Many pods don’t need this; the four core members self-coordinate.
What the pod doesn’t need (and why that matters)
No dependency on:
- Platform team for infrastructure or deployment
- Data team for schema changes
- Security team for auth patterns (follows encoded guardrails)
- Architecture review board for approval
- Separate QA team for testing
- Ops team for monitoring setup
The pod has access to self-service platforms, pre-approved patterns, and automated pipelines. Within defined guardrails, they make decisions and move.
This is what “cross-functional” actually means. Not “we have different roles in the same room who still depend on other teams.” But “we have everything we need to deliver end-to-end.”
Monday morning: The subscription-pause feature
Let’s watch the journey from requirement to production. This pod will ship before lunch – not by cutting corners, but by working in parallel, using AI effectively, and eliminating waiting.
9:00-9:45 AM: Solution Design
The solution architect and product lead pair on the solution design. The architect opens Claude and starts.
Remember Part 1’s core message: communication is your job, not the AI’s. The architect knows this. They’re about to translate messy business reality into clear, structured prompts that AI can work with.
As we learned in Part 2, the pod doesn’t jump straight to code. They start with solution design.
Step 1: Solution Design
The architect gives the AI agent a detailed prompt with full context, instructing it to act as a senior solution architect:
Prompt 1:

We need to add subscription pause/resume capability.

Context:
Current state: Subscriptions have states: active, cancelled, expired. When active, customers are billed monthly and have full feature access.

New requirement: Allow customers to pause for 1-90 days. During pause:
- No billing
- Limited feature access (can view data, can't create new)
- After pause period, auto-resume with original settings
- Customer can manually resume early
- Customer can cancel permanently during pause

Act as a senior solution architect. Generate a comprehensive solution design document including:
- Requirements (functional and non-functional)
- NFRs (Non-Functional Requirements) covering performance, security, scalability, observability
- Application architecture diagrams
- Data flows showing how information moves through the system
- Data models defining entities, relationships, and constraints
- State machine model for subscription states
- Edge cases and failure modes
AI responds with a comprehensive solution design document. It also identifies clarifying questions:
- What happens if billing fails on resume?
- Can customers extend their pause period?
- What if customer’s payment method expires during pause?
- Do we notify them before auto-resume?
This is exactly what the pod needs – the AI isn’t just answering the prompt, it’s thinking through implications.
Prompt 2 (refining):

Good questions. Decisions:
- If billing fails on resume: Move to "payment_required" state, customer can update payment and retry. After 7 days, cancel subscription.
- No extending pause period. Customer must resume then pause again.
- If payment method expires during pause: Send notification, require update before resume.
- Notify 3 days before auto-resume.

Regenerate the solution design document with these decisions incorporated. Include:
1. Updated state machine with these rules
2. Event schema for state transitions
3. API contract for pause/resume operations
4. Database schema additions needed
5. Integration points with billing service
AI produces the complete solution design document. The team reviews it together. They correct assumptions, identify gaps, and make changes. They feed this feedback back to the AI, which refines the design. They iterate until everyone is satisfied that the design is sound.
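To make that concrete, here is roughly what the state-machine portion of the design boils down to – a minimal TypeScript sketch, with state and event names assumed for illustration rather than taken from the pod’s actual document:

```typescript
// Illustrative sketch of the state machine from the design doc.
// State and event names are assumptions, not the pod's actual code.
type SubscriptionState = "active" | "paused" | "payment_required" | "cancelled" | "expired";

type SubscriptionEvent =
  | { type: "PAUSE_REQUESTED"; pauseDays: number } // 1-90 days, validated elsewhere
  | { type: "RESUME_REQUESTED" }                   // manual early resume
  | { type: "PAUSE_PERIOD_ELAPSED" }               // auto-resume trigger
  | { type: "RESUME_BILLING_FAILED" }
  | { type: "PAYMENT_UPDATED" }
  | { type: "CANCEL_REQUESTED" };

// Allowed transitions, encoding the decisions above. The 7-day cancellation out of
// payment_required would be driven by a scheduled job, not shown here.
const transitions: Record<SubscriptionState, Partial<Record<SubscriptionEvent["type"], SubscriptionState>>> = {
  active: { PAUSE_REQUESTED: "paused", CANCEL_REQUESTED: "cancelled" },
  paused: {
    RESUME_REQUESTED: "active",
    PAUSE_PERIOD_ELAPSED: "active",
    RESUME_BILLING_FAILED: "payment_required",
    CANCEL_REQUESTED: "cancelled",
  },
  payment_required: { PAYMENT_UPDATED: "active", CANCEL_REQUESTED: "cancelled" },
  cancelled: {},
  expired: {},
};

export function nextState(current: SubscriptionState, event: SubscriptionEvent): SubscriptionState {
  const next = transitions[current][event.type];
  if (!next) throw new Error(`Invalid transition ${event.type} from ${current}`);
  return next;
}
```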
The architect shares the final design in the pod channel: “Solution design complete. Review while we generate the build plan.”
Elapsed time: 45 minutes. In a traditional setting, this would be hours of diagramming followed by a review meeting days later.
9:45-10:05 AM: Build Plan
Once they’re happy with the solution design, they ask the AI to generate a detailed build plan. This follows the process from Part 2.
Prompt 3:

Generate a detailed build plan for the subscription pause/resume feature.

Reference:
- The solution design document [pastes design doc]
- NFRs from the design (performance, security, scalability, observability)
- Our organisational guardrails and standards [loads from prompt library]
- Coding standards and patterns [loads from prompt library]
- Definition of done criteria [loads from prompt library]
- Testing requirements [loads from prompt library]

Break the work into logical increments with:
- Clear dependencies
- Acceptance criteria for each increment
- Estimated effort
- Risk assessment
The AI spends 5-10 minutes generating this detailed build plan, breaking the work into logical increments with clear dependencies and acceptance criteria.
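The shape of those increments might look something like this – an illustrative sketch, not the pod’s real plan, with IDs and fields invented for the example:

```typescript
// Illustrative shape of a build-plan increment; IDs and fields are invented for the example.
interface BuildIncrement {
  id: string;
  description: string;
  dependsOn: string[];          // increments that must land first
  acceptanceCriteria: string[];
  estimatedEffort: "S" | "M" | "L";
  risks: string[];
}

export const buildPlan: BuildIncrement[] = [
  {
    id: "INC-1",
    description: "Schema changes and pause/resume event types",
    dependsOn: [],
    acceptanceCriteria: ["Migration applied", "Events registered in schema registry"],
    estimatedEffort: "S",
    risks: ["Schema change must stay backwards compatible"],
  },
  {
    id: "INC-2",
    description: "Pause/resume/cancel command handlers and API endpoints",
    dependsOn: ["INC-1"],
    acceptanceCriteria: ["API matches the contract", "Invalid transitions rejected"],
    estimatedEffort: "M",
    risks: ["Concurrent state changes"],
  },
  {
    id: "INC-3",
    description: "Billing and notification integration, observability",
    dependsOn: ["INC-2"],
    acceptanceCriteria: ["Billing events consumed in staging", "Metrics, logs, traces emitted"],
    estimatedEffort: "M",
    risks: ["Billing service version skew in staging"],
  },
];
```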
The team reviews the build plan. They adjust priorities, clarify dependencies, and ensure it aligns with their guardrails. They make changes and iterate until the plan is actionable.
Elapsed time: 20 minutes. The build plan is ready. Now they can move to implementation with confidence.
10:05-10:50 AM: Implementation
While the build plan was being finalised, one engineer started reviewing the solution design. Now they’re ready to code.
Only when they’re satisfied with both the solution design and the build plan do they hand off to the AI coding agent to implement the solution. The coding agent has everything it needs: clear requirements, architectural guidance, and a step-by-step plan.
The engineer checks with the product lead: “The API contract shows pause duration as 1-90 days. Should we validate this in the API or in the UI?” Product lead responds: “Validate in both – API rejects invalid values, UI prevents invalid input. This is a business rule, not just UX.”
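That decision is small but worth seeing in code. A minimal sketch, assuming a TypeScript API and shared constants (names are illustrative):

```typescript
// Sketch of the "validate in both places" decision; names are illustrative.
export const MIN_PAUSE_DAYS = 1;
export const MAX_PAUSE_DAYS = 90;

// API-side: the business rule lives here and rejects invalid values outright.
export function validatePauseDuration(days: number): void {
  if (!Number.isInteger(days) || days < MIN_PAUSE_DAYS || days > MAX_PAUSE_DAYS) {
    throw Object.assign(
      new Error(`pauseDays must be an integer between ${MIN_PAUSE_DAYS} and ${MAX_PAUSE_DAYS}`),
      { statusCode: 422 },
    );
  }
}

// UI-side: the same constants drive the input constraints, so invalid values can't be submitted.
export const pauseInputProps = { type: "number", min: MIN_PAUSE_DAYS, max: MAX_PAUSE_DAYS, step: 1 };
```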
The engineer also confirms with the solution architect: “For the state machine, should we use an event store or just update the subscription table directly?” Solution architect: “Use event store – we’ll need audit trail for billing disputes. Follow our existing event-sourced pattern.”
The engineer takes the build plan, API contract, and event schema from the solution design. They have a prompt library – reusable prompt templates that encode the pod’s patterns and standards.
Engineer’s prompt:

Implement the subscription-pause service following our build plan [references build plan]. Use our event-sourced pattern [loads template from prompt library].

From the solution design:
- API contract: [pastes AI-generated contract]
- Event schema: [pastes AI-generated schema]
- State machine: [references design doc]

Include:
- OpenAPI spec
- Command handlers for pause/resume/cancel
- Event publishers
- State validation
- Error handling for invalid transitions
- Integration with billing-service (use our event bus)
- Observability (metrics, logs, traces) per NFRs

Use TypeScript, following our coding standards [loads standards from library]. Include comprehensive unit tests.
AI generates ~800 lines of code, tests, and OpenAPI spec in 45 seconds.
The engineer stares at the screen for a moment. This would have taken me half a day. And it looks… actually pretty good. But then the engineer’s experience kicks in. The logic is mostly right, but the error handling for concurrent state changes is naive. It assumes optimistic locking but doesn’t implement it.
Refining prompt:

The state transition handling doesn't prevent race conditions. Two simultaneous requests could both succeed, putting the subscription in an invalid state.

Add optimistic locking using version field on subscription entity. Include tests for concurrent modifications.
AI regenerates the relevant sections. Better.
Before committing, the engineer shares the implementation with the solution architect for a quick review. Solution architect confirms: “The optimistic locking approach looks good. One thing – make sure the version field is included in the event payload so we can replay state correctly.” Engineer adds that, commits.
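Here is a minimal sketch of what that optimistic-locking transition might look like, assuming a relational store with a version column and a publish function for the event bus – the table, column, and function names are all assumptions for illustration:

```typescript
// Illustrative optimistic-locking transition; table, column, and function names
// are assumptions for the sketch, not the pod's actual code.
interface Db {
  query(sql: string, params: unknown[]): Promise<{ rowCount: number }>;
}

interface SubscriptionPausedEvent {
  type: "SubscriptionPaused";
  subscriptionId: string;
  resumeAt: string; // ISO timestamp for auto-resume
  version: number;  // included so state can be replayed correctly
}

export async function pauseSubscription(
  db: Db,
  publish: (event: SubscriptionPausedEvent) => Promise<void>,
  subscriptionId: string,
  expectedVersion: number,
  resumeAt: Date,
): Promise<void> {
  // The WHERE clause on version is the optimistic lock: if another request already
  // transitioned this subscription, zero rows update and we reject instead of overwriting.
  const result = await db.query(
    `UPDATE subscriptions
        SET state = 'paused', resume_at = $1, version = version + 1
      WHERE id = $2 AND state = 'active' AND version = $3`,
    [resumeAt.toISOString(), subscriptionId, expectedVersion],
  );

  if (result.rowCount === 0) {
    throw new Error("Concurrent modification: subscription changed since it was read; retry with the latest version");
  }

  await publish({
    type: "SubscriptionPaused",
    subscriptionId,
    resumeAt: resumeAt.toISOString(),
    version: expectedVersion + 1,
  });
}
```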
The engineer also pings the product lead: “Implementation is ready. The pause/resume API matches the contract. Want to see a quick demo before we move to testing?” Product lead: “Looks good from the API spec. Let’s verify the UX flows in staging.”
Elapsed time: 45 minutes for implementation (including review and refinement). A human would spend 4-6 hours writing this from scratch, and probably miss the concurrency issue until it happened in production.
This is why the solution design and build plan steps matter. By the time the engineer starts implementation, the AI has clear context about what to build, how to build it, and what constraints to follow. The implementation is faster and more correct because the thinking happened first. But communication doesn’t stop at handoff – the engineer, solution architect, and product lead stay in sync throughout implementation, catching issues early while context is fresh.
10:30-11:00 AM: Test strategy
While the engineer finalises implementation (which continues until 10:50), the quality strategist starts generating the test strategy. This parallel work is how pods compress timelines – people work simultaneously on different aspects of the same feature. The quality strategist reviewed the solution design earlier and has been thinking about test scenarios, so they’re ready to start generating tests now.
The quality strategist reviews what’s been built so far. The unit tests are solid, but there are gaps in the test strategy.
Quality strategist’s prompt to AI:

We have a subscription pause/resume feature. Unit tests exist. What integration and edge-case tests are needed?

Consider:
- Clock drift and timing issues (pause period expiration)
- Billing service failures during resume
- Customer cancels during pause
- Customer deleted before resume
- Payment method expired
- Database connection failures during state transition
- Event bus unavailable
- Multiple rapid pause/resume requests

Generate a test plan, then generate the actual test implementations. Use our testing framework [loads template].
AI produces a test plan identifying 23 scenarios, including several the team hadn’t considered:
- What if customer pauses on day 29 of their billing cycle?
- What if subscription was already scheduled for cancellation?
- What if feature flags disable pause during their pause period?
The quality strategist’s eyes widen. We would have missed at least five of these. They review the plan, remove three scenarios that don’t apply to their context, add one about regulatory data retention, then ask AI to generate the test implementations.
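To give a flavour of the generated tests, here is a hedged, jest-style sketch of the concurrent-requests scenario; the helpers are hypothetical test fixtures, and pauseSubscription stands in for the service call from the implementation step:

```typescript
// Jest-style sketch of one generated scenario. The helpers below are hypothetical
// test fixtures; pauseSubscription stands in for the service call from the implementation.
import { describe, it, expect } from "@jest/globals";

declare function createActiveSubscription(): Promise<{ id: string; version: number }>;
declare function pauseSubscription(subscriptionId: string, expectedVersion: number, resumeAt: Date): Promise<void>;

describe("multiple rapid pause requests", () => {
  it("lets exactly one of two simultaneous pause requests succeed", async () => {
    const subscription = await createActiveSubscription();
    const resumeAt = new Date(Date.now() + 30 * 24 * 60 * 60 * 1000); // 30-day pause

    // Both requests carry the same expected version; the optimistic lock should reject one.
    const [first, second] = await Promise.allSettled([
      pauseSubscription(subscription.id, subscription.version, resumeAt),
      pauseSubscription(subscription.id, subscription.version, resumeAt),
    ]);

    expect([first.status, second.status].sort()).toEqual(["fulfilled", "rejected"]);
  });
});
```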
Takes 30 minutes including iteration. Traditionally, writing this test suite would take two days, and would likely miss half these edge cases.
11:00-11:30 AM: Local verification
The second engineer runs the full test suite locally. 87 tests. Three failures:
- The billing integration test fails because the local billing-service mock doesn’t support the new events yet.
One edge-case test (customer deleted before resume) fails because the implementation returns 404 instead of the 410 (Gone) the test expects per REST conventions.
- Performance test shows the state transition query is doing a table scan.
Fixes:
Issue 1: Update the billing-service mock to handle new events. Takes 5 minutes with AI assistance.
Issue 2: Ask AI to fix the status code. It regenerates the error handler correctly. Takes 2 minutes.
Issue 3: Ask AI to add an index on the subscription state and version fields. It generates the migration script. Takes 3 minutes.
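For illustration, that Issue 3 fix is essentially a one-line migration – something like this, assuming Postgres and a raw-SQL migration style (names are hypothetical):

```typescript
// Hypothetical migration for the Issue 3 fix, assuming Postgres and raw-SQL migrations.
export async function up(client: { query(sql: string): Promise<unknown> }): Promise<void> {
  // Composite index so the state-transition query stops scanning the whole table.
  await client.query(
    "CREATE INDEX IF NOT EXISTS idx_subscriptions_state_version ON subscriptions (state, version)"
  );
}
```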
Re-run tests. All green. Commit.
Elapsed time: 30 minutes. The fast feedback loop – generate, test, fix, repeat – means issues are caught while context is fresh.
11:30 AM-12:15 PM: Staging and integration
Code hits the CI/CD pipeline automatically. Builds, runs tests, deploys to staging environment. Takes 8 minutes.
Integration tests run against real backing services (database, event bus, billing service). Two tests fail:
Failure 1: The billing service in staging is running an older version that doesn’t understand the new event schema.
This is a real cross-service dependency. The engineer checks: billing service’s main branch already supports the new schema (another pod added it last week), but staging hasn’t been updated.
Solution: Deploy latest billing-service to staging. Takes 5 minutes.
Failure 2: The notification service doesn’t send the “3 days before resume” notification because it doesn’t know about the new event type.
This is a gap the pod needs to fill. The engineer adds the notification event type to the schema registry and uses AI to generate the notification handler. Commits to notification-service repo (which this pod also owns).
Takes 15 minutes including testing.
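The handler itself is small. A hedged sketch, with the event and function names assumed for illustration:

```typescript
// Illustrative handler added to the notification service; the event and function
// names are assumptions for the sketch.
interface SubscriptionResumeUpcomingEvent {
  type: "SubscriptionResumeUpcoming";
  subscriptionId: string;
  customerId: string;
  resumeAt: string; // ISO timestamp; the pause service emits this 3 days before auto-resume
}

type SendEmail = (customerId: string, template: string, data: Record<string, string>) => Promise<void>;

export function handleResumeUpcoming(sendEmail: SendEmail) {
  return async (event: SubscriptionResumeUpcomingEvent): Promise<void> => {
    // The notification service just maps the new event type to a customer-facing message.
    await sendEmail(event.customerId, "subscription-resume-reminder", {
      resumeDate: new Date(event.resumeAt).toLocaleDateString(),
    });
  };
}
```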
Re-run the pipeline. All green in staging.
The pod does a quick review in staging:
- Product lead verifies the UX flows work correctly
- Quality strategist checks observability (metrics, logs, traces, alerts)
- Solution architect verifies behavior under failure conditions (kills the billing service mid-resume to confirm graceful recovery)
- Engineers check performance characteristics (sub-100ms p99 latency, good)
Elapsed time: 45 minutes. Traditional staging review would be scheduled for next week.
12:15-1:00 PM: Production and documentation
Product lead approves: “This is what we wanted. Ship it.”
Engineer clicks the deploy button. Production deployment takes 12 minutes (blue-green deployment with health checks).
While deployment runs, the solution architect asks AI to generate documentation. The prompt references the feature’s state machine, API contracts, and failure modes. AI generates both user-facing docs and internal runbook in 2 minutes. Architect reviews, makes minor edits for tone, commits to the docs repo.
This is the continuous documentation we mentioned earlier – not a separate task, but part of the workflow. Part 1 taught us that clear communication is our job. Here, that means documenting continuously so future AI prompts (and future humans) have context.
1:00 PM: Feature is live. Documentation is published. Monitoring shows normal behavior. Customer success team is notified.
Total elapsed time: 4 hours from requirement to production. A traditional team would take 2-3 weeks for the same feature.
Note: The pod worked in parallel where possible. While the architect refined the solution design, the engineer reviewed the initial design. While the engineer implemented, the quality strategist prepared test scenarios. This parallel work, combined with AI acceleration, is how four hours becomes possible.
When reality hits: The architect disappeared
This sunny-day scenario makes it look easy. Here’s what happens when it’s not.
Real scenario: Solution architect on a fintech pod had a family emergency and was out for three weeks with no notice. The pod had two features in flight and three in the backlog.
Traditional team response: Stop development until a replacement is found, or struggle through with degraded effectiveness while the remaining members try to figure out what the architect knew.
This pod’s response:
Day 1 (architect leaves):
Product lead and senior engineer meet. They load the pod’s artefact store into Claude:
- Domain model documents (maintained continuously)
- Architectural decision records (generated after key decisions)
- Event schemas and API contracts
- Test strategy documents
- Prompt library (reusable templates)
They prompt AI: “Generate a briefing for temporary architectural coverage. Include system summary, in-flight work status, upcoming work, key decisions, and risk areas.”
AI produces a 10-page briefing that gives the senior engineer enough context to make architectural decisions for simple cases and know when to escalate for complex ones.
Week 1-2:
The pod continues delivering, but with adjusted scope. Simple features that fit existing patterns: shipped. Complex features requiring new architectural decisions: deferred or escalated to the portfolio architect.
Senior engineer prompts AI throughout: “Given our existing domain model and patterns, how should we handle X?” The AI, loaded with the pod’s context, provides options consistent with their established approach.
Week 3:
Architect returns. Reads the updates (captured continuously in ADRs and docs). Gets caught up in 2 hours. Pod hasn’t lost momentum.
This works because the pod treated documentation not as a separate activity but as continuous output. Every decision generated an ADR. Every feature updated domain docs. Every change updated diagrams.
The context lived in the system, not just in people’s heads.
That’s the insurance policy against the inevitable chaos.
KEY TAKEAWAY
Use AI to document continuously, not as a separate task. Future you (and future AI prompts) will need that context when the inevitable disruption happens.
What could go wrong (and does)
The Monday-morning walkthrough makes it look easy. Reality is messier. Here are common failure modes:
Anti-pattern 1: The 500-line prompt that produces garbage
A new pod tries to describe the entire feature in one massive prompt: requirements, constraints, edge cases, implementation details, test scenarios, deployment instructions.
AI produces 3000 lines of code that compiles but doesn’t work. The domain model is muddled, the state machine is wrong, and the error handling is nonsensical.
Why: Too much context crammed into one prompt exceeds the AI’s ability to maintain coherence. Like asking a human to implement an entire feature specification from memory without asking clarifying questions.
Fix: Small increments. Model the domain first. Then contracts. Then one component. Then its tests. Each step builds on the previous with tight feedback.
Anti-pattern 2: “Make it intuitive”
Product lead’s prompt: “Design a user interface for subscription management that’s intuitive and delightful.”
AI produces something that looks like every generic subscription UI, with patterns that don’t fit this product’s existing design system.
Why: “Intuitive” and “delightful” are meaningless to AI without context. These words mean different things to different users.
Fix: Be specific. “Use our design system components [link]. Follow the patterns from the account settings page. Place pause button next to cancel button with warning state. Show confirmation modal with clear explanation of what pause means. Include examples.”
Anti-pattern 3: Ignoring AI output and rewriting everything
Engineer generates code with AI, looks at it, thinks “this isn’t how I would do it,” and rewrites from scratch manually.
Now the feature takes 3 days instead of 3 hours, and the engineer is frustrated that “AI didn’t help.”
Why: Sometimes the AI’s approach is different but fine. Engineers who insist on doing it their way negate the value of AI acceleration.
Fix: Ask “is this wrong, or just different?” If the AI’s implementation is correct, passes tests, and meets non-functionals, ship it. Save the manual rewrite for cases where the AI’s approach is actually problematic.
Anti-pattern 4: No documentation of context
Pod delivers features rapidly but captures nothing. No ADRs, no domain model docs, no decision log.
Three months later, someone asks “why did we implement pause this way instead of using scheduled jobs?” No one remembers. The original pod members have moved to other work.
Why: Speed without documentation creates technical debt. Future developers (or future AI prompts) lack context to make good decisions.
Fix: Use AI to generate documentation continuously. After each significant decision, prompt: “Generate an ADR capturing why we chose optimistic locking over pessimistic locking for state transitions, including trade-offs considered.” Takes 2 minutes.
Anti-pattern 5: Ignoring enterprise guardrails
Pod moves fast, delivers value, but violates data governance policies by storing customer data in a region not approved for that data classification.
Security team discovers this months later. Feature has to be rolled back and reimplemented. Customer trust damaged.
Why: Speed without guardrails creates compliance risk. Pods have authority within boundaries, not unlimited authority.
Fix: Encode guardrails as policy-as-code that the pod’s CI/CD pipeline enforces automatically. The pod can’t deploy code that violates data policies, security standards, or architectural principles. The guardrails catch issues before they reach production.
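Policy-as-code often means OPA or a similar engine; as a simpler illustration, here is a hedged TypeScript sketch of a custom CI check – the config shape and approved-region mapping are assumptions, not a real policy registry:

```typescript
// Hypothetical CI policy check: fail the pipeline if a service stores a data
// classification in a region not approved for it. Config shape and mappings are assumed.
interface DeploymentConfig {
  service: string;
  region: string;                // e.g. "eu-west-1"
  dataClassifications: string[]; // e.g. ["customer-pii"]
}

// In a real setup this mapping would come from a governed policy registry.
const approvedRegions: Record<string, string[]> = {
  "customer-pii": ["eu-west-1", "eu-central-1"],
  telemetry: ["eu-west-1", "us-east-1"],
};

export function checkDataResidency(config: DeploymentConfig): string[] {
  const violations: string[] = [];
  for (const classification of config.dataClassifications) {
    const allowed = approvedRegions[classification] ?? [];
    if (!allowed.includes(config.region)) {
      violations.push(`${config.service}: ${classification} data may not be stored in ${config.region}`);
    }
  }
  return violations; // CI exits non-zero if this is non-empty, blocking the deploy
}
```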
The economics of pods (or: how to explain this to your CFO)
Your CFO asks: “Why are we investing in AI tooling and reorganising teams?”
Show them this:
Traditional 12-person team:
- Cost: $2.4M/year (loaded)
- Output: 40-60 features/year
- Cost per feature: $40K-60K
AI-first 4-person pod:
- Cost: $850K/year (including AI/tooling)
- Output: 200-250 features/year
- Cost per feature: $3K-4K
That’s not 3x better (which you’d expect from 12→4 people).
That’s 10-15x better.
Your CFO will ask: “Where’s the catch?”
There isn’t one. The catch was the old way:
- 40% of calendar time was waiting between handoffs
- 20% of effort was rework from miscommunication
- 15% was coordination overhead
- 30% of coding time was boilerplate
The pod eliminates all of that.
The AI doesn’t make individuals faster. It makes the system dramatically more efficient by removing friction between people.
Your competitors are doing this math right now.
KEY TAKEAWAY
The cost savings aren’t from AI writing code faster. They’re from eliminating the waiting, handoffs, and rework that consumed 70% of your calendar time.
What to try this week
Don’t try to create a full pod immediately. Test the concepts:
Experiment 1: The 30-minute feature
Take a small, self-contained feature. Get the product person, architect, and one developer in a room (or video call). Use AI to:
- Model the domain (5 minutes)
- Generate the implementation (10 minutes)
- Generate tests (5 minutes)
- Review and refine (10 minutes)
See what happens. Where did the AI misunderstand? Where did communication break down? Where did it accelerate dramatically?
Experiment 2: Documentation recovery
Pick a feature your team built 6 months ago. Try to recreate the context:
- Why did you build it that way?
- What alternatives were considered?
- What assumptions were made?
How much can you remember? How long would it take to onboard someone new to that feature?
Now try having AI generate documentation from the code, tests, and commit messages. How close does it get? What’s missing?
This exercise reveals how much context your team is losing continuously.
Experiment 3: Identify your handoff tax
Take your last shipped feature. Map every time work moved between people:
- Requirements to design
- Design to development
- Development to QA
- QA to deployment
At each handoff, measure the wait time. Add it up. What percentage of the total timeline was waiting?
That percentage is your opportunity. That’s what pods eliminate.
Pods aren’t theoretical. Teams are running them right now, delivering at speeds that seem impossible to traditional organisations. The mechanics are learnable. The practices are replicable.
The hard part isn’t the AI. The hard part is restructuring around it.
But there’s a problem: When pods ship in hours instead of weeks, who makes architectural decisions?
If every decision needs approval from an enterprise architect, you’ve just recreated the bottleneck. But if pods make every decision independently, you get chaos.
How do you structure architecture so pods can move fast without breaking things?
The next article shows how the architecture function itself changes when pods exist: who decides what, at what level, and how to prevent both bottlenecks and anarchy.