The Agentic AI Playbook, Part 1 of 7: What Agentic AI Actually Does (And Why Most Teams Get It Wrong)

March 19, 2026

Most people who say they’ve tried AI-assisted development haven’t tried agentic AI. They’ve tried a smarter autocomplete. It’s a completely different thing, and conflating the two is the reason so many teams write off the technology before they’ve understood it.

GitHub Copilot suggests a line. You accept or reject. ChatGPT explains an error. You copy the fix back into your editor. Both useful. Neither is what we’re talking about here.

An agentic AI doesn’t wait for you to ask questions. It reasons about a goal, breaks it into steps, executes those steps, reads what happened, and adjusts. Give it a task like “add pagination to the customer list endpoint” and it will read your existing controller, service, and repository code; identify the patterns you already use; modify the relevant files; write or update the tests; run them; read the failures; fix the code; and run them again. All without you typing a single line.

That is not autocomplete. That is the work.


What This Actually Changes

The shift is not about speed. Speed is a side effect. The real shift is what your job becomes.

When an agentic AI handles implementation, you are no longer the person who writes code. You become the person who defines intent clearly enough for someone else to build it correctly. Think of it as moving from writing the code to directing the engineer who writes it.

That engineer is extraordinary in some ways: it has absorbed millions of lines of code, works at machine speed, and never gets tired. It is also flawed in familiar ways: it can misunderstand requirements, make assumptions, and produce something that looks right but isn’t. It will not slow down when it’s confused. It will confidently generate plausible output regardless of whether it understood you correctly.

That last point matters more than most people realise. The frustrations people have with agentic AI — wrong output, missed edge cases, broken assumptions — almost always trace back to missing inputs, not to the AI. Vague prompt. No spec. No context about how things work here. The agent is a mirror: it reflects the quality of the instructions it receives.

This is actually good news, because it means you have control. But it requires accepting that your job changes.


The Frustration Curve

Nobody talks about this honestly, so here it is.

Days one and two feel like magic. The agent writes a service class in thirty seconds. You wonder why you ever did this manually.

Days three to five, the first real task hits a wall. The agent builds something that doesn’t match what you meant. Your requirements were vaguer than you realised. You discover how much implicit knowledge was sitting in your head that never made it into the prompt. You spend more time reviewing than you saved writing.

Week two is the crisis point: “Is this actually faster?” You’re not fighting the tool. You’re discovering how much your old workflow relied on the slow pace of manual coding as a thinking aid. Writing code by hand gave you time to notice gaps while you were typing. That disappears with agentic AI. The thinking still has to happen; it just has to happen earlier and more deliberately.

Week three or four: breakthrough. Rules are dialled in. Prompting is specific. You know when to push through versus when to start fresh. Speed genuinely exceeds hand-coding.

If you’re in the trough at week two, that is normal. The investment in preparation pays off. That’s what this series is about.


The Mental Model in One Page

You don’t need to understand every technical detail of how Claude Code works before you start. But seven concepts will save you hours of confusion.

The context window is the agent’s working memory: everything in the current conversation, including your messages, its replies, file contents, and command outputs. It’s large (around 150,000 words) but not infinite. Long conversations degrade. When the agent starts forgetting earlier instructions or repeating mistakes, it’s time to start fresh.

Rules are instruction files (called CLAUDE.md) that shape the agent’s behaviour for your project. They work like a .editorconfig — except instead of configuring a linter, they configure your AI. Write them once, and the agent follows them in every conversation, for every team member. This is where you encode your standards, your patterns, your constraints. Part 2 of this series covers this in detail.
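To make that concrete, here is a minimal sketch of what a CLAUDE.md might contain. The file is free-form markdown; the specific stack, conventions, and commands below are hypothetical placeholders, not a recommended template.

```markdown
# Project rules

## Stack
- Java 21, Spring Boot 3, PostgreSQL (placeholder stack for illustration)

## Conventions
- Controllers stay thin; business logic lives in service classes.
- Use constructor injection, never field injection.
- Every new endpoint gets an integration test before the PR.

## Commands
- Build: ./gradlew build
- Test: ./gradlew test
```

Short, specific, and checked into the repo: the agent reads it at the start of every conversation, so every team member gets the same baseline behaviour.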

Memory is how the agent retains knowledge between conversations. Without it, every session starts from scratch. With it, the agent already knows your stack, your conventions, and your decisions from previous sessions. Part 2 covers this too.

Tools are the agent’s hands: it can read files, write files, run shell commands, search codebases, call APIs. Without tools, it can only talk. With tools, it acts.

Skills are packaged instructions for specific tasks, activated by slash commands. A code review skill. A TDD workflow skill. A debugging skill. They focus the agent on a specific way of working rather than leaving it to generalist defaults.
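In Claude Code, one way to package this is a project slash command: a markdown file under `.claude/commands/` whose name becomes the command. The checklist below is purely illustrative, and the exact file locations and conventions are worth verifying against the current Claude Code documentation.

```markdown
<!-- .claude/commands/review.md — invoked as /review -->
Review the current diff for:
1. Deviations from the conventions in CLAUDE.md
2. Missing or weak test coverage
3. Error handling gaps

Report findings as a numbered list. Do not modify any files.
```

The value is repeatability: instead of re-typing review criteria every session, the whole team invokes the same checklist with one command.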

MCPs (Model Context Protocol servers) connect the agent to external systems — GitHub, databases, browsers. Think of MCP as a standard plugin interface: the agent can reach anything that has an MCP server.
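As a rough sketch, Claude Code can load MCP servers from a project-level `.mcp.json` file. The server package name and token variable below are assumptions for illustration — check the specific server's own documentation for the real values.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" }
    }
  }
}
```

Once configured, the agent can list issues, read PRs, or open branches through that server without you copy-pasting anything between tools.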

Hooks are automated checks that run before or after the agent acts. Before it writes a file, run a linter. After it writes a file, auto-format it. They give you guardrails without manual review of every action.
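In Claude Code, hooks live in the settings file. The sketch below shows the general shape — a post-edit hook that runs a formatter — but the event names, matcher syntax, and schema should be verified against the current Claude Code docs before relying on them, and the prettier command is just a stand-in for whatever formatter your project uses.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": "npx prettier --write ." }
        ]
      }
    ]
  }
}
```

Because the hook runs automatically on every matching action, the guardrail holds even when nobody is watching the agent work.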

You don’t need all of these on day one. But when something confusing happens — and it will — the answer is almost always in one of these seven concepts.


This Works With Any Agentic Tool

This series uses Claude Code as its primary example because that’s what I use and recommend. But the principles here — spec-first development, establishing context before you build, treating the AI as a collaborator not an oracle — apply to any agentic tool. Cursor, aider, Windsurf, and others follow the same fundamental model. The specific commands differ. The underlying discipline doesn’t.

Where I say “Claude” or “Claude Code,” read it as shorthand for “whatever agentic AI tool you’re using.”

The principles in this series apply whether you are building from scratch or working inside an existing codebase, but the starting point is different. In greenfield work, you define your world before the AI builds in it: your patterns, your conventions, your architectural decisions. In brownfield, the legacy codebase becomes the AI’s source of truth. It will mirror what it reads — patterns, naming, structure — unless you explicitly redirect it. Both contexts reward the same preparation discipline. The baseline is just different. Later posts return to this distinction in more depth, particularly around context setup and daily workflow.


Devil’s advocate

Where does this argument break down?

For small, isolated, well-understood tasks, the preparation overhead can feel disproportionate. If you need to rename a field across a codebase, you don’t need a spec. You need a prompt and five seconds. Treating every task as a multi-step process adds friction where none is needed.

There’s also a real learning curve cost. The first two weeks are genuinely slower for most people. Teams under delivery pressure may not have that runway, and forcing agentic AI into a sprint without the setup time can make things worse, not better.

And it’s worth being honest about cost. Heavy users can spend $50-150 per day on API calls. Across a team, that adds up fast. Multi-agent workflows multiply it. If nobody is tracking this, the first monthly bill will be a surprise.

None of that changes the core argument. But it does mean this isn’t zero-cost to adopt, and the benefits don’t appear immediately.


What to do Monday morning

Use the rest of this post as your reference, then do three things.

First, install Claude Code and give it a task you’d normally do yourself. Something real, not a toy example. Watch what it does, where it succeeds, and where it misses. Don’t fix the prompt yet — just observe.

Second, write down every time it produced something that wasn’t what you wanted. For each one, ask: was the prompt specific enough? Did it have the context it needed? You’ll find the answer is almost always no.

Third, read Part 2 of this series before you use it again. The frustration curve is real, but it’s mostly caused by starting without a briefing document. Part 2 is the briefing document.


Next: Part 2 — What Your AI Doesn’t Know (And How to Fix That)
