
Why spec-based agentic development starts with QA thinking

2026-03-23 · 9 min · Oleg Neskoromnyi

I've always believed that clear acceptance criteria and solid specifications lead to better products. Not in a theoretical way — in a "I've watched the same bug get shipped three times because nobody wrote down what the feature was supposed to do" kind of way. That's QA nature. You see things break enough times and you start caring a lot about what "done" actually means. And that belief is what led me to spec-based agentic development — an approach where clear specs come first and AI agents come second.

But here's the thing — as a QA engineer, I wasn't the one writing those specs. Product managers wrote them. Or didn't write them. Or wrote a one-liner in Jira and called it a day. My job was to take whatever showed up in the ticket and figure out how to test it. And when the spec was vague, I'd push back. Can you add acceptance criteria? What's the expected behavior when the input is empty? What does "the form should work correctly" actually mean?

For years, I had this conversation. In retros, in ticket refinements, at meetups. The answer was usually the same: polite nod, maybe a slightly better ticket next sprint, then back to the old habits.

I get it. Writing detailed specs felt like overhead. One more document. One more meeting. One more thing standing between having an idea and shipping code.

Then AI coding agents showed up. And suddenly, knowing what a good spec looks like stopped being a QA concern and became the most practical skill in the room.

Why Specs Felt Like a QA Problem

Before AI agents, human developers filled in the gaps. You give a developer a vague ticket that says "add user filtering to the dashboard," and they'll ask questions. They'll look at the existing code, check how filtering works elsewhere in the app, make reasonable assumptions. A good developer compensates for a bad spec. Not perfectly, but enough that the feature ships and mostly works.

That's why nobody else cared about specs the way QA did. The system worked — kind of. Developers filled the gaps with experience and intuition. The cost of vague specs was real (rework, missed edge cases, bugs that showed up in production), but it was invisible enough that most teams just absorbed it.

Invisible to everyone except QA. Because we were the ones who found the things that slipped through the cracks. And I could trace most of those cracks back to the same place: the ticket didn't say what the thing was supposed to do. So I'd file the bug. And the developer would say, "that wasn't in the requirements." And they'd be right.

But convincing a team to slow down and write better specs when the current process "works fine"? That's the fight I kept having. It worked sometimes — just not as often as I wanted.

How AI Agents Made Spec-Based Development Essential

AI coding agents don't fill in gaps. They don't ask clarifying questions mid-task. They don't check how things work elsewhere in the codebase and make educated guesses. They take what you give them and build from it — and when your instructions are vague, they make choices. Sometimes good ones. Sometimes ones that send you on a 45-minute detour undoing something you never asked for.

I noticed this the moment I started using Claude Code for real work — not demos, not experiments, actual features I needed to ship. The quality of the output was directly tied to how clearly I described what I wanted. Not the cleverness of my prompt. Not how many magic words I used. Just how specific my instructions were.

OpenAI's analysis of SWE-bench Verified shows this pattern clearly. When human annotators screened GitHub issues for well-specified descriptions — clear reproduction steps, explicit expected behavior, defined scope — agent resolution rates improved dramatically compared to the broader, less-specified benchmark. The quality of the issue description was a key factor in whether agents could solve the problem.

Anthropic's own research backs this up. Their guide on building effective agents, based on working with dozens of teams, found that tool definitions and specifications deserve just as much attention as the prompts themselves. In one case, simply requiring absolute file paths instead of vague relative ones eliminated a whole category of agent errors. Small spec details, big impact.

And here's the moment it clicked for me: that list — clear reproduction steps, explicit expected behavior, defined scope — is just acceptance criteria. It's what I'd been asking product managers to put in tickets for my entire career.

Spec-based agentic development is the practice of writing clear specifications — acceptance criteria, expected behavior, edge cases, and constraints — before handing work to AI coding agents. Instead of iterating through vague prompts, you invest time upfront in defining what "done" looks like, then let the agent build to that definition.

What Good Specs Look Like When an Agent Reads Them

After years of reviewing other people's specs, I know exactly what's missing when a ticket is vague. I know which questions to ask. I know where the edge cases hide. At work, I still can't write the specs — that's the PM's job. But on my personal projects, there's no PM. There's no developer. It's just me and the AI agent. And on those projects, I finally get to write the specs the way I always wanted them written.

Here's the difference. Say you need a feature: a search input that filters a list of blog posts by title.

The vague prompt:

"Add search functionality to the blog page"

An agent will build something. It might filter by title. It might filter by content. It might add full-text search with debouncing and highlighting. You won't know until it's done, and then you'll spend time adjusting.

The spec-based prompt:

feature-spec.md:

```markdown
## Feature: Blog search filter

**What it does:**
- Text input at the top of the blog listing page
- Filters posts by title only (not content, not tags)
- Case-insensitive matching
- Filters as the user types (no submit button)
- Shows "No posts found" when filter matches nothing
- Clearing the input shows all posts again

**What it doesn't do:**
- No URL query parameter (this is client-side only)
- No search history or suggestions
- No content or tag search

**Edge cases:**
- Empty blog list: input should still render but be disabled
- Special characters in search: treat as literal text
```
That second version takes maybe five minutes to write. And the agent builds exactly what you described — including handling the edge cases — on the first try. No back and forth. No "that's not what I meant." No rework.
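For concreteness, here's one way the core filtering logic from that spec might look in TypeScript. The `Post` shape and the `filterPosts` name are my own illustration, not anything an agent actually produced:

```typescript
interface Post {
  title: string;
}

// Title-only, case-insensitive filtering, per the spec above.
// The query is treated as literal text: String.includes() does no
// regex interpretation, so special characters match verbatim.
function filterPosts(posts: Post[], query: string): Post[] {
  if (query === "") return posts; // clearing the input shows all posts
  const q = query.toLowerCase();
  return posts.filter((post) => post.title.toLowerCase().includes(q));
}
```

The rendering, the disabled input for an empty list, and the "No posts found" message live in the UI layer. The point is that every line of this function traces back to a line in the spec.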

The funny part? That spec looks exactly like what I used to ask PMs to write in Jira tickets. Same structure. Same level of detail. The audience just changed — from a developer to an AI agent.

The Rework Loop Nobody Talks About

There's a cost to vague prompts that doesn't show up immediately. You give the agent a loose description, it builds something, you look at it and say "no, not like that," you describe it differently, it rebuilds, you adjust again. Each round feels small. But three or four rounds and you've spent more time correcting than you would've spent writing the spec.

I've been through this loop enough times to see the pattern. The correction rounds add up fast — and not just in time. Each round adds noise to the conversation context, which makes the agent's next attempt worse, not better. By round three you're often better off starting over with a clear spec than continuing to fix what's already off track.

If you've worked in QA, this feels obvious. It's the same thing we've always known — preventing defects is cheaper than finding them later. With AI agents, "later" is about three minutes after you hit enter. The agent doesn't push back on a vague spec the way a teammate might. It just builds something that doesn't match what you had in mind.

Before you hand a task to an AI agent, write down three things: what the feature does, what it doesn't do, and what happens at the edges. Five minutes of spec writing saves thirty minutes of correction.

Why QA Engineers Already Know How to Do This

I don't mean this in a hand-wavy "soft skills transfer" way. I mean the daily work of a QA engineer is the skill that makes AI agents productive.

Think about what we do:

  • Break ambiguity into testable assertions — "the form should work correctly" becomes "submitting with an empty email field shows validation error X, submitting with an invalid format shows error Y, submitting with a valid email redirects to page Z"
  • Know what "done" means — not "it works" but "these 12 specific behaviors are verified"
  • Think about edge cases before anyone else does — what happens with empty input, special characters, concurrent users, network failures
  • Spot the gap between what was asked and what was meant — the ticket says "add filtering" but the user actually needs "find the post they wrote last Tuesday"

Every one of those maps directly to writing better specs for AI agents. The acceptance criteria I'd ask a PM to add to a ticket are the same criteria that guide an agent's implementation. And the edge cases I'd put in a test plan are the same edge cases that prevent an agent from building the wrong thing.

I spent years building this skill because it made me better at testing. I didn't expect it to make me better at building.

What I'd Tell the Meetup Audience Now

If I gave that talk today, I'd change the pitch. I wouldn't say "push for better specs because it improves quality" — that's true, but it's the same argument that got polite nods for years.

I'd say: write specs because AI agents are only as good as your instructions. The models keep getting more capable, but capability without direction just means the agent builds the wrong thing faster and more confidently.

The era of vibe coding — where you describe what you want and hope for the best — was fun for prototyping. I wrote about my own experience with it and the trap of building features nobody asked for. But for anything that needs to work reliably, specs aren't optional anymore. They're the interface between what you want and what the agent builds.

What changed isn't the advice. It's my situation. At work, I'm still QA: I can ask for better specs, but I don't write them. On my personal projects, though, I'm everything. I'm currently running five different projects where it's just me and the AI agents. No PM. No developer. No one to ask for better requirements. I write the specs, the acceptance criteria, the edge cases, and then I hand them to the agent. And it works. The agent builds what I asked for, usually on the first try.

That's the thing I couldn't prove in a meeting or a retro. I could say "clear specs lead to better outcomes" and get a polite nod. Now I can show it. Five projects, all built spec-first, all moving forward without the rework loops I used to see at work.

A lot of QA professionals are sitting on the same insight and don't realize how valuable it's become. Years of reading vague tickets, asking "but what about this edge case?", pushing for clearer acceptance criteria — that's not just testing experience. That's the skill that makes the difference between AI agents that waste your time and AI agents that build what you actually need.

That's not a QA skill anymore. That's a building skill.


Have you noticed a difference between vague prompts and detailed specs when working with AI agents? I'm collecting real examples of spec-based workflows — share yours here.
