Understanding Spec-Driven Development

The fundamental nature of building software has shifted. Writing and reviewing syntax is no longer the primary bottleneck in the software development lifecycle; the new challenge is conveying precise architectural intent to large language models (LLMs). Spec-driven development prioritizes authoring a detailed implementation specification before a single line of executable code is generated.

Your role shifts from manual executor to technical verifier. Spec-driven development (SDD) replaces "vibe coding" — a process that is functionally non-deterministic and fails at scale — with a structured framework that governs how the AI operates. By defining the "what" and the "how" through a formal contract, you give coding agents the anchoring they need to produce reliable, maintainable systems.

SDD treats the specification as the primary artifact of the project, while the code becomes a downstream implementation detail. This shift matters because AI is only as capable as the instructions it receives. Without a rigorous specification, you risk accruing technical debt through context drift and architectural fragmentation, where AI-generated features fail to align with existing system constraints or established design patterns.

The specification is the central blueprint that guides an AI coding agent to generate code for a software project.

What is spec-driven development?

Spec-driven development is an evolution in the software development lifecycle (SDLC) where a detailed specification is the single source of truth. In traditional development, documentation often follows the code as an afterthought. SDD reverses this, establishing the specification as a contract that defines the system's behavior, data schemas, and constraints before implementation begins. This gives the coding agent a clear, unambiguous roadmap and reduces the likelihood of hallucinations or ignored requirements.

While it shares DNA with test-driven development (TDD) and behavior-driven development (BDD), SDD is effectively "TDD on steroids." Where TDD focuses on outcome-based tests and BDD on collaborative behavior, SDD demands a comprehensive definition of the entire implementation plan. It treats code as a last-mile concern, shifting the developer's focus higher up the abstraction ladder to the level of architectural intent.

By formalizing requirements upfront, the team establishes a shared understanding that governs every phase of the project. This matters most when using AI agents, since these models require precise instructions to stay consistent across a codebase. The specification becomes the governing document the AI reads to understand the broader system context, which keeps it from making isolated decisions that break existing functionality.

Why SDD emerged: vibe coding and technical debt

SDD is a direct response to the "vibe coding" trend, where developers use natural language prompts to generate code without structured planning. Vibe coding is useful for rapid prototyping, but it is inherently non-deterministic. That lack of structure leads to a slippery slope of technical debt, marked by context drift — a fix in one area inadvertently breaks another because the AI lacked a global understanding of the system's state and dependencies.

Vibe coding without a spec leads to technical debt: context drift, architectural fragmentation, and rising operational cost.

Technical debt in the AI era also shows up as architectural fragmentation. Without a spec to anchor the generation process, newly created features may fail to align with established conventions, eroding the maintainability of the codebase. There is a significant operational cost too: developers often get trapped in lengthy, expensive prompting cycles, consuming thousands of tokens to resolve errors that a well-defined specification would have prevented entirely.

SDD puts the brakes on vibe coding. It prevents the scenario where a developer rushes a feature through a series of AI prompts, creating a functional UI that hides vulnerabilities, dependency conflicts, and forgotten edge cases. By establishing a formal specification, you ground the AI's output in a verified design rather than a probabilistic guess based on a vague prompt.

The three levels of SDD: spec-first, spec-anchored, spec-as-source

Adopting SDD typically happens across three levels of maturity. The first is spec-first, the entry point where a specification is written before code generation to provide initial clarity. In this model the spec is not necessarily maintained as the software evolves, so it can quickly grow outdated as the implementation diverges from the initial plan.

The second level is spec-anchored, where the specification is a living document that evolves alongside the code. When requirements change, the spec is updated first. Automated tests often connect the two here, integrated into a CI/CD pipeline to keep the implementation and the documentation synchronized. This is generally the most practical and scalable approach for professional engineering teams.
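One way to picture the CI link between spec and implementation is a check that flags any acceptance criterion with no matching test. The sketch below is a minimal illustration, not how any particular tool works. It assumes a hypothetical convention in which criteria are numbered `GIVEN ...` lines in the spec and each test name embeds the criterion number (`test_ac1_...`); the function names and the sample data are invented for the example.

```python
import re

# Assumed conventions (hypothetical): criteria look like "1. GIVEN ...",
# and test names look like "test_ac1_some_description".
CRITERION_LINE = re.compile(r"^\s*(\d+)\.\s+GIVEN\b", re.MULTILINE)
TEST_NAME = re.compile(r"^test_ac(\d+)_")


def criterion_ids(spec_text):
    """Return the set of criterion numbers declared in the spec."""
    return {int(m.group(1)) for m in CRITERION_LINE.finditer(spec_text)}


def covered_ids(test_names):
    """Return the set of criterion numbers referenced by test names."""
    ids = set()
    for name in test_names:
        match = TEST_NAME.match(name)
        if match:
            ids.add(int(match.group(1)))
    return ids


def uncovered_criteria(spec_text, test_names):
    """List criteria that appear in the spec but have no test."""
    return sorted(criterion_ids(spec_text) - covered_ids(test_names))


SPEC_EXCERPT = """\
## ACCEPTANCE CRITERIA
1. GIVEN a valid email and password, WHEN the POST request is sent, THEN return a signed JWT.
2. GIVEN invalid credentials, WHEN the request is sent, THEN return a 401 status.
3. GIVEN 5 consecutive failed attempts, WHEN a 6th request is made, THEN lock the account for 15 minutes.
"""
TEST_NAMES = ["test_ac1_valid_login_returns_jwt", "test_ac2_invalid_login_returns_401"]

print(uncovered_criteria(SPEC_EXCERPT, TEST_NAMES))  # [3]
```

In a pipeline, a non-empty result would fail the build, which forces either a new test or a spec update before the change can merge.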

The most advanced level is spec-as-source. At this stage humans only interact with the specification; the AI generates and refactors all of the code. This approach echoes Model-Driven Development (MDD) and risks repeating its historical failures. Just as MDD struggled with awkward abstraction levels and inflexibility, spec-as-source risks combining the constraints of rigid modeling with the non-determinism of LLMs. Without rigorous human oversight, this level can produce inconsistent output that is hard to validate.

The three levels of spec-driven development on a rising scale: spec-first, spec-anchored, and spec-as-source.

What goes into a good spec?

A high-quality specification is structured and behavior-oriented rather than merely descriptive. It includes precise input and output definitions, data schemas, and success criteria. An "Out of Scope" section is vital for token management: by stating explicitly what the AI should not build, you keep the agent from spending the context window on unnecessary gold-plating and scope creep.

The four core parts of a good spec: Overview, Acceptance Criteria, Data Schema, and Edge Cases with Out of Scope.

Data schemas in the spec are the definitive API contract for the LLM. Defining the shape of the data upfront reduces the chance the AI invents its own keys or types — a common source of integration failure. Effective specs often use Gherkin-style (Given/When/Then) syntax for acceptance criteria — a hallmark of tools like Kiro — so requirements are immediately translatable into testable units.
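To make the "schema as contract" idea concrete, the sketch below enforces a declared shape and rejects any key the schema does not name. It is an illustration under assumptions: the `LoginRequest` type, its `email` and `password` fields, and the `parse_login_request` helper are hypothetical names chosen for the example, not part of any tool mentioned here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LoginRequest:
    """Hypothetical request shape, as a spec's data schema might declare it."""
    email: str
    password: str


def parse_login_request(payload):
    """Build a LoginRequest, refusing keys or types the schema does not allow."""
    allowed = {f.name for f in fields(LoginRequest)}

    unknown = set(payload) - allowed
    if unknown:
        # An invented key (for example "username") is a contract violation.
        raise ValueError(f"unexpected keys: {sorted(unknown)}")

    missing = allowed - set(payload)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    for name in sorted(allowed):
        value = payload[name]
        if not isinstance(value, str) or not value:
            raise ValueError(f"{name} must be a non-empty string")

    return LoginRequest(**payload)


request = parse_login_request({"email": "dev@example.com", "password": "s3cret"})
print(request.email)  # dev@example.com
```

Because the check is driven by the dataclass fields, changing the schema in one place changes what generated code is allowed to send or accept.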

The spec should also cover edge-case handling so the AI considers failure modes like empty inputs or expired sessions. The goal is enough detail to eliminate ambiguity while staying lightweight enough to iterate on. Below is an example of a structured SPEC.md that uses explicit tags to drive an AI agent.

```markdown
# FEATURE: User Authentication Endpoint

@generate: src/routes/auth.js
@test: tests/auth.test.js

## OVERVIEW

Implement a secure POST /login endpoint for user authentication.

## DATA SCHEMA

- Input: `email` (string, required), `password` (string, required)
- Output: `200 OK` with JWT, or `401 Unauthorized` with generic error.

## ACCEPTANCE CRITERIA

1. GIVEN a valid email and password, WHEN the POST request is sent, THEN return a signed JWT.
2. GIVEN invalid credentials, WHEN the request is sent, THEN return a 401 status.
3. GIVEN 5 consecutive failed attempts, WHEN a 6th request is made, THEN lock the account for 15 minutes.

## EDGE CASES

- Reject empty fields client-side before submission.
- Ensure passwords are never stored or transmitted in plain text.

## OUT OF SCOPE

- Social OAuth login (Google/GitHub).
- Password recovery flow.
```
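A spec in this shape is regular enough to be read by a program as well as by an agent. The following sketch extracts the `@generate` and `@test` tags and splits each acceptance criterion into its Given/When/Then parts. The regular expressions and the `parse_spec` helper are assumptions made for illustration; they are not the parser used by Kiro, Tessl, or any other tool.

```python
import re
from dataclasses import dataclass


@dataclass
class Criterion:
    given: str
    when: str
    then: str


# Assumed line shapes, taken from the example spec above.
TAG_RE = re.compile(r"^@(\w+):\s*(\S+)\s*$", re.MULTILINE)
CRITERION_RE = re.compile(
    r"^\s*\d+\.\s+GIVEN\s+(.+?),\s+WHEN\s+(.+?),\s+THEN\s+(.+?)\.?\s*$",
    re.MULTILINE,
)


def parse_spec(text):
    """Return (tags, criteria) extracted from a spec written in this style."""
    tags = {name: target for name, target in TAG_RE.findall(text)}
    criteria = [Criterion(*m.groups()) for m in CRITERION_RE.finditer(text)]
    return tags, criteria


SPEC_TEXT = """\
# FEATURE: User Authentication Endpoint

@generate: src/routes/auth.js
@test: tests/auth.test.js

## ACCEPTANCE CRITERIA

1. GIVEN a valid email and password, WHEN the POST request is sent, THEN return a signed JWT.
2. GIVEN invalid credentials, WHEN the request is sent, THEN return a 401 status.
3. GIVEN 5 consecutive failed attempts, WHEN a 6th request is made, THEN lock the account for 15 minutes.
"""

tags, criteria = parse_spec(SPEC_TEXT)
print(tags["generate"])   # src/routes/auth.js
print(criteria[2].then)   # lock the account for 15 minutes
```

Each `Criterion` maps naturally onto one test case: the `given` text describes the setup, `when` the action, and `then` the assertion.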

How the SDD workflow runs with an AI coding agent

The standard SDD workflow is a pipeline of Requirements → Design → Tasks → Implementation. You start by prompting the system behavior and constraints, which the AI uses to generate a requirements document. Once you approve these, the AI translates them into a design document and a granular list of implementation tasks. Throughout, your role is the verifier — reflecting on and refining the AI's output at every stage.
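The gating in this pipeline can be sketched as a small state machine in which no stage advances until the human verifier has approved the current artifact. This is a conceptual model only; the `Stage` and `SpecPipeline` names are hypothetical and do not correspond to any tool's API.

```python
from enum import IntEnum


class Stage(IntEnum):
    REQUIREMENTS = 1
    DESIGN = 2
    TASKS = 3
    IMPLEMENTATION = 4


class SpecPipeline:
    """Tracks the current stage and refuses to advance without approval."""

    def __init__(self):
        self.stage = Stage.REQUIREMENTS
        self.approved = set()

    def approve(self):
        """Record that the verifier accepted the current stage's artifact."""
        self.approved.add(self.stage)

    def request_changes(self):
        """Withdraw approval so the agent must revise the current artifact."""
        self.approved.discard(self.stage)

    def advance(self):
        """Move to the next stage, but only if the current one is approved."""
        if self.stage not in self.approved:
            raise PermissionError(f"{self.stage.name} has not been approved")
        if self.stage is Stage.IMPLEMENTATION:
            return self.stage  # nothing follows implementation
        self.stage = Stage(self.stage + 1)
        return self.stage


pipeline = SpecPipeline()
pipeline.approve()
print(pipeline.advance().name)  # DESIGN
```

Calling `advance()` on an unapproved stage raises `PermissionError`, which models the review loop: the agent cannot start on design or tasks while the requirements are still in dispute.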

The spec-driven development workflow with an AI coding agent: Requirements, Design, Tasks, Implementation, with a review and verification loop.

Operationally, managing context is the primary challenge. Agents like Claude Code run with a default context window (typically 200k tokens), while power users may reach for extended 1M-token windows where these are available. For heavy usage, you can route requests through an enterprise backend such as Amazon Bedrock by setting CLAUDE_CODE_USE_BEDROCK=1. Whichever backend you choose, the window has to hold large steering documents, such as CLAUDE.md, that carry the project-wide rules and architectural principles the agent must reference.
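A rough budget check can show whether steering documents leave enough room for the actual work. The sketch below assumes about four characters per token, which is only a common rule of thumb for English text; real tokenizers vary, and the `reserve_fraction` parameter, the function names, and the sample document sizes are all hypothetical.

```python
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text):
    """Approximate the token count of a document from its length."""
    return len(text) // CHARS_PER_TOKEN


def fits_budget(documents, window_tokens=200_000, reserve_fraction=0.5):
    """Check whether documents fit in the share of the window set aside for them.

    reserve_fraction is the portion of the window kept free for the
    conversation, tool output, and generated code.
    """
    budget = int(window_tokens * (1 - reserve_fraction))
    used = sum(estimate_tokens(text) for text in documents.values())
    return used <= budget, used, budget


# Stand-in contents; in practice these would be read from disk.
docs = {"CLAUDE.md": "x" * 40_000, "SPEC.md": "y" * 8_000}
ok, used, budget = fits_budget(docs)
print(ok, used, budget)  # True 12000 100000
```

If the check fails, the usual options are to trim the steering documents or to split the spec so that each task loads only the sections it needs.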

To pull in context, the agent uses file navigation and text search to read the spec files. Be wary here: agents rely on retrieval or basic text search, which can lead to context blindness. If a function isn't indexed or the search terms don't match, the agent might miss an existing utility and generate a duplicate. Steering documents reduce that risk by giving the agent a high-level map of the system's structure.
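The failure mode is easy to reproduce in miniature. In the sketch below, a literal text search for "slugify" finds nothing in the code, even though a helper that does exactly that already exists under another name, while the same search over a steering document finds it. The file paths, function names, and steering entries are invented for the example.

```python
# Toy codebase: the existing helper is named make_url_safe, not slugify.
CODEBASE = {
    "utils/text.py": (
        "def make_url_safe(title):\n"
        "    return '-'.join(title.lower().split())\n"
    ),
    "routes/posts.py": "def create_post(title, body):\n    ...\n",
}

# Hypothetical steering document: one-line summaries per file.
STEERING_DOC = {
    "utils/text.py": "Text helpers. make_url_safe: slugify titles for URLs.",
    "routes/posts.py": "HTTP handlers for blog posts.",
}


def text_search(files, term):
    """Return paths whose content contains the term, ignoring case."""
    needle = term.lower()
    return sorted(path for path, content in files.items() if needle in content.lower())


print(text_search(CODEBASE, "slugify"))      # []
print(text_search(STEERING_DOC, "slugify"))  # ['utils/text.py']
```

With only the first search available, an agent asked to "slugify post titles" has no signal that `make_url_safe` exists and is likely to write a duplicate.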

SDD tools: Kiro, Spec-kit, Tessl

Three spec-driven development tools placed on the spec-first to spec-as-source spectrum: Kiro, Spec-kit, and Tessl.

Several tools have emerged to support this methodology, each sitting at a different point on the SDD spectrum:

Kiro: A lightweight, VS Code-based tool that guides you through a strict three-step workflow of Requirements, Design, and Tasks. It leans on the spec-first approach and is notable for its heavy use of Gherkin-style acceptance criteria to capture the Given/When/Then logic of each feature.
Spec-kit: Developed by GitHub, this CLI-based tool is the most customizable of the three. It centers on a "Constitution", a document of immutable, high-level principles, and uses elaborate Markdown checklists so the agent verifies its own work against your standards before proceeding.
Tessl: This framework targets the spec-as-source level. It uses tags like @generate and @test inside the specification to drive a 1:1 mapping between spec and code, exploring a future where code becomes a generated artifact the human never refactors by hand.
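The checklist idea can be illustrated with a small gate that refuses to proceed while any item is still unchecked. This is a sketch under assumptions: it relies on the common Markdown task-list syntax (`- [ ]` and `- [x]`), the checklist items are invented, and it does not reproduce Spec-kit's actual files or behavior.

```python
import re

# Matches Markdown task-list items such as "- [ ] text" or "- [x] text".
CHECKBOX_RE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$", re.MULTILINE)


def unchecked_items(markdown):
    """Return the text of every task-list item that is not ticked."""
    return [text.strip() for mark, text in CHECKBOX_RE.findall(markdown) if mark == " "]


def may_proceed(markdown):
    """The gate opens only when no unchecked items remain."""
    return not unchecked_items(markdown)


# Hypothetical checklist content.
CHECKLIST = """\
## Review checklist
- [x] Every acceptance criterion has a test
- [ ] No dependencies added outside the approved list
- [x] Out-of-scope items left untouched
"""

print(unchecked_items(CHECKLIST))  # ['No dependencies added outside the approved list']
print(may_proceed(CHECKLIST))      # False
```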

Limits and criticism: waterfall echoes and the over-engineering trap

Critics argue SDD risks reviving Waterfall's greatest failure: Big Design Up Front (BDUF). Spending excessive time planning before coding piles up untested hypotheses. "Markdown Madness" is a wearying reality too — a simple feature can generate over 1,300 lines of prose across multiple files. Spending most of your time reading verbose, AI-generated text rather than solving architectural problems is a clear sign of diminishing returns.

Another drawback is the "double code review" problem. Because you are the verifier, you review the technical specification (which often contains logic or code snippets) and the resulting implementation. That effectively doubles the review cycle. SDD agents also suffer from context blindness because they rely on text-based retrieval: they may fail to spot existing functions that aren't explicitly linked in the spec, leading to redundant code and a codebase that gets worse while you try to improve it.

There is a "sledgehammer to crack a nut" problem as well. For small bug fixes or minor CSS tweaks, the overhead of the SDD workflow (Requirements, Design, Tasks) is excessive. The cost of maintaining these documents can quickly outweigh the productivity gains, especially in complex brownfield codebases where the agent's limited search capabilities struggle to grasp the full architectural context.

When to use (and not use) SDD

SDD is a strong fit for greenfield projects and for major new features built from scratch. It excels at establishing a clear architectural foundation when the cost of planning is lower than the cost of fixing a fundamentally misunderstood implementation. It provides the brakes that keep an AI agent from inventing its own patterns during the initial build phase of a system.

SDD is a poor fit for small bug fixes or maintenance tasks in large, existing codebases. In these brownfield environments, the risk of context blindness is highest. For highly non-deterministic problems where requirements keep shifting, an iterative "Natural Language Development" approach — rooted in Agile and Lean Startup principles — works better: split complex requirements into small, testable experiments rather than drowning the process in upfront documentation.

When to use SDD: a strong fit for greenfield and security-critical work; a poor fit for small fixes and brownfield projects.

Ultimately, let risk dictate your choice. Use SDD when the cost of misunderstanding is high. For everything else, lean on the speed of Agile iterations — identify the riskiest assumption and use the AI to build the simplest experiment that tests it. Don't let the spec become a new form of technical debt you have to maintain long after the code has moved on.
