Spec-Driven Enterprise Delivery

AI does not fail only because it is unreliable.

It often fails because we ask it to fulfill expectations we never explained.

That is the core of spec-driven development for me.

Not process. Not documentation theater. Not a return to slow software delivery.

Spec-driven development is the practice of giving AI the information it needs to do the work correctly: purpose, context, boundaries, constraints, architecture, business expectations, acceptance criteria, and review gates.

If the AI should meet an expectation, the expectation must be available to the AI.

That sounds obvious.

In practice, it is where many AI-assisted projects break.

The Goal Is Not More Code

The goal is not to produce more code faster.

That is too small.

Enterprise software has users, data, security boundaries, compliance pressure, legacy systems, audit questions, operations, budgets, deadlines, and people who need to understand why a system behaves the way it behaves.

AI can make implementation faster, but that is only the shallow part.

Used well, AI can also help make enterprise requirements visible. It can check whether security boundaries are covered. It can explain how data flows through the system. It can point to tests. It can summarize architectural decisions. It can prepare audit evidence. It can answer a human who asks: “How is this topic covered? Show me. Explain it. Prove it.”

That is the more interesting goal.

AI-assisted delivery should help teams build software that meets expectations, respects constraints, can be explained, can be tested, and can be audited.

That requires a shared knowledge layer.

For me, that is one of the most important roles of a spec. It is centralized, comprehensive knowledge for humans and AI at the same time. People can see the intent, boundaries, decisions, and acceptance criteria. AI agents can use the same source to implement, explain, test, and prove the work.

But that only works if the AI has the right information to reason from.

AI Can Only Meet Expectations It Can See

If you know it, write it down.

That sounds almost too simple, but it is the center of the whole workflow.

If you know the business reason, write it down.

If you know the technical boundary, write it down.

If you know the existing architecture pattern, write it down.

If you know the security constraint, write it down.

If you know what would make the result wrong, write it down.

AI agents do not share your memory. They do not know the meeting, the political constraint, the legacy scar, the product promise, the audit pressure, or the architecture rule unless that information is present in the task.

Start with the knowns:

purpose, intent, and business point of view
boundaries, non-goals, and existing context
tech stack and architecture decisions
security, compliance, and operational constraints
acceptance criteria and review questions
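One way to picture the knowns listed above is as a small structure that is rendered into the task context and makes missing parts visible. This is only a minimal sketch: the `Knowns` class, its field names, and the invoice example are hypothetical illustrations, not part of any existing tool or skill.

```python
from dataclasses import dataclass, field


@dataclass
class Knowns:
    """Hypothetical container for what the team already knows."""
    purpose: str = ""
    boundaries: list[str] = field(default_factory=list)
    tech_stack: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)

    def to_context(self) -> str:
        """Render the knowns as plain text that can be attached to a task."""
        sections = {
            "Purpose": [self.purpose] if self.purpose else [],
            "Boundaries and non-goals": self.boundaries,
            "Tech stack and architecture": self.tech_stack,
            "Security, compliance, operations": self.constraints,
            "Acceptance criteria": self.acceptance_criteria,
        }
        lines = []
        for title, items in sections.items():
            lines.append(f"{title}:")
            if items:
                lines.extend(f"- {item}" for item in items)
            else:
                # Make the gap visible instead of letting the agent guess.
                lines.append("- NOT PROVIDED (ask, do not guess)")
        return "\n".join(lines)


# Invented example: only purpose and tech stack are known so far.
draft = Knowns(
    purpose="Let support agents export a customer's invoices as PDF.",
    tech_stack=["TypeScript", "PostgreSQL"],
)
context = draft.to_context()
print(context)
```

The point of the explicit "NOT PROVIDED" marker is that an empty section becomes a question for the human rather than a silent assumption.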

The better the expectation is described, the less the AI has to guess.

And guessing is where delivery risk starts.

Human Language Becomes The Interface

Programming languages have always been an interface between humans and computers.

But they were never a native language for either side.

Humans think in goals, constraints, tradeoffs, examples, risks, and stories. Then we translate that into TypeScript, Python, SQL, or whatever stack the system needs. The computer still needs parsers, compilers, interpreters, runtimes, operating systems, and machines that execute instructions at a much lower level.

Programming languages are the middle layer both sides can use.

They are powerful, precise, and necessary.

But for humans, they are still a second language.

For decades, software development was mostly one-directional. The human translated intent into code. The machine accepted or rejected it. The feedback loop existed, but the computer did not discuss intent with us in our own language.

That is the paradigm shift.

With AI-assisted development, human language becomes a two-way interface. We can describe goals, constraints, behavior, examples, risks, and review questions. The AI can respond, ask clarifying questions, explain tradeoffs, inspect code, generate tests, summarize architecture, and help translate the result into implementation.

This does not mean programming languages stop mattering.

They still matter a lot. Runtime behavior, type systems, package ecosystems, performance, security, deployment, and maintainability are still real.

But from the human interface point of view, the higher-level artifact is no longer only the code.

It is the spec, the architecture, the acceptance criteria, the review checklist, and the reusable instructions that tell agents what is expected and how their work will be checked.

Spec-Driven Development Starts With The Knowns

Software teams have always planned.

Waterfall planned heavily before implementation. Agile reacted against that and moved planning closer to delivery. User stories, architecture decision records, RFCs, tickets, acceptance criteria, and sprint planning all came from the same basic problem:

Software is too expensive to build by accident.

Spec-driven development starts with the knowns.

That gives it part of the strength of waterfall: precision, completeness where completeness is possible, and a serious attempt to make expectations explicit before expensive work starts.

But it should not inherit waterfall’s weakness.

The spec is not written in stone upfront.

It is a living document.

That is where it takes the useful part of agile. You do not need to describe the full final system before the first useful implementation happens. You describe what is known now, implement against that, learn, and extend the spec for the next feature, the next workflow, the next constraint, or the next architectural decision.

The important difference is that iteration does not mean guessing.

Each iteration should still make its knowns explicit before implementation starts.

Vibe coding is one symptom of iterating by guessing. It works for exploration, but it breaks as a delivery model when the human expects the AI to know business intent, architecture rules, security constraints, and product expectations that were never provided. Spec-driven development changes the question from “can the AI guess enough?” to “did we give the AI the right information, boundaries, and checks?”

Planning used to align humans with humans.

Spec-driven development must align humans with humans and humans with AI.

Human Work Moves Up A Level

Once the knowns are explicit, the human role changes.

The most valuable human work moves closer to purpose, business pressure, architecture, and risk.

The work becomes explaining what outcome matters, where the tradeoffs are, and how the existing system behaves. It also means making boundaries, failure modes, and delivery constraints explicit: maintainability, operations, compliance, ownership, and risk.

That does not make engineering or architecture judgment less important.

It makes that thinking more important.

If you only measure your value by how many lines you personally write, AI feels like a threat. If you measure your value by how well you turn business pressure into robust systems, AI becomes leverage.

The human job is no longer only to produce the implementation.

The human job is to make the expectation clear enough that implementation can be delegated, checked, explained, and improved.

The better humans define purpose, boundaries, and risks, the better AI can handle the workflow: write the spec, plan the work, implement scoped tickets, and review the result.

Good Specs Make Expectations Executable

The obvious question comes quickly: how do I get to a good spec?

That was hard for me.

My first real contact with spec-driven development came through Kiro specs. Later I looked at GitHub Spec Kit, Tessl’s spec-driven development workflow, and similar approaches. Conceptually, I liked the direction immediately: requirements, design, tasks, implementation, review.

But in practice, it still required too much manual human work.

Too much writing. Too many manual steps. Too much structure that busy teams will not consistently fill with enough quality under delivery pressure.

Those tools are valuable from a conceptual and technical perspective. But for enterprise delivery, the workflow has to be feasible. If it depends on humans manually writing perfect specs, tickets, and review notes every time, it will fail.

That is why I think the spec workflow itself must be AI-assisted and automated.

The spec is the main place where human judgment belongs, but the AI should help create it, challenge it, repair it, and review it. Otherwise humans still have to write specs, repair specs, collect context, split plans into tickets, police scope, compare implementation against intent, and keep teams aligned.

That is exactly the work that becomes inconsistent when people are tired, rushed, or under delivery pressure.

This is why I created sebastianwessel/skills: not to replace senior judgment, but to move senior judgment into reusable context, gates, and instructions.

Skills Turn Judgment Into Guidance

My approach is not to ask an AI to “write a good spec” and hope for the best.

The skill gives the AI the operating model for writing and maintaining specs. It tells the AI what good means: concise, gap-free, traceable, implementation-ready, and explicit enough that agents do not have to guess.

The human provides the intent, context, constraints, and decisions.

The AI uses the skill to turn that input into precise specs, ask focused questions, find gaps, structure the material, maintain consistency, and keep the source of truth clean as the project evolves.

That is the division of work I want: humans decide what matters, and AI handles the disciplined spec work required to make those expectations usable.

Writing Specs Is An Iterative Loop

Writing a spec is not one big document dump.

It is an iterative loop from top to bottom.

I start with the overall purpose: what should exist, why it matters, who it is for, which business outcome it supports, and which constraints are already known.

Then I move down into features, workflows, boundaries, interfaces, data, failure modes, security, operations, and acceptance criteria.

At each level, I do not use the AI only as a task fulfiller.

I use it as a sparring partner. I want it to challenge my proposal, push back on weak assumptions, and tell me where my idea is not precise enough yet:

Where are the gaps?
What would you challenge in this proposal?
Which unhappy paths are missing?
What should happen when this step fails?
Which interfaces or dependencies are unclear?
Which risks would you expect in production?

That is where AI is extremely useful. It can hold a larger context than a human can comfortably keep in mind. It can follow dependency trees, interfaces, flows, and edge cases. It can repeatedly ask the boring questions humans skip when they are tired.

This is what my spec-architect skill is built around. A spec is not ready because it sounds plausible. It is ready when an agent can implement from it without deciding behavior, interfaces, failures, security, data, recovery, release, observability, or tests.
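The readiness rule above can be read as a simple gate: every area an implementer would otherwise have to decide alone must be covered. The following is a rough sketch of that idea only. The function names and the dictionary layout are assumptions for illustration and are not taken from the spec-architect skill itself.

```python
# The areas named in the text that an agent must not have to decide on its own.
READINESS_AREAS = (
    "behavior", "interfaces", "failures", "security", "data",
    "recovery", "release", "observability", "tests",
)


def open_decisions(spec_sections: dict[str, str]) -> list[str]:
    """List the areas where the spec says nothing yet."""
    return [
        area for area in READINESS_AREAS
        if not spec_sections.get(area, "").strip()
    ]


def is_ready(spec_sections: dict[str, str]) -> bool:
    """A spec is only ready when no area is left open."""
    return not open_decisions(spec_sections)


# Invented example: everything is specified except recovery.
spec_sections = {area: "specified" for area in READINESS_AREAS}
spec_sections["recovery"] = ""
print(open_decisions(spec_sections))  # ['recovery']
print(is_ready(spec_sections))        # False
```

A non-empty section is of course only a necessary condition. Whether the content is precise enough is still a question for the review loop.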

Reviewing Specs Is Also An AI Workflow

I do not want to manually read every spec document line by line.

That is not the best use of human attention.

Instead, I ask the AI precise questions. Explain this flow step by step. Show me the risky assumptions. Visualize the dependencies. Find unclear interfaces. Suggest simpler patterns. Look for scaling and performance issues. Check whether the acceptance criteria actually prove the behavior.

Basically, I ask the AI the questions I would ask during a manual review.

The important difference is that those questions can become reusable: a skill, a checklist, a Markdown file, a review gate, or a project convention.

That is the leverage.

Human review moves from reading everything manually to improving the review system.

Planning Makes Work Delegatable

A spec is not yet a delivery plan.

Planning turns the source of truth into work that can be delegated.

This is what teams have tried to do for years with Jira tickets, refinements, planning sessions, and acceptance criteria. The intention was right: create work items with enough context that implementation becomes predictable.

But it was always expensive.

Good tickets require real context: acceptance criteria, boundaries, dependencies, and enough detail to reduce interpretation without becoming impossible to maintain.

In many teams, that never worked consistently.

Spec-driven planning can finally achieve what agile planning tried to achieve because the planning can be generated from the spec.

The human effort moves into spec creation, where it belongs.

That is a better place for it. The spec is the long-lived source of truth. Plans and tickets are temporary: they orchestrate and track a short implementation period.

My spec-implementation-planner skill exists for that step.

The planner should slice work horizontally first to create foundations and interfaces for safe parallel work. Then it should create vertical tickets that are small, isolated, and end-to-end useful. Each slice should move the system toward a working feature, not just produce disconnected technical fragments.
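The slicing order described above can be sketched as a dependency graph: horizontal foundation tickets come first, and vertical tickets that share the same finished foundation can run in parallel. The ticket names below are invented, and `parallel_waves` is a hypothetical helper, not part of the spec-implementation-planner skill. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or newer).

```python
from graphlib import TopologicalSorter

# Hypothetical tickets: name -> tickets that must be finished first.
tickets = {
    "shared-invoice-schema": set(),                     # horizontal foundation
    "export-api-contract": {"shared-invoice-schema"},   # horizontal interface
    "export-single-invoice": {"export-api-contract"},   # vertical slice
    "export-invoice-batch": {"export-api-contract"},    # vertical slice
}


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tickets into waves; all tickets in one wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if tickets depend on each other
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(parallel_waves(tickets), start=1):
    print(number, wave)
```

In this example the two vertical slices end up in the same wave, because both only depend on the interface ticket that was completed before them.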

The core of planning is enablement: make autonomous implementation safe, parallel, traceable, and focused on working outcomes.

If a ticket cannot be filled from the approved spec, the answer is not to let the implementer guess.

The answer is to go back to the spec.

Implementation Should Not Invent

Implementation is where AI can move very fast.

That speed is only useful when the work is isolated and precise.

An implementation ticket should be a focused piece of work with defined scope, defined interfaces, clear acceptance criteria, and no room for interpretation. The implementer should know what can be read, what can be changed, which dependencies must be ready, and how success will be verified.

That is why this works so well with test-driven development.

Explicit interfaces and expectations make it much easier to write the tests first, implement against them, and verify that the behavior matches the ticket. The AI can iterate inside a narrow boundary instead of roaming through the system.

My spec-ticket-implementation skill is intentionally strict: implement one approved ticket, respect read and write scope, check dependencies, cover happy and unhappy paths, verify acceptance criteria, and stop when behavior is missing.
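Respecting the write scope is one of the checks that is easy to automate. As a minimal sketch, assuming the ticket lists its allowed paths as glob patterns (an assumption made here for illustration, with invented file names), a gate could compare the changed files against that scope:

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files: list[str], write_scope: list[str]) -> list[str]:
    """Return changed files that match none of the allowed glob patterns."""
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in write_scope)
    ]


# Hypothetical ticket scope and a hypothetical change set.
write_scope = ["src/export/*", "tests/export/*"]
changed_files = [
    "src/export/pdf.ts",
    "tests/export/pdf.test.ts",
    "src/billing/tax.ts",
]
violations = out_of_scope(changed_files, write_scope)
print(violations)  # ['src/billing/tax.ts']
```

A non-empty result would not be something for the implementer to argue away. It is a signal that either the ticket scope or the plan needs to be revisited.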

An AI implementer should not silently become the person deciding product behavior, architecture, security, and release strategy because the ticket was vague.

If implementation needs a decision that is not in the spec or ticket, that is not implementation work.

That is a spec or plan gap.

The output should include evidence: changed files, tests, verification commands, acceptance coverage, and blockers. That evidence becomes the input for review.
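The evidence listed above could be captured in a small record that a done gate can check mechanically. This is a sketch under assumptions: `TicketEvidence`, its fields, and the sample values are hypothetical and only mirror the items named in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TicketEvidence:
    """Hypothetical evidence record produced at the end of one ticket."""
    ticket_id: str
    changed_files: list[str]
    tests: list[str]
    verification_commands: list[str]
    # Maps each acceptance criterion to the test that proves it, or None.
    acceptance_coverage: dict[str, Optional[str]]
    blockers: list[str] = field(default_factory=list)

    def unproven_criteria(self) -> list[str]:
        """Acceptance criteria without a test that demonstrates them."""
        return [c for c, test in self.acceptance_coverage.items() if test is None]

    def passes_done_gate(self) -> bool:
        """Done means: no blockers and every criterion has proof."""
        return not self.blockers and not self.unproven_criteria()


# Invented example values.
evidence = TicketEvidence(
    ticket_id="EXPORT-12",
    changed_files=["src/export/pdf.ts"],
    tests=["tests/export/pdf.test.ts"],
    verification_commands=["npm test"],
    acceptance_coverage={
        "exports a single invoice as PDF": "exports single invoice",
        "rejects unknown invoice ids": None,
    },
)
print(evidence.unproven_criteria())  # ['rejects unknown invoice ids']
print(evidence.passes_done_gate())   # False
```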

Review Becomes A Trust Pipeline

Review is not one final human checkpoint.

Review happens across the whole pipeline.

Specs are reviewed for gaps and ambiguity. Plans are reviewed for scope, dependencies, and parallel safety. Implementation tickets have preflight and done gates. Final implementation review checks whether the implemented paths still match the approved spec and plan.

That is what my spec-implementation-review skill is for. It compares the implementation path to the specs, verifies there was no drift, checks that the expected behavior was implemented, and makes sure the different ticket slices work together to fulfill the original expectations.

A reviewer can ask AI:

Is the spec itself clear, complete, and free of contradictions?
Does the implementation match the approved spec, plan, and architecture?
Which edge cases, bugs, failure modes, or evidence gaps remain?
Are security, compliance, data, and operational concerns covered?
Can findings route back to a ticket, plan gap, or spec gap?
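The last question, routing findings back, can be sketched as a small decision rule. This is one possible interpretation and not the logic of the spec-implementation-review skill: a finding goes to the earliest artifact that is missing the expectation. The `Route` enum and `route_finding` function are hypothetical names.

```python
from enum import Enum


class Route(Enum):
    SPEC_GAP = "spec gap"
    PLAN_GAP = "plan gap"
    TICKET = "ticket"


def route_finding(covered_by_spec: bool, covered_by_plan: bool) -> Route:
    """Route a review finding to the earliest artifact that lacks the expectation."""
    if not covered_by_spec:
        # The expectation was never written down: fix the source of truth first.
        return Route.SPEC_GAP
    if not covered_by_plan:
        # The spec covers it, but no ticket carried it into implementation.
        return Route.PLAN_GAP
    # Spec and plan were clear: the implementation of the ticket is wrong.
    return Route.TICKET


print(route_finding(covered_by_spec=True, covered_by_plan=False))  # Route.PLAN_GAP
```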

This is where I would push hard against a common argument.

“I am better than AI at reviewing code” is usually the wrong conclusion.

When a human finds something AI missed, that does not prove the human is generally better at review. It usually proves that the human had context, experience, suspicion, or focus that was not available to the AI in that run.

That is not magic. That is knowledge.

The same is true between humans. An expert sees things a junior developer misses because the expert has more context, patterns, and prior failures in their head. AI is not different in that regard. It needs the right knowledge, the right focus, and the right review instruction.

So the useful question is not “why did the AI miss this?”

The useful question is: “what did I fail to give the AI, and how do I make sure it has that knowledge next time?”

Human intervention in planning, implementation, or review should increasingly aim at improving the AI system itself: better specs, better context packages, better reusable prompts, better gates, better tests, and better review skills.

That is where the cost changes.

A human review is expensive, limited, and easy to skip under pressure. A reusable AI-supported check can become part of the regular workflow. It can run more often. It can be improved when it misses something. It can grow with the system.

Over time, the important questions are no longer asked only a few times by tired humans. They become integrated into delivery as a repeatable pipeline that runs again and again.

This is how software becomes better tested, more reliable, and more predictable.

The goal is to make AI trustworthy: not by hoping, but by giving it context, checks, and feedback loops.

What Works For Me

This is still a young workflow, but a few patterns already work very well for me.

Use different reasoning levels for different jobs. I like medium reasoning for creating and extending specs because it keeps momentum. For review, cleanup, refactoring, gap finding, and quality gates, I prefer high reasoning. That is where I want the AI to slow down, follow paths step by step, judge honestly, and not optimize for making me happy.
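If this preference should survive beyond one person's memory, it can be written down as a small project convention. The sketch below is an assumption about how that might look. The job names and the `reasoning_for` helper are invented, and how a level is actually passed to a model depends on the tool in use.

```python
# Hypothetical project convention: which reasoning level each kind of job uses.
REASONING_LEVEL = {
    "create_spec": "medium",
    "extend_spec": "medium",
    "review": "high",
    "cleanup": "high",
    "refactoring": "high",
    "gap_finding": "high",
    "quality_gate": "high",
}


def reasoning_for(job: str) -> str:
    """Look up the agreed level; unknown jobs are an error, not a guess."""
    try:
        return REASONING_LEVEL[job]
    except KeyError:
        raise ValueError(f"No reasoning level defined for job {job!r}") from None


print(reasoning_for("extend_spec"))   # medium
print(reasoning_for("quality_gate"))  # high
```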

Always ask questions. Ask the AI to explain flows, find gaps, challenge your proposal, inspect unhappy paths, check scaling limits, and self-reflect on weak evidence. I explicitly ask it to reason, judge, disagree when needed, and be honest about uncertainty.

Use independent models when possible. Write with one model family and review with another. My personal experience is that GPT-5.5-style models write stronger specs for this kind of work because they follow instructions more strictly and are less likely to optimize for pleasing the user. Claude models often feel more eager to start implementing or guess missing intent. That can be useful in other modes, but for specs I want discipline first.

Persist what works. If a review question is useful once, turn it into a checklist, skill, Markdown file, or project convention. Do not rely on memory. The point is to make the AI better next time.

Stop asking: how do I improve this implementation? Start asking: how do I improve the AI system that creates, checks, and improves implementations?

That mindset shift is the whole game.

Open Questions To Handle

I do not think spec-driven development breaks when the workflow above is followed seriously.

But there are still operating questions teams have to solve.

Team spec creation. How do multiple people contribute without losing alignment? Who owns the source of truth? How do product, architecture, security, and engineering decisions get merged without becoming a noisy document pile?
Project knowledge base. Specs should grow beyond feature documents. They should include business context, project history, architecture rules, operational decisions, security assumptions, and the reusable knowledge agents need to work well.
Versioning. Git is a good foundation because specs should live close to the system and be reviewed like other source artifacts. But teams still need better ways to quickly understand how expectations changed over time, which decisions moved, and which implementation plans were affected.

Those questions are not reasons to avoid spec-driven development.

They are reasons to treat specs as a serious delivery asset.

Conclusion

The enterprise impact is not “developers wrote more documents”.

That would be a terrible result.

The impact is delivery that can be explained before, during, and after implementation. Specs make expectations, scope, and tradeoffs visible earlier. They make work clearer, decisions more explicit, and AI adoption less like a pile of experiments and more like a delivery capability an organization can understand.

But the real game changer is the economics.

Spec-driven development makes speed and quality move together instead of fighting each other. Implementation gets faster, but quality control also becomes more repeatable. Reviews, checks, gap analysis, acceptance validation, and proof can run as an autonomous pipeline instead of depending only on scarce human review time.

The same quality questions can be asked again and again. The review system can improve when it misses something. The knowledge can be reused across teams and projects. Quality control becomes cheaper, more frequent, and more consistent over time.

I have fully switched my personal and open-source work to spec-driven development.

That shift was honestly mind-blowing.

The real unlock was not “AI writes code now”. It was changing my reaction when the AI got something wrong.

Instead of blaming the AI, I started asking: what did I fail to give it? Which context was missing? Which expectation was unclear? Which review question should become reusable? Which part of the workflow needs to improve so this does not happen again?

I get better results than I would by manually coding everything myself, and I deliver faster. Work that previously would have taken me months, sometimes years, can now move toward a production-ready product in a couple of focused two-week sprints when the spec, plan, implementation, and review loop is running well.

More importantly, I can work on multiple projects in parallel. I do not need to personally write every line of code anymore. I improve the specs, verify the results, improve the feedback loops, and improve the skills. That improves the AI system, and the better AI system improves delivery again.

That is the compounding effect.

This is why I think judging AI only by today’s visible failures is strategically wrong.

The model you use today was trained, tested, packaged, and shipped before today. The tools around it were designed before today’s workflows became obvious. Compare AI-assisted development one year ago with what is possible now, then imagine the next 3, 6, or 12 months.

That matters because the product you start building today will go live in a future world, not in the one you planned it in. By the time it reaches users, the market, tooling, model capabilities, and expectations around AI-assisted delivery may already have moved.

So the pressure is not only to deliver faster. The pressure is to deliver faster while improving quality.

Spec-driven development is the workflow that makes that possible for me: move faster, keep the source of truth clear, automate more quality control, and improve the AI system while the product evolves.

Teams that only ask whether today’s AI can replace yesterday’s workflow are already late.

So here is the mindset shift I would leave you with.

Whenever you catch yourself doing manual work, ask why this specific work is still manual.

What outcome is it supposed to create? What judgment, context, evidence, or constraint is hidden inside it? Why can AI not do it on your behalf yet? What would AI need to know, check, or prove to take it over safely? And what can you do now to make that possible next time?

That is the real work: turning repeated human effort into explicit knowledge, reusable instructions, review gates, and automated feedback loops.

Bad specs are bureaucracy.

Good specs are leverage.

And if we want AI agents to do more real engineering work, we need to become much better at giving them the kind of work they can actually do well.