Spec Creation Guide — The Interview Protocol
The spec is the most important document in the entire harness. A bad spec produces bad code, no matter how good the agent is. This guide teaches you how to create a great spec through structured conversation.
Why Interview-Based Spec Creation?
Most agent harness guides say "write a good spec" and move on. That's like telling someone "just write good code." The spec requires two kinds of knowledge:
- Domain knowledge — what the human knows (goals, constraints, edge cases, things they've tried)
- Technical knowledge — what the agent/engineer knows (architecture patterns, tooling, testing strategies)
Neither side has the full picture. The interview process brings them together.
The Anti-Pattern: Agent-Written Specs
If you ask an agent to "analyze this codebase and write a spec," you get a description of what exists, not a plan for what should exist. The agent can't know:
- Why you're building this
- What you've tried that didn't work
- What tradeoffs you're willing to make
- What "done" looks like to you
The Anti-Pattern: Human-Only Specs
If the human writes the spec alone, you get:
- Vague acceptance criteria ("it should be fast")
- Missing technical details (no build commands, no test strategy)
- Implied knowledge that the agent can't access
- Gaps where the human assumed things were obvious
The Interview Protocol
Phase 1: Vision & Context (5-10 minutes)
Start broad. Understand the "why" before the "what."
Questions to ask:
- "What are we building, in one sentence?"
  - Forces clarity. If they can't say it in one sentence, the scope isn't clear yet.
  - Good: "A CLI toolkit for interacting with DocuSign APIs without a proxy server."
  - Bad: "Something to help with DocuSign stuff."
- "Who is this for?"
  - The user? Other developers? An automated system?
  - This shapes API design, error messages, documentation needs.
- "Why now? What's the trigger?"
  - Understanding urgency and motivation reveals hidden requirements.
  - "I'm tired of copying tokens manually" → auto-refresh is a core requirement, not a nice-to-have.
- "What does 'done' look like? How will you know it's working?"
  - Push for measurable criteria, not feelings.
  - "It works" → "I can run `cli auth` and get a valid token without opening a browser."
- "What have you tried before? What didn't work?"
  - This is GOLD. Anti-patterns save agents hours of wasted effort.
  - "Node.js fetch sends headers that break SpringCM" → use curl instead.
- "What options did you consider and reject?"
  - Capture plausible alternatives that should stay rejected unless something material changes.
  - Example: "We considered WebSockets, but polling is enough for MVP and much simpler to operate."
What you're listening for:
- Unstated assumptions ("obviously it needs to...")
- Emotional language (frustration = high-priority requirement)
- Scope creep indicators ("and eventually it could also...")
Phase 2: Requirements Extraction (10-15 minutes)
Now go feature by feature. For each feature:
The requirement loop:
- "Walk me through how you'd use this feature."
  - Get the happy path first. Concrete scenario, not abstract description.
  - "I'd run `cli templates list` and see my 20 most recent templates with names and IDs."
- "What could go wrong?"
  - Error cases, edge cases, permissions issues.
  - "The token could be expired." → auto-refresh requirement.
  - "The account might not have that API enabled." → graceful error message.
- "What's the input? What's the output?"
  - Be specific about formats, fields, defaults.
  - "Input: template ID. Output: JSON with name, ID, folder, page count, created date."
- "How would you test this?"
  - If they can describe a test, you have an acceptance criterion.
  - "I'd run it and check that I get at least one template back with a valid ID."
- "Is this a must-have or nice-to-have?"
  - Prioritization prevents scope explosion.
  - Phase 1 = must-haves. Phase 2+ = nice-to-haves.
Pro tip: Number requirements as you go (FR-001, FR-002...). It creates shared language for the rest of the project.
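For example, the answers from one pass through the loop might condense into a requirement like this (the number, fields, and command reuse the examples above and are purely illustrative):

```
### FR-002: List templates
- Description: `cli templates list` shows the caller's 20 most recent templates.
- Input: none (uses the stored token). Output: table of template names and IDs.
- Error cases: expired token → auto-refresh; API not enabled → graceful error message.
- Acceptance: running the command returns at least one template with a valid ID.
- Priority: must-have (Phase 1).
```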
Phase 3: Technical Discovery (10-15 minutes)
This is where the engineer/agent fills in what the human might not think to specify.
Questions to explore together:
- Tech stack confirmation
  - "You're using TypeScript with npm workspaces — should we keep that pattern?"
  - Don't assume. The human might want to change direction.
- Existing code patterns
  - Read the codebase. Identify patterns already in use.
  - "I see you're using Commander.js for CLI parsing — should all packages follow that?"
  - "Your auth module uses JWT with RSA keys — should new packages share that?"
- Build and test infrastructure
  - "What are the build commands? What test framework? What's the CI/CD setup?"
  - If there's no test framework, that's a Phase 0 task.
- Data model and persistence
  - "Where does data live? Files? Database? Environment variables?"
  - "How do packages share configuration?" (e.g., a monorepo root `.env`)
- Deployment and environment
  - "Is this demo-only or does it need production support?"
  - "What environments exist?" (demo, staging, production)
- Dependencies and external services
  - "What APIs are involved? What are their quirks?"
  - "Any rate limits, authentication requirements, or known issues?"
What the engineer contributes:
- Suggest architecture patterns the human hasn't considered
- Identify missing infrastructure (test framework, linting, CI)
- Spot potential issues early (circular dependencies, shared state)
- Propose phasing based on technical dependencies
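Captured in the spec, this discovery phase might produce a section like the following (every value is a placeholder drawn from the examples in this phase):

```
## Technical Context
- Stack: TypeScript, npm workspaces (keep the existing pattern)
- CLI parsing: Commander.js across all packages
- Auth: shared JWT module with RSA keys
- Build: `npm run build` from the repo root
- Test: no framework yet; adding one is a Phase 0 task
- Config: shared via the monorepo root `.env`
- External APIs: DocuSign/SpringCM (Node.js fetch sends headers that break SpringCM; use curl)
```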
Phase 4: Constraint Mapping (5 minutes)
Explicitly capture the guardrails.
Three categories:
- MUST — Non-negotiable requirements
  - "MUST use curl for HTTP calls (fetch breaks SpringCM)"
  - "MUST store tokens in .env file"
- MUST NOT — Explicit prohibitions
  - "MUST NOT commit secrets to git"
  - "MUST NOT use React for the frontend"
- PREFER — Soft preferences
  - "PREFER ES modules over CommonJS"
  - "PREFER shared utilities over code duplication"
Why this matters: Agents follow explicit constraints better than implied ones. A MUST NOT prevents entire categories of mistakes.
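In the spec itself, the guardrails become a literal block the agent can scan. A minimal sketch, reusing this guide's examples:

```
## Constraints
MUST:
- Use curl for HTTP calls (fetch breaks SpringCM)
- Store tokens in the .env file
MUST NOT:
- Commit secrets to git
- Use React for the frontend
PREFER:
- ES modules over CommonJS
- Shared utilities over code duplication
```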
Phase 5: Spec Assembly (Agent's job)
After the interview, the agent assembles the spec:
- Fill the PROJECT-SPEC.md template with interview answers
- Add technical details discovered from code review
- Write acceptance criteria from the requirement conversations
- Propose phasing based on dependencies
- Include anti-patterns from "what didn't work" answers
- Capture rejected approaches from the interview when they're likely to prevent future drift
- Present to human for review
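Assembled, the spec's skeleton might look like this (section names follow this guide; your actual PROJECT-SPEC.md template may differ):

```
# PROJECT-SPEC.md — [Project Name]
## Vision & Success Criteria
## Requirements (FR-001 ... FR-NNN, each with acceptance criteria)
## Technical Context (stack, patterns, build/test commands)
## Constraints (MUST / MUST NOT / PREFER / ESCALATE)
## Anti-Patterns (what didn't work, and why)
## Rejected Approaches
## Phasing (Phase 0: infrastructure; Phase 1: must-haves; Phase 2+: nice-to-haves)
## Evaluation Design (test cases, verification steps)
```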
The review conversation:
- Read through each section together
- Human corrects misunderstandings
- Agent asks clarifying questions on gaps
- Iterate until the human says "yes, that's what I want"
Phase 6: Self-Containment Test (5 minutes)
The critical test: can the agent complete the task using only the information in the spec?
This is Tobi Lütke's insight: can you state a problem with enough context that it is plausibly solvable without the agent going out and getting more information?
The test — rewrite the spec as if:
- The reader has never seen your project before
- The reader doesn't know your coding conventions or style
- The reader has no access to information you don't include
- The reader will stop and do nothing if anything is ambiguous
The checklist:
- Every acronym is defined on first use
- File paths referenced actually exist and are correct
- External dependencies have versions pinned or install instructions included
- Domain-specific terms are explained (not everyone knows what "JWT" or "FTS" means)
- The agent can find all referenced files without searching
- If the agent would have to guess or search for anything to proceed, the spec isn't self-contained yet
The failure mode this catches: Agents fill gaps with statistical plausibility — they guess in ways that are often subtly wrong. A spec that relies on shared context (even 5 minutes of prior conversation) will produce outputs that look right but aren't.
If the spec fails the test: Add the missing context. If you can't add it (too much to document), add an ESCALATE constraint: "If you encounter information not covered by this spec, do not assume — ask the human."
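Two sample ESCALATE constraints, phrased like the Phase 4 categories (the first reuses the wording above; the second is invented):

```
ESCALATE:
- "If you encounter information not covered by this spec, do not assume — ask the human."
- "If a referenced file path does not exist, stop and report it before continuing."
```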
Spec Quality Checklist
Before handing a spec to agents, verify:
Completeness
- Every feature has numbered acceptance criteria (FR-NNN)
- Data model is defined with types and constraints
- Build and test commands are specified and work
- Anti-patterns section exists with real examples
- Phasing is defined with dependencies noted
- All four constraint categories are filled (MUST / MUST NOT / PREFER / ESCALATE)
- Evaluation design section exists with test cases and verification steps
- Rejected approaches are captured when they would prevent repeated bad decisions
Clarity
- A stranger could read this and understand what to build
- No ambiguous words ("fast", "nice", "good") — use numbers (see the example after this checklist)
- Input/output examples for key operations
- Error cases are explicitly described
Testability
- Every acceptance criterion can be verified by running code
- Sample data or fixtures are provided
- Performance criteria have specific thresholds
- "Done" is objectively measurable
Feasibility
- Tech stack is proven for this type of project
- External dependencies are accessible (API keys, permissions)
- Scope fits the timeline (phasing handles overflow)
- Known challenges are documented with mitigation strategies
Self-Containment
- A stranger could solve this without asking follow-up questions
- No domain-specific terms used without definition
- All file paths, commands, and references are correct
- ESCALATE constraints cover situations where the spec is ambiguous
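A quick contrast for the "use numbers" and threshold checks above (the figures are invented):

```
Vague: "Listing templates should be fast."
Testable: "FR-004: `cli templates list` completes in under 2 seconds for accounts with up to 500 templates."
```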
Common Interview Mistakes
1. Leading the witness
Bad: "You probably want auto-refresh, right?"
Good: "What happens when the token expires mid-session?"
2. Accepting vague answers
Bad: Human: "It should handle errors well." Agent: "Got it."
Good: "Can you give me an example of an error? What should the user see?"
3. Skipping the 'why'
Bad: Jumping straight to features.
Good: Understanding context first — it changes how you interpret every requirement.
4. Over-engineering the spec
Bad: 50-page spec with UML diagrams for a CLI tool.
Good: Enough detail for an agent to work autonomously, no more.
5. Forgetting anti-patterns
Bad: Only describing what TO do.
Good: Explicitly listing what NOT to do — saves agents from repeating your mistakes.
Template: Interview Notes
Use this to capture notes during the interview before assembling the spec:
# Interview Notes — [Project Name]
**Date:** YYYY-MM-DD
**Participants:** [Human], [Agent]
## Vision
- One-liner:
- Target user:
- Trigger/motivation:
- Success criteria:
## Features (raw notes)
1. Feature name — description — happy path — error cases — priority
2. ...
## Technical Context
- Existing stack:
- Patterns to follow:
- Patterns to avoid:
- Build/test commands:
## Constraints
- MUST:
- MUST NOT:
- PREFER:
## Anti-patterns (things that didn't work)
1.
2.
## Rejected Approaches
1. [Option] — rejected because [...] — scope: [project-only / reusable lesson]
2.
## Open Questions
1.
2.
This guide is a living document. Update it as you learn what works.