Spec Creation Guide — The Interview Protocol
The spec is the most important document in the entire harness. A bad spec produces bad code, no matter how good the agent is. This guide teaches you how to create a great spec through structured conversation.
Why Interview-Based Spec Creation?
Most agent harness guides say "write a good spec" and move on. That's like telling someone "just write good code." The spec requires two kinds of knowledge:
- Domain knowledge — what the human knows (goals, constraints, edge cases, things they've tried)
- Technical knowledge — what the agent/engineer knows (architecture patterns, tooling, testing strategies)
Neither side has the full picture. The interview process brings them together.
The Anti-Pattern: Agent-Written Specs
If you ask an agent to "analyze this codebase and write a spec," you get a description of what exists, not a plan for what should exist. The agent can't know:
- Why you're building this
- What you've tried that didn't work
- What tradeoffs you're willing to make
- What "done" looks like to you
The Anti-Pattern: Human-Only Specs
If the human writes the spec alone, you get:
- Vague acceptance criteria ("it should be fast")
- Missing technical details (no build commands, no test strategy)
- Implied knowledge that the agent can't access
- Gaps where the human assumed things were obvious
The Interview Protocol
Phase 1: Vision & Context (5-10 minutes)
Start broad. Understand the "why" before the "what."
Questions to ask:
- "What are we building, in one sentence?"
  - Forces clarity. If they can't say it in one sentence, the scope isn't clear yet.
  - Good: "A CLI toolkit for interacting with DocuSign APIs without a proxy server."
  - Bad: "Something to help with DocuSign stuff."
- "Who is this for?"
  - The user? Other developers? An automated system?
  - This shapes API design, error messages, documentation needs.
- "Why now? What's the trigger?"
  - Understanding urgency and motivation reveals hidden requirements.
  - "I'm tired of copying tokens manually" → auto-refresh is a core requirement, not a nice-to-have.
- "What does 'done' look like? How will you know it's working?"
  - Push for measurable criteria, not feelings.
  - "It works" → "I can run `cli auth` and get a valid token without opening a browser."
- "What have you tried before? What didn't work?"
  - This is GOLD. Anti-patterns save agents hours of wasted effort.
  - "Node.js fetch sends headers that break SpringCM" → use curl instead.
- "What options did you consider and reject?"
  - Capture plausible alternatives that should stay rejected unless something material changes.
  - Example: "We considered WebSockets, but polling is enough for MVP and much simpler to operate."
What you're listening for:
- Unstated assumptions ("obviously it needs to...")
- Emotional language (frustration = high-priority requirement)
- Scope creep indicators ("and eventually it could also...")
Phase 2: Requirements Extraction (10-15 minutes)
Now go feature by feature. For each feature:
The requirement loop:
- "Walk me through how you'd use this feature."
  - Get the happy path first. Concrete scenario, not abstract description.
  - "I'd run `cli templates list` and see my 20 most recent templates with names and IDs."
- "What could go wrong?"
  - Error cases, edge cases, permissions issues.
  - "The token could be expired." → auto-refresh requirement.
  - "The account might not have that API enabled." → graceful error message.
- "What's the input? What's the output?"
  - Be specific about formats, fields, defaults.
  - "Input: template ID. Output: JSON with name, ID, folder, page count, created date."
- "How would you test this?"
  - If they can describe a test, you have an acceptance criterion.
  - "I'd run it and check that I get at least one template back with a valid ID."
- "Is this a must-have or nice-to-have?"
  - Prioritization prevents scope explosion.
  - Phase 1 = must-haves. Phase 2+ = nice-to-haves.
Pro tip: Number requirements as you go (FR-001, FR-002...). It creates shared language for the rest of the project.
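For example, the answers from one pass through the loop might condense into a requirement like this (the number, fields, and command reuse the examples above and are purely illustrative):

```
### FR-002: List templates
- Description: `cli templates list` shows the caller's 20 most recent templates.
- Input: none (uses the stored token). Output: table of template names and IDs.
- Error cases: expired token → auto-refresh; API not enabled → graceful error message.
- Acceptance: running the command returns at least one template with a valid ID.
- Priority: must-have (Phase 1).
```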
Phase 3: Technical Discovery (10-15 minutes)
This is where the engineer/agent fills in what the human might not think to specify.
Questions to explore together:
- Tech stack confirmation
  - "You're using TypeScript with npm workspaces — should we keep that pattern?"
  - Don't assume. The human might want to change direction.
- Existing code patterns
  - Read the codebase. Identify patterns already in use.
  - "I see you're using Commander.js for CLI parsing — should all packages follow that?"
  - "Your auth module uses JWT with RSA keys — should new packages share that?"
- Build and test infrastructure
  - "What are the build commands? What test framework? What's the CI/CD setup?"
  - If there's no test framework, that's a Phase 0 task.
- Data model and persistence
  - "Where does data live? Files? Database? Environment variables?"
  - "How do packages share configuration?" (e.g., a monorepo root `.env`)
- Deployment and environment
  - "Is this demo-only or does it need production support?"
  - "What environments exist?" (demo, staging, production)
- Dependencies and external services
  - "What APIs are involved? What are their quirks?"
  - "Any rate limits, authentication requirements, or known issues?"
What the engineer contributes:
- Suggest architecture patterns the human hasn't considered
- Identify missing infrastructure (test framework, linting, CI)
- Spot potential issues early (circular dependencies, shared state)
- Propose phasing based on technical dependencies
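Captured in the spec, this discovery phase might produce a section like the following (every value is a placeholder drawn from the examples in this phase):

```
## Technical Context
- Stack: TypeScript, npm workspaces (keep the existing pattern)
- CLI parsing: Commander.js across all packages
- Auth: shared JWT module with RSA keys
- Build: `npm run build` from the repo root
- Test: no framework yet; adding one is a Phase 0 task
- Config: shared via the monorepo root `.env`
- External APIs: DocuSign/SpringCM (Node.js fetch sends headers that break SpringCM; use curl)
```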
Phase 4: Constraint Mapping (5 minutes)
Explicitly capture the guardrails.
Three categories:
- MUST — Non-negotiable requirements
  - "MUST use curl for HTTP calls (fetch breaks SpringCM)"
  - "MUST store tokens in .env file"
- MUST NOT — Explicit prohibitions
  - "MUST NOT commit secrets to git"
  - "MUST NOT use React for the frontend"
- PREFER — Soft preferences
  - "PREFER ES modules over CommonJS"
  - "PREFER shared utilities over code duplication"
Why this matters: Agents follow explicit constraints better than implied ones. A MUST NOT prevents entire categories of mistakes.
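In the spec itself, the guardrails become a literal block the agent can scan. A minimal sketch, reusing this guide's examples:

```
## Constraints
MUST:
- Use curl for HTTP calls (fetch breaks SpringCM)
- Store tokens in the .env file
MUST NOT:
- Commit secrets to git
- Use React for the frontend
PREFER:
- ES modules over CommonJS
- Shared utilities over code duplication
```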
Phase 5: Spec Assembly (Agent's job)
After the interview, the agent assembles the spec:
- Fill the PROJECT-SPEC.md template with interview answers
- Add technical details discovered from code review
- Write acceptance criteria from the requirement conversations
- Propose phasing based on dependencies
- Include anti-patterns from "what didn't work" answers
- Capture rejected approaches from the interview when they're likely to prevent future drift
- Present to human for review
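Assembled, the spec's skeleton might look like this (section names follow this guide; your actual PROJECT-SPEC.md template may differ):

```
# PROJECT-SPEC.md — [Project Name]
## Vision & Success Criteria
## Requirements (FR-001 ... FR-NNN, each with acceptance criteria)
## Technical Context (stack, patterns, build/test commands)
## Constraints (MUST / MUST NOT / PREFER / ESCALATE)
## Anti-Patterns (what didn't work, and why)
## Rejected Approaches
## Phasing (Phase 0: infrastructure; Phase 1: must-haves; Phase 2+: nice-to-haves)
## Evaluation Design (test cases, verification steps)
```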
The review conversation:
- Read through each section together
- Human corrects misunderstandings
- Agent asks clarifying questions on gaps
- Iterate until the human says "yes, that's what I want"
Phase 6: Self-Containment Test (5 minutes)
The critical test: can the agent complete the task using only the information in the spec?
This is Tobi Lütke's insight: can you state a problem with enough context that it is plausibly solvable without the agent going out and getting more information?
The test — rewrite the spec as if:
- The reader has never seen your project before
- The reader doesn't know your coding conventions or style
- The reader has no access to information you don't include
- The reader will stop and do nothing if anything is ambiguous
The checklist:
- Every acronym is defined on first use
- File paths referenced actually exist and are correct
- External dependencies have versions pinned or install instructions included
- Domain-specific terms are explained (not everyone knows what "JWT" or "FTS" means)
- The agent can find all referenced files without searching
- If the agent would have to guess or search for anything to proceed, the spec isn't self-contained yet
The failure mode this catches: Agents fill gaps with statistical plausibility — they guess in ways that are often subtly wrong. A spec that relies on shared context (even 5 minutes of prior conversation) will produce outputs that look right but aren't.
If the spec fails the test: Add the missing context. If you can't add it (too much to document), add an ESCALATE constraint: "If you encounter information not covered by this spec, do not assume — ask the human."
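Two sample ESCALATE constraints, phrased like the Phase 4 categories (the first reuses the wording above; the second is invented):

```
ESCALATE:
- "If you encounter information not covered by this spec, do not assume — ask the human."
- "If a referenced file path does not exist, stop and report it before continuing."
```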
Spec Quality Checklist
Before handing a spec to agents, verify:
Completeness
- Every feature has numbered acceptance criteria (FR-NNN)
- Data model is defined with types and constraints
- Build and test commands are specified and work
- Anti-patterns section exists with real examples
- Phasing is defined with dependencies noted
- All four constraint categories are filled (MUST / MUST NOT / PREFER / ESCALATE)
- Evaluation design section exists with test cases and verification steps
- Rejected approaches are captured when they would prevent repeated bad decisions
Clarity
- A stranger could read this and understand what to build
- No ambiguous words ("fast", "nice", "good") — use numbers (see the example after this checklist)
- Input/output examples for key operations
- Error cases are explicitly described
Testability
- Every acceptance criterion can be verified by running code
- Sample data or fixtures are provided
- Performance criteria have specific thresholds
- "Done" is objectively measurable
Feasibility
- Tech stack is proven for this type of project
- External dependencies are accessible (API keys, permissions)
- Scope fits the timeline (phasing handles overflow)
- Known challenges are documented with mitigation strategies
Self-Containment
- A stranger could solve this without asking follow-up questions
- No domain-specific terms used without definition
- All file paths, commands, and references are correct
- ESCALATE constraints cover situations where the spec is ambiguous
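A quick contrast for the "use numbers" and threshold checks above (the figures are invented):

```
Vague: "Listing templates should be fast."
Testable: "FR-004: `cli templates list` completes in under 2 seconds for accounts with up to 500 templates."
```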
Common Interview Mistakes
1. Leading the witness
Bad: "You probably want auto-refresh, right?"
Good: "What happens when the token expires mid-session?"
2. Accepting vague answers
Bad: Human: "It should handle errors well." Agent: "Got it."
Good: "Can you give me an example of an error? What should the user see?"
3. Skipping the 'why'
Bad: Jumping straight to features.
Good: Understanding context first — it changes how you interpret every requirement.
4. Over-engineering the spec
Bad: 50-page spec with UML diagrams for a CLI tool.
Good: Enough detail for an agent to work autonomously, no more.
5. Forgetting anti-patterns
Bad: Only describing what TO do.
Good: Explicitly listing what NOT to do — saves agents from repeating your mistakes.
Template: Interview Notes
Use this to capture notes during the interview before assembling the spec:
# Interview Notes — [Project Name]
**Date:** YYYY-MM-DD
**Participants:** [Human], [Agent]
## Vision
- One-liner:
- Target user:
- Trigger/motivation:
- Success criteria:
## Features (raw notes)
1. Feature name — description — happy path — error cases — priority
2. ...
## Technical Context
- Existing stack:
- Patterns to follow:
- Patterns to avoid:
- Build/test commands:
## Constraints
- MUST:
- MUST NOT:
- PREFER:
## Anti-patterns (things that didn't work)
1.
2.
## Rejected Approaches
1. [Option] — rejected because [...] — scope: [project-only / reusable lesson]
2.
## Open Questions
1.
2.
This guide is a living document. Update it as you learn what works.