# Spec Creation Guide — The Interview Protocol

> The spec is the most important document in the entire harness.
> A bad spec produces bad code, no matter how good the agent is.
> This guide teaches you how to create a great spec through structured conversation.

---

## Why Interview-Based Spec Creation?

Most agent harness guides say "write a good spec" and move on. That's like telling someone "just write good code." The spec requires **two kinds of knowledge**:

1. **Domain knowledge** — what the human knows (goals, constraints, edge cases, things they've tried)
2. **Technical knowledge** — what the agent/engineer knows (architecture patterns, tooling, testing strategies)

Neither side has the full picture. The interview process brings them together.

### The Anti-Pattern: Agent-Written Specs

If you ask an agent to "analyze this codebase and write a spec," you get a description of **what exists**, not a plan for **what should exist**. The agent can't know:

- Why you're building this
- What you've tried that didn't work
- What tradeoffs you're willing to make
- What "done" looks like to you

### The Anti-Pattern: Human-Only Specs

If the human writes the spec alone, you get:

- Vague acceptance criteria ("it should be fast")
- Missing technical details (no build commands, no test strategy)
- Implied knowledge that the agent can't access
- Gaps where the human assumed things were obvious

---

## The Interview Protocol

### Phase 1: Vision & Context (5-10 minutes)

Start broad. Understand the "why" before the "what."

**Questions to ask:**

1. **"What are we building, in one sentence?"**
   - Forces clarity. If they can't say it in one sentence, the scope isn't clear yet.
   - Good: "A CLI toolkit for interacting with DocuSign APIs without a proxy server."
   - Bad: "Something to help with DocuSign stuff."
2. **"Who is this for?"**
   - The user? Other developers? An automated system?
   - The answer shapes API design, error messages, and documentation needs.
3. **"Why now? What's the trigger?"**
   - Understanding urgency and motivation reveals hidden requirements.
   - "I'm tired of copying tokens manually" → auto-refresh is a core requirement, not a nice-to-have.
4. **"What does 'done' look like? How will you know it's working?"**
   - Push for measurable criteria, not feelings.
   - "It works" → "I can run `cli auth` and get a valid token without opening a browser."
5. **"What have you tried before? What didn't work?"**
   - This is gold: anti-patterns save agents hours of wasted effort.
   - "Node.js fetch sends headers that break SpringCM" → use curl instead.

**What you're listening for:**

- Unstated assumptions ("obviously it needs to...")
- Emotional language (frustration = high-priority requirement)
- Scope creep indicators ("and eventually it could also...")

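Answers to questions 4 and 5 translate directly into spec sections. A sketch of what that capture might look like, reusing the examples above (the project details are illustrative):

```markdown
## Success Criteria
- Running `cli auth` prints a valid access token without opening a browser.
- A repeat run reuses the cached token instead of re-authenticating.

## Anti-Patterns
- Do not use Node.js `fetch` for SpringCM calls; it sends headers the API
  rejects. Use `curl` instead.
```

Note that the frustration ("tired of copying tokens manually") became a testable criterion, and the failed experiment became an explicit prohibition.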
---

### Phase 2: Requirements Extraction (10-15 minutes)

Now go feature by feature. For each feature, run **the requirement loop:**

1. **"Walk me through how you'd use this feature."**
   - Get the happy path first: a concrete scenario, not an abstract description.
   - "I'd run `cli templates list` and see my 20 most recent templates with names and IDs."
2. **"What could go wrong?"**
   - Error cases, edge cases, permissions issues.
   - "The token could be expired." → auto-refresh requirement.
   - "The account might not have that API enabled." → graceful error message.
3. **"What's the input? What's the output?"**
   - Be specific about formats, fields, and defaults.
   - "Input: template ID. Output: JSON with name, ID, folder, page count, created date."
4. **"How would you test this?"**
   - If they can describe a test, you have an acceptance criterion.
   - "I'd run it and check that I get at least one template back with a valid ID."
5. **"Is this a must-have or nice-to-have?"**
   - Prioritization prevents scope explosion.
   - Phase 1 = must-haves. Phase 2+ = nice-to-haves.

**Pro tip:** Number requirements as you go (FR-001, FR-002, ...). This creates a shared language for the rest of the project.

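Captured in the spec, one pass through the loop might produce an entry like this (the feature, ID, and numbers are hypothetical):

```markdown
### FR-003: List templates

**Description:** `cli templates list` prints the user's most recent templates.

**Input:** none (uses the stored auth token).
**Output:** up to 20 templates with name, template ID, and created date.

**Acceptance criteria:**
- [ ] `cli templates list` exits with code 0.
- [ ] At least one template is returned, and every row has a non-empty ID.
- [ ] An expired token is refreshed automatically before the request.

**Priority:** must-have (Phase 1)
```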
---

### Phase 3: Technical Discovery (10-15 minutes)

This is where the engineer/agent fills in what the human might not think to specify.

**Questions to explore together:**

1. **Tech stack confirmation**
   - "You're using TypeScript with npm workspaces — should we keep that pattern?"
   - Don't assume. The human might want to change direction.
2. **Existing code patterns**
   - Read the codebase. Identify patterns already in use.
   - "I see you're using Commander.js for CLI parsing — should all packages follow that?"
   - "Your auth module uses JWT with RSA keys — should new packages share that?"
3. **Build and test infrastructure**
   - "What are the build commands? What test framework? What's the CI/CD setup?"
   - If there's no test framework, adding one is a Phase 0 task.
4. **Data model and persistence**
   - "Where does data live? Files? Database? Environment variables?"
   - "How do packages share configuration?" (e.g., a monorepo root `.env`)
5. **Deployment and environment**
   - "Is this demo-only, or does it need production support?"
   - "What environments exist?" (demo, staging, production)
6. **Dependencies and external services**
   - "What APIs are involved? What are their quirks?"
   - "Any rate limits, authentication requirements, or known issues?"

**What the engineer contributes:**

- Suggest architecture patterns the human hasn't considered
- Identify missing infrastructure (test framework, linting, CI)
- Spot potential issues early (circular dependencies, shared state)
- Propose phasing based on technical dependencies

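The answers to question 3 should land in the spec as literal, copy-pasteable commands. A sketch, assuming an npm-workspaces monorepo (the script names are illustrative; copy the real ones from `package.json`):

```markdown
## Build & Test Commands

- Install: `npm install` (installs all workspace dependencies)
- Build: `npm run build` (compiles every package)
- Test: `npm test` (must pass before any phase is marked done)
```

Verify each command actually works before the spec ships; a broken build command stalls the agent at step one.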
---

### Phase 4: Constraint Mapping (5 minutes)

Explicitly capture the guardrails.

**Three categories:**

1. **MUST** — non-negotiable requirements
   - "MUST use curl for HTTP calls (fetch breaks SpringCM)"
   - "MUST store tokens in the `.env` file"
2. **MUST NOT** — explicit prohibitions
   - "MUST NOT commit secrets to git"
   - "MUST NOT use React for the frontend"
3. **PREFER** — soft preferences
   - "PREFER ES modules over CommonJS"
   - "PREFER shared utilities over code duplication"

**Why this matters:** Agents follow explicit constraints better than implied ones. A single MUST NOT prevents an entire category of mistakes. (A fourth category, ESCALATE, comes out of the self-containment test in Phase 6.)

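In the assembled spec this becomes one short, scannable block (the entries are illustrative, drawn from the examples above):

```markdown
## Constraints

- MUST: use curl for HTTP calls (fetch breaks SpringCM)
- MUST NOT: commit secrets to git
- PREFER: ES modules over CommonJS
- ESCALATE: if anything is ambiguous, stop and ask the human
```

Keeping all categories in one place lets the agent re-read the full guardrail set before every risky decision.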
---

### Phase 5: Spec Assembly (Agent's job)

After the interview, the agent assembles the spec:

1. **Fill in the PROJECT-SPEC.md template** with interview answers
2. **Add technical details** discovered from code review
3. **Write acceptance criteria** from the requirement conversations
4. **Propose phasing** based on dependencies
5. **Include anti-patterns** from the "what didn't work" answers
6. **Present to the human for review**

**The review conversation:**

- Read through each section together
- The human corrects misunderstandings
- The agent asks clarifying questions about gaps
- Iterate until the human says "yes, that's what I want"

---

### Phase 6: Self-Containment Test (5 minutes)

> **The critical test:** Can the spec be solved without the agent needing
> to fetch information not included in it?
>
> This is Tobi Lütke's insight: *Can you state a problem with enough
> context that the task is plausibly solvable without the agent going
> out and getting more information?*

**The test — read the spec as if:**

1. The reader has never seen your project before
2. The reader doesn't know your coding conventions or style
3. The reader has no access to information you don't include
4. The reader will stop and do nothing if anything is ambiguous

**The checklist:**

- [ ] Every acronym is defined on first use
- [ ] Referenced file paths actually exist and are correct
- [ ] External dependencies have pinned versions or install instructions
- [ ] Domain-specific terms are explained (not everyone knows what "JWT" or "FTS" means)
- [ ] The agent can find all referenced files without searching
- [ ] If removing any sentence would cause the agent to make mistakes, the spec isn't self-contained yet

**The failure mode this catches:** Agents fill gaps with statistical plausibility — they guess in ways that are often subtly wrong. A spec that relies on shared context (even 5 minutes of prior conversation) will produce outputs that look right but aren't.

**If the spec fails the test:** Add the missing context. If you can't (there's too much to document), add an ESCALATE constraint: "If you encounter information not covered by this spec, do not assume — ask the human."

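Parts of the checklist can be spot-checked mechanically. A minimal sketch of the "referenced file paths actually exist" item, assuming paths appear in backticks with a file extension (the regex and helper name are illustrative, not a standard tool):

```typescript
// Toy self-containment check: extract backtick-quoted paths from the
// spec text and report any that don't exist on disk.
import * as fs from "fs";
import * as path from "path";

function findMissingPaths(spec: string, root = "."): string[] {
  // Matches things like `package.json` or `config/default.yaml`.
  const refs = [...spec.matchAll(/`([\w./-]+\.[A-Za-z]+)`/g)].map((m) => m[1]);
  return refs.filter((p) => !fs.existsSync(path.join(root, p)));
}

const example = "Config lives in `config/default.json`; see `README.md`.";
console.log(findMissingPaths(example)); // lists whichever references are absent
```

This only catches dead references, not missing context, but it is a cheap first pass before the human review.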
---

## Spec Quality Checklist

Before handing a spec to agents, verify:

### Completeness

- [ ] Every feature has numbered acceptance criteria (FR-NNN)
- [ ] Data model is defined with types and constraints
- [ ] Build and test commands are specified and work
- [ ] Anti-patterns section exists, with real examples
- [ ] Phasing is defined, with dependencies noted
- [ ] All four constraint categories are filled (MUST / MUST NOT / PREFER / ESCALATE)
- [ ] Evaluation design section exists, with test cases and verification steps

### Clarity

- [ ] A stranger could read this and understand what to build
- [ ] No ambiguous words ("fast", "nice", "good") — use numbers
- [ ] Input/output examples exist for key operations
- [ ] Error cases are explicitly described

### Testability

- [ ] Every acceptance criterion can be verified by running code
- [ ] Sample data or fixtures are provided
- [ ] Performance criteria have specific thresholds
- [ ] "Done" is objectively measurable

### Feasibility

- [ ] Tech stack is proven for this type of project
- [ ] External dependencies are accessible (API keys, permissions)
- [ ] Scope fits the timeline (phasing handles overflow)
- [ ] Known challenges are documented with mitigation strategies

### Self-Containment

- [ ] A stranger could solve this without asking follow-up questions
- [ ] No domain-specific terms are used without definition
- [ ] All file paths, commands, and references are correct
- [ ] ESCALATE constraints cover situations where the spec is ambiguous

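Some clarity checks are automatable too. A minimal sketch of the "no ambiguous words" item (the word list is illustrative, not exhaustive; extend it with your own recurring offenders):

```typescript
// Toy vagueness check: flag words in a spec that should be replaced
// with concrete numbers or observable behavior.
const AMBIGUOUS = ["fast", "nice", "good", "simple", "easy", "robust"];

function findVagueWords(spec: string): string[] {
  const words = new Set(spec.toLowerCase().match(/[a-z]+/g) ?? []);
  return AMBIGUOUS.filter((w) => words.has(w));
}

console.log(findVagueWords("The CLI should be fast and easy to use."));
// flags "fast" and "easy"; "responds in under 200 ms" would pass clean
```

Each flagged word is a prompt to go back to the human and ask for a number or a testable behavior.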
---

## Common Interview Mistakes

### 1. Leading the witness

**Bad:** "You probably want auto-refresh, right?"

**Good:** "What happens when the token expires mid-session?"

### 2. Accepting vague answers

**Bad:** Human: "It should handle errors well." Agent: "Got it."

**Good:** "Can you give me an example of an error? What should the user see?"

### 3. Skipping the 'why'

**Bad:** Jumping straight to features.

**Good:** Understanding context first — it changes how you interpret every requirement.

### 4. Over-engineering the spec

**Bad:** A 50-page spec with UML diagrams for a CLI tool.

**Good:** Enough detail for an agent to work autonomously, and no more.

### 5. Forgetting anti-patterns

**Bad:** Only describing what TO do.

**Good:** Explicitly listing what NOT to do — it saves agents from repeating your mistakes.

---

## Template: Interview Notes

Use this to capture notes during the interview, before assembling the spec:

```markdown
# Interview Notes — [Project Name]

**Date:** YYYY-MM-DD
**Participants:** [Human], [Agent]

## Vision
- One-liner:
- Target user:
- Trigger/motivation:
- Success criteria:

## Features (raw notes)
1. Feature name — description — happy path — error cases — priority
2. ...

## Technical Context
- Existing stack:
- Patterns to follow:
- Patterns to avoid:
- Build/test commands:

## Constraints
- MUST:
- MUST NOT:
- PREFER:
- ESCALATE:

## Anti-patterns (things that didn't work)
1.
2.

## Open Questions
1.
2.
```

---

_This guide is a living document. Update it as you learn what works._