6.2 KiB
Project Specification Template
Copy this file into your project root as
PROJECT-SPEC.md. Fill out each section. The more specific you are, the better the agent performs. This is the single source of truth — the agent reads it every iteration.
1. Project Overview
What are we building?
Why does it matter?
Success criteria
- Criterion 1
- Criterion 2
- Criterion 3
2. Technical Foundation
Tech stack
- Language:
- Framework:
- Database:
- Build system:
- Test framework:
- Package manager:
Project structure
project/
├── src/
├── tests/
├── docs/
└── ...
Build & test commands
# Build
npm run build
# Test
npm test
# Lint
npm run lint
Coding standards
3. Requirements
Functional Requirements
List every feature the system must have. Each should be independently testable. Use the format: FR-NNN: Description — Acceptance criteria
FR-001: [Feature Name]
Description: What it does.
Acceptance criteria:
- Given X, when Y, then Z
- Edge case: ...
- Error case: ...
FR-002: [Feature Name]
Description: What it does.
Acceptance criteria:
- ...
Non-Functional Requirements
NFR-001: Performance
- Response time < X ms for Y operation
- Handles Z concurrent users
NFR-002: Security
- No secrets in source code
- Input validation on all user inputs
NFR-003: Testing
- Minimum 80% code coverage
- All happy paths tested
- All error paths tested
4. Data Model
Entities
Entity: User
- id: UUID (primary key)
- name: string (required, max 100 chars)
- email: string (required, unique, valid email)
- created_at: timestamp (auto-set)
Relationships
Sample Data
5. API / Interface Design
Endpoints / Commands / UI Screens
Input/Output Examples
6. Architecture Decisions
Constraints
Four categories of constraints. Fill all four — even if one is just "none in this category." The ESCALATE category is critical for autonomous agents: it defines when to stop and ask rather than decide.
- MUST: ... (non-negotiable requirements)
- MUST NOT: ... (explicit prohibitions — prevents entire failure categories)
- PREFER: ... (soft preferences when multiple valid approaches exist)
- ESCALATE: ... (conditions where the agent must stop and ask the human rather than decide autonomously)
Example ESCALATE entries:
- If a task requires deleting production data → ask first
- If acceptance criteria are ambiguous and the spec doesn't resolve it → ask before implementing
- If a dependency needs to be added that wasn't in the tech stack → confirm before installing
- If the task conflicts with another requirement or constraint → flag the conflict and wait
- If the agent discovers a requirement gap not covered by any FR-NNN → don't fill the gap silently
Dependencies
Known Challenges
7. Phasing (Optional)
If the project is large, break it into phases. Each phase should be independently deployable and testable.
Phase 1: Foundation
- Task A
- Task B
Phase 2: Core Features
- Task C
- Task D
Phase 3: Polish
- Task E
- Task F
8. Reference Materials
External docs
Existing code to learn from
Anti-patterns
9. Evaluation Design
How do you know the agent's output is good? Not "does it look reasonable?" but "can you prove it's correct, measurably and consistently?"
Evaluation design is the only thing standing between AI output you can use as-is and AI output that requires 40 minutes of cleanup. Build evals before you need them — especially before model updates that could silently change behavior.
Test cases
TC-001: [Test Name]
Input: What you feed to the agent
Expected output: What you should get back (specific, verifiable)
Verification: How to check programmatically (command, assertion, manual checklist)
TC-002: [Test Name]
Input: ...
Expected output: ...
Verification: ...
Verification steps
# Example: verify the build passes
npm test
# Example: check that specific files exist
ls -la src/new-feature/
# Example: verify no regressions
npm run lint
Run-after-update checklist
- Run all test cases (TC-001 through TC-00N)
- Compare outputs to expected baselines
- Check for regression in failure modes (see anti-patterns)
- Verify ESCALATE triggers still work correctly
Regression baselines
{
"tc_001_baseline": "...",
"tc_002_baseline": "..."
}