agent-harness/PLAN-MANAGEMENT.md

# Plan Management — The Living Document
> The IMPLEMENTATION_PLAN.md is the agent's "working memory" between iterations.
> It's the only file the agent both reads AND writes.
> Managing it well is the difference between a smooth build and a chaotic mess.
---
## What Is the Plan?
The plan is a **task decomposition** of PROJECT-SPEC.md, created by the agent during the planning phase. It's a checklist of discrete, testable tasks ordered by dependency.
```markdown
# Implementation Plan
## Phase 1: Foundation
- [x] Project scaffolding (monorepo, tsconfig, build scripts)
- [x] SQLite schema and migration system
- [ ] QFX file parser with field extraction ← Agent picks this next
- [ ] CSV import with configurable column mapping
- [ ] Category rules engine
## Phase 2: API Layer
- [ ] REST endpoints for CRUD operations
- [ ] Transaction query with filters
- [ ] Spending aggregation endpoints
```
**Key properties:**
- **Checkboxes** track completion (`[x]` = done, `[ ]` = pending)
- **Order matters** — tasks are listed in dependency order
- **One task per iteration** — the agent picks the first unchecked item
- **Agent updates it** after completing each task
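
The selection rule above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the harness's actual code: `parsePlanTasks` and `nextPlanTask` are hypothetical names, and the parser assumes one task per line in the `- [ ]` / `- [x]` form shown above (plus the `[~]` marker described under Removing Tasks).

```typescript
// Status of a checkbox line: "[ ]" pending, "[x]" done, "[~]" deferred by a human.
type TaskStatus = "pending" | "done" | "deferred";

interface PlanTask {
  line: number; // 1-based line number in the plan text
  status: TaskStatus;
  title: string;
}

// Parse every "- [?] ..." line; headings, notes, and blank lines are ignored.
function parsePlanTasks(markdown: string): PlanTask[] {
  const tasks: PlanTask[] = [];
  const lines = markdown.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const m = /^\s*- \[([ xX~])\]\s+(.*)$/.exec(lines[i] ?? "");
    if (!m) continue;
    const mark = m[1] ?? " ";
    const status: TaskStatus =
      mark === " " ? "pending" : mark === "~" ? "deferred" : "done";
    tasks.push({ line: i + 1, status, title: (m[2] ?? "").trim() });
  }
  return tasks;
}

// The pick for this iteration: the first pending task in file order.
function nextPlanTask(markdown: string): PlanTask | undefined {
  for (const task of parsePlanTasks(markdown)) {
    if (task.status === "pending") return task;
  }
  return undefined;
}

const samplePlan = [
  "## Phase 1: Foundation",
  "- [x] Project scaffolding",
  "- [x] SQLite schema and migration system",
  "- [ ] QFX file parser with field extraction",
  "- [ ] CSV import with configurable column mapping",
].join("\n");

console.log(nextPlanTask(samplePlan)?.title);
// prints "QFX file parser with field extraction"
```

Because selection depends only on file order, reordering lines in the plan is enough to change what the agent works on next.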
---
## The Plan Lifecycle
### 1. Creation (Planning Phase)
The agent reads PROJECT-SPEC.md and decomposes it:
```
Spec → Agent Analysis → IMPLEMENTATION_PLAN.md
```
**What makes a good decomposition:**
- Each task is **independently testable** (has a clear "done" state)
- Tasks are **small enough** to complete in one iteration (30-60 minutes of agent work)
- Dependencies are respected (can't build API before schema exists)
- Each task maps to one or more acceptance criteria from the spec
**What makes a bad decomposition:**
- Tasks too large ("Build the entire backend")
- Tasks too small ("Create the src directory")
- Missing dependencies (trying to write tests before the test framework is set up)
- Vague tasks ("Improve error handling" — which errors? where?)
### 2. Execution (Build Iterations)
Each iteration, the agent:
1. Reads the plan
2. Finds the first `[ ]` task
3. Implements it
4. Marks it `[x]`
5. Commits the updated plan
6. Exits
**The commit includes the plan update:**
```bash
git add IMPLEMENTATION_PLAN.md src/parser.ts tests/parser.test.ts
git commit -m "feat: implement QFX file parser with field extraction"
```
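
Step 4 of the loop (marking the task `[x]`) is a one-character edit to the plan text. The sketch below shows one way it might look. `completeFirstPending` is a hypothetical helper, and it works on a string so that reading and writing the file stays with the caller.

```typescript
interface CompletedPlan {
  plan: string;  // updated plan text
  title: string; // title of the task that was just checked off
}

// Flip the first "- [ ]" checkbox to "- [x]"; returns undefined when nothing is pending.
function completeFirstPending(plan: string): CompletedPlan | undefined {
  const lines = plan.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const m = /^(\s*- \[) (\]\s+)(.*)$/.exec(lines[i] ?? "");
    if (!m) continue;
    lines[i] = `${m[1] ?? ""}x${m[2] ?? ""}${m[3] ?? ""}`;
    return { plan: lines.join("\n"), title: (m[3] ?? "").trim() };
  }
  return undefined;
}

const beforePlan = [
  "- [x] Project scaffolding",
  "- [ ] QFX file parser with field extraction",
  "- [ ] CSV import with configurable column mapping",
].join("\n");

const updated = completeFirstPending(beforePlan);
console.log(updated?.title);
// prints "QFX file parser with field extraction"
```

Only the first pending line changes, so the resulting diff to IMPLEMENTATION_PLAN.md in each commit is a single line.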
### 3. Human Review (Periodic Check-ins)
You should review the plan at these moments:
| When | What to check |
|------|---------------|
| After planning phase | Does the decomposition make sense? Are tasks the right size? |
| After every 3-5 tasks | Is progress on track? Any tasks taking multiple iterations? |
| When agent signals STUCK | What's blocking it? Can you clarify the spec? |
| At phase boundaries | Is the phase complete? Ready for the next phase? |
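
For the phase-boundary check, a small per-phase summary can help. The following is a rough sketch under two assumptions: phases are `##` headings, as in the example plan, and `[~]` tasks are excluded from the totals. `summarizePhases` is an illustrative name, not part of the harness.

```typescript
interface PhaseProgress {
  phase: string;
  done: number;
  total: number;
  complete: boolean;
}

// Count done vs. total tasks under each "## ..." heading.
function summarizePhases(plan: string): PhaseProgress[] {
  const phases: PhaseProgress[] = [];
  let current: PhaseProgress | undefined;
  for (const line of plan.split("\n")) {
    const heading = /^##\s+(.*)$/.exec(line);
    if (heading) {
      current = { phase: (heading[1] ?? "").trim(), done: 0, total: 0, complete: false };
      phases.push(current);
      continue;
    }
    const task = /^\s*- \[([ xX~])\]/.exec(line);
    if (!task || !current) continue;
    if (task[1] === "~") continue; // deferred tasks are not counted
    current.total += 1;
    if (task[1] !== " ") current.done += 1;
  }
  for (const p of phases) p.complete = p.total > 0 && p.done === p.total;
  return phases;
}

const reviewPlan = [
  "# Implementation Plan",
  "## Phase 1: Foundation",
  "- [x] Project scaffolding (monorepo, tsconfig, build scripts)",
  "- [x] SQLite schema and migration system",
  "- [ ] QFX file parser with field extraction",
  "- [ ] CSV import with configurable column mapping",
  "- [ ] Category rules engine",
  "## Phase 2: API Layer",
  "- [ ] REST endpoints for CRUD operations",
  "- [ ] Transaction query with filters",
  "- [ ] Spending aggregation endpoints",
].join("\n");

for (const p of summarizePhases(reviewPlan)) {
  console.log(`${p.phase}: ${p.done}/${p.total}`);
}
// prints "Phase 1: Foundation: 2/5" then "Phase 2: API Layer: 0/3"
```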
---
## Intervening in the Plan
Sometimes you need to change the plan mid-flight. Here's how:
### Adding Tasks
New requirement discovered? Add it to the plan:
```markdown
- [ ] 🆕 Handle expired token during API calls (auto-refresh)
```
Use the 🆕 emoji to mark human-added tasks so you can track what changed.
### Removing Tasks
Scope cut? Mark it explicitly:
```markdown
- [~] ~~Monte Carlo visualization~~ (deferred to Phase 4)
```
Don't just delete — the agent might re-derive it from the spec.
### Reprioritizing
Move tasks up or down. Add a note so the agent understands:
```markdown
## Phase 1: Foundation
- [x] Project scaffolding
- [ ] REST API framework ← Moved up: needed for integration testing
- [ ] SQLite schema
```
### Splitting Tasks
If a task keeps failing (agent can't complete it in one iteration), split it:
```markdown
# Before
- [ ] Transaction import with deduplication
# After
- [ ] QFX file parser (parse file → array of transaction objects)
- [ ] CSV file parser (configurable column mapping)
- [ ] Transaction dedup engine (match on date + amount + payee hash)
- [ ] Import orchestrator (parse → dedup → insert)
```
### Adding Notes
Leave notes for the agent in the plan:
```markdown
- [ ] CLM API integration
  > NOTE: The DTC CLM Demo account returns 401. Try CSA CLM Demo or
  > DocuSign CLM Demo accounts instead. See ACCOUNTS.md for IDs.
```
---
## Plan Patterns
### The Scaffold-First Pattern
```markdown
- [ ] Project scaffolding (dirs, configs, package.json)
- [ ] Build pipeline (TypeScript compilation, scripts)
- [ ] Test infrastructure (framework, first passing test)
- [ ] CI configuration (lint + build + test)
```
Always start with infrastructure. An agent that can't build and run tests has no way to verify its own work.
### The Vertical Slice Pattern
```markdown
- [ ] Single endpoint: GET /api/health (route + handler + test)
- [ ] Single entity CRUD: categories (model + routes + tests)
- [ ] Second entity: transactions (model + routes + tests + import)
```
Build one complete vertical slice first. It establishes patterns for everything else.
### The Test-First Pattern
```markdown
- [ ] Write failing tests for QFX parser
- [ ] Implement QFX parser to pass tests
- [ ] Write failing tests for CSV parser
- [ ] Implement CSV parser to pass tests
```
Pairs well with agents — the failing test IS the acceptance criterion.
### The Dependency Chain Pattern
```markdown
## Layer 0: Shared
- [ ] Type definitions and interfaces
## Layer 1: Data
- [ ] Database schema and migrations
- [ ] Data access layer
## Layer 2: Business Logic
- [ ] Calculation engines (depends on Layer 1)
- [ ] Rule processors (depends on Layer 1)
## Layer 3: Interface
- [ ] CLI commands (depends on Layer 2)
- [ ] API endpoints (depends on Layer 2)
```
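This makes dependencies explicit. Because the agent always picks the first unchecked task, it cannot start a layer before the layers listed above it are done.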
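
If the `(depends on Layer N)` annotations are kept consistent, they can be checked mechanically. The sketch below assumes the exact heading and annotation format from the example above. `findLayerViolations` is a hypothetical helper that flags any task whose declared dependency is not in an earlier layer.

```typescript
interface LayerViolation {
  task: string;
  layer: number;     // layer the task is listed under
  dependsOn: number; // layer named in its "(depends on Layer N)" note
}

function findLayerViolations(plan: string): LayerViolation[] {
  const violations: LayerViolation[] = [];
  let layer = -1; // no "## Layer N" heading seen yet
  for (const line of plan.split("\n")) {
    const heading = /^##\s+Layer\s+(\d+)/.exec(line);
    if (heading) {
      layer = Number(heading[1]);
      continue;
    }
    const task = /^\s*- \[[ xX~]\]\s+(.*)$/.exec(line);
    const dep = /\(depends on Layer (\d+)\)/.exec(line);
    if (!task || !dep || layer < 0) continue;
    const dependsOn = Number(dep[1]);
    // A task may only depend on a layer that appears before its own.
    if (dependsOn >= layer) {
      violations.push({ task: (task[1] ?? "").trim(), layer, dependsOn });
    }
  }
  return violations;
}

const misorderedPlan = [
  "## Layer 1: Data",
  "- [ ] Database schema and migrations",
  "- [ ] CLI commands (depends on Layer 2)",
  "## Layer 2: Business Logic",
  "- [ ] Calculation engines (depends on Layer 1)",
].join("\n");

console.log(findLayerViolations(misorderedPlan).length);
// prints 1 (the CLI task sits in Layer 1 but depends on Layer 2)
```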
---
## Tracking Progress
### Git Log as Progress Report
The git log tells the story:
```bash
git log --oneline
# a1b2c3d feat: implement Monte Carlo simulation engine
# d4e5f6a feat: add retirement projection calculator
# b7c8d9e feat: spending aggregation API endpoints
# f0a1b2c feat: transaction import with deduplication
# c3d4e5f chore: project scaffolding and build pipeline
```
Each commit = one completed task. If you see multiple commits for one task, the task was too big.
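
One rough way to apply the one-commit-per-task rule is to compare the number of commits with the number of checked-off tasks. The sketch below parses `git log --oneline` output that has already been captured as a string. `parseOnelineLog` and `extraCommits` are hypothetical names, and the comparison assumes every commit in the log belongs to a plan task, which may not hold in a real repository.

```typescript
interface OnelineCommit {
  hash: string;
  type: string; // conventional-commit prefix such as "feat", or "other"
  subject: string;
}

// Parse lines of the form "<hash> <type>: <subject>"; the type prefix is optional.
function parseOnelineLog(output: string): OnelineCommit[] {
  const commits: OnelineCommit[] = [];
  for (const raw of output.split("\n")) {
    const m = /^(\S+)\s+(?:(\w+)(?:\([^)]*\))?!?:\s*)?(.*)$/.exec(raw.trim());
    if (!m) continue;
    commits.push({
      hash: m[1] ?? "",
      type: m[2] ?? "other",
      subject: (m[3] ?? "").trim(),
    });
  }
  return commits;
}

// Positive result: more commits than completed tasks, so some task took several commits.
function extraCommits(log: string, plan: string): number {
  const done = plan.split("\n").filter((l) => /^\s*- \[[xX]\]/.test(l)).length;
  return parseOnelineLog(log).length - done;
}

const sampleLog = [
  "a1b2c3d feat: implement QFX file parser",
  "d4e5f6a fix: handle empty QFX header",
  "c3d4e5f chore: project scaffolding and build pipeline",
].join("\n");
const samplePlanForLog = "- [x] Project scaffolding\n- [x] QFX file parser\n- [ ] CSV import";

console.log(extraCommits(sampleLog, samplePlanForLog));
// prints 1
```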
### Velocity Tracking
Count completed tasks per iteration:
- **1 task/iteration** = ideal (the system is working)
- **0 tasks/iteration** = agent is stuck (intervene)
- **2+ tasks/iteration** = tasks are too small (merge them)
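
The three velocity bands translate directly into code. In this sketch the input is a hypothetical series of "done" counts, one snapshot of the plan per iteration. How those snapshots are collected is left open.

```typescript
type Velocity = "stuck" | "ideal" | "tasks-too-small";

// Map tasks completed in one iteration to the bands listed above.
function classifyVelocity(tasksCompleted: number): Velocity {
  if (tasksCompleted <= 0) return "stuck";
  if (tasksCompleted === 1) return "ideal";
  return "tasks-too-small";
}

// doneCounts[i] is the number of [x] tasks in the plan after iteration i.
function velocityPerIteration(doneCounts: number[]): Velocity[] {
  const result: Velocity[] = [];
  for (let i = 1; i < doneCounts.length; i++) {
    const delta = (doneCounts[i] ?? 0) - (doneCounts[i - 1] ?? 0);
    result.push(classifyVelocity(delta));
  }
  return result;
}

console.log(JSON.stringify(velocityPerIteration([2, 3, 3, 5])));
// prints ["ideal","stuck","tasks-too-small"]
```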
### Stuck Detection
Signs the agent is stuck:
- Same task attempted 3+ iterations in a row
- Agent outputs `<promise>STUCK</promise>`
- Build or tests failing repeatedly
- Agent is modifying unrelated files
**When stuck, ask:**
1. Is the task too large? → Split it
2. Is the spec unclear? → Clarify acceptance criteria
3. Is there a technical blocker? → Investigate and add notes
4. Is the agent fighting its own previous work? → Reset to last good commit
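
Three of the stuck signals above can be checked automatically. The sketch below assumes the harness keeps a per-iteration record. The `IterationRecord` shape is hypothetical, and the threshold of three consecutive failures for "failing repeatedly" is an assumption. Detecting edits to unrelated files is not covered.

```typescript
interface IterationRecord {
  task: string;          // title of the task the agent attempted
  output: string;        // the agent's final output for the iteration
  checksPassed: boolean; // whether build and tests passed
}

const STUCK_SIGNAL = "<promise>STUCK</promise>";

// Return human-readable reasons to intervene; an empty array means no signal fired.
function stuckReasons(history: IterationRecord[], maxAttempts: number = 3): string[] {
  const reasons: string[] = [];
  const last = history[history.length - 1];
  if (!last) return reasons;

  if (last.output.indexOf(STUCK_SIGNAL) !== -1) {
    reasons.push("agent signaled STUCK");
  }

  // Count trailing iterations spent on the same task.
  let sameTaskStreak = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const rec = history[i];
    if (!rec || rec.task !== last.task) break;
    sameTaskStreak += 1;
  }
  if (sameTaskStreak >= maxAttempts) {
    reasons.push(`same task attempted ${sameTaskStreak} iterations in a row`);
  }

  // Count trailing iterations where build or tests failed.
  let failStreak = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const rec = history[i];
    if (!rec || rec.checksPassed) break;
    failStreak += 1;
  }
  if (failStreak >= maxAttempts) {
    reasons.push(`build or tests failed ${failStreak} iterations in a row`);
  }
  return reasons;
}

const sampleHistory: IterationRecord[] = [
  { task: "Project scaffolding", output: "done", checksPassed: true },
  { task: "QFX file parser", output: "tests failing", checksPassed: false },
  { task: "QFX file parser", output: "tests failing", checksPassed: false },
  { task: "QFX file parser", output: "<promise>STUCK</promise>", checksPassed: false },
];

console.log(stuckReasons(sampleHistory).length);
// prints 3
```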
---
## Plan Anti-Patterns
### 1. The God Task
```markdown
# Bad
- [ ] Build the backend
# Good
- [ ] Database schema for users and transactions
- [ ] CRUD API for users
- [ ] CRUD API for transactions
- [ ] Transaction query with date/category filters
```
### 2. The Missing Dependency
```markdown
# Bad (schema doesn't exist yet!)
- [ ] REST API for transactions
- [ ] Database schema
# Good
- [ ] Database schema
- [ ] REST API for transactions
```
### 3. The Implicit Task
```markdown
# Bad (who sets up the test framework?)
- [ ] Write tests for parser
# Good
- [ ] Set up test framework (vitest config, first passing test)
- [ ] Write tests for parser
```
### 4. The Never-Done Task
```markdown
# Bad (when is "improve" done?)
- [ ] Improve error handling
# Good
- [ ] Add user-friendly error messages for auth failures (expired token, missing key, network error)
- [ ] Add try/catch with contextual messages to all API endpoints
```
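
Two of these anti-patterns (the God Task and the Never-Done Task) are easy to catch with simple text heuristics. The sketch below is a rough lint, not a definitive rule set: the verb list and the "build the X" pattern are assumptions chosen to match the examples above, and `lintPlan` is a hypothetical name.

```typescript
interface PlanWarning {
  task: string;
  reason: string;
}

// Assumed list of open-ended verbs; tune it for your own plans.
const VAGUE_VERBS = ["improve", "enhance", "refactor", "clean up", "optimize"];
// Assumed markers of an oversized task: "entire"/"whole"/"everything", or a bare "Build the X".
const GOD_TASK = /\b(entire|whole|everything)\b|^build the \w+$/i;

// Check only pending tasks; completed ones are left alone.
function lintPlan(plan: string): PlanWarning[] {
  const warnings: PlanWarning[] = [];
  for (const line of plan.split("\n")) {
    const m = /^\s*- \[ \]\s+(.*)$/.exec(line);
    if (!m) continue;
    const title = (m[1] ?? "").trim();
    const lower = title.toLowerCase();
    let vague = false;
    for (const verb of VAGUE_VERBS) {
      if (lower.indexOf(verb + " ") === 0) vague = true;
    }
    if (vague) {
      warnings.push({ task: title, reason: "open-ended verb: no clear done state" });
    }
    if (GOD_TASK.test(title)) {
      warnings.push({ task: title, reason: "scope too broad for one iteration" });
    }
  }
  return warnings;
}

const draftPlan = [
  "- [ ] Build the backend",
  "- [ ] Improve error handling",
  "- [ ] CRUD API for users",
].join("\n");

for (const w of lintPlan(draftPlan)) {
  console.log(`${w.task}: ${w.reason}`);
}
// prints two warnings: one for "Build the backend", one for "Improve error handling"
```

A lint like this only narrows down what to review after the planning phase. Missing dependencies and implicit tasks still need a human read of the plan.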
---
_The plan is your control surface. Master it, and you master the agent loop._