# Wave-Based Project Management

> The biggest gap in most agentic projects: **planning only one task at a time.**
> This guide captures the wave-based approach: planning a full stream's worth of work
> before writing a single line of implementation code.
>
> Proven in practice: 44 tasks across 4 waves, 1,254 → 1,597 tests, zero regressions.

---

## The Core Insight: Plan the Stream, Not the Task

The basic harness has you plan one task at a time. This works for small projects.
For larger projects, it creates problems:

- **Scope drift:** The agent picks up the next task without understanding how it fits the stream
- **Missing dependencies:** Packet 3 turns out to need something Packet 1 should have built
- **Known-answer tests missing until too late:** Financial formulas get validated by feel, not against published CRA/ESDC figures
- **No clear "done":** What does stream completion actually mean?

The solution: **write the entire execution board for a stream before implementing any of it.**

```
❌ Old approach:
   Plan task → Implement task → Plan next task → Implement → ...

✅ Wave approach:
   Plan ENTIRE stream → Review plan → Implement packet-by-packet → Close stream
```

---

## The Four Levels of Structure

```
Project
└── Waves (groups of streams, sequenced by dependency)
    └── Streams (a feature or module, with its own branch)
        └── Packets (atomic unit of work, one commit per packet)
            └── Tasks (sub-steps within a packet)
```

### Waves
A wave is a set of streams that logically belong together and can be started in parallel (or have only light dependencies between them). Waves are gated: Wave N+1 doesn't start until Wave N is fully merged and green.

**Example:**
- Wave 1: Core data models + calculation engines (everything else depends on this)
- Wave 2: Advisory layer + specialized tools (uses Wave 1 outputs)
- Wave 3: Infrastructure + integrations (can be parallel with Wave 2)
- Wave 4: Future vision / stretch goals

### Streams
A stream is a feature branch with a defined scope. It has:
- One `execution-board.md` (written before any code)
- 2–6 packets
- One `process-eval.md` (written after merge)
- Validation evidence per packet

### Packets
A packet is the atomic unit: one focused chunk of work that produces a commit. It has:
- A clear goal (one sentence)
- Explicit steps
- Known-answer tests (mandatory for calculation work)
- Programmatically verifiable acceptance criteria
- One validation evidence file
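
The four levels can be sketched as nested types. This is a minimal illustration, not part of the harness: every type, field, and function name below is an assumption chosen for readability. Only the nesting, the 2–6 packets per stream, and the one-commit-per-packet rule come from the descriptions above.

```typescript
// Hypothetical names throughout; the guide only fixes the nesting and
// what each level contains.
interface Task {
  description: string;
}

interface Packet {
  id: string;
  goal: string;                 // one sentence
  tasks: Task[];                // the explicit steps
  knownAnswerTests: string[];   // mandatory for calculation work
  acceptanceCriteria: string[]; // must be programmatically verifiable
  validationFile: string;       // one evidence file per packet
}

interface Stream {
  name: string;
  branch: string;
  packets: Packet[];
}

interface Wave {
  number: number;
  streams: Stream[];
}

// One commit per packet, so a wave's expected commit count is its packet count.
function expectedCommits(wave: Wave): number {
  return wave.streams.reduce((total, s) => total + s.packets.length, 0);
}

// Names of streams that fall outside the 2-6 packet range.
function oddlySizedStreams(wave: Wave): string[] {
  return wave.streams
    .filter((s) => s.packets.length < 2 || s.packets.length > 6)
    .map((s) => s.name);
}
```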

---

## The Execution Board: Your Planning Artifact

The execution board lives at `.harness/<stream>/execution-board.md`.
Copy `EXECUTION-BOARD-TEMPLATE.md` and fill it in.

**The rule:** The board must be complete before you write a single line of implementation.

### What "complete" means:
- Every packet is defined with goal, steps, files, and acceptance criteria
- Known-answer tests are written out (not "TBD") for any calculation
- Dependency order between packets is explicit
- Stream completion criteria are listed
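
The completeness criteria above are mechanical enough to check in code. The following is a sketch under stated assumptions: it presumes the board has already been parsed into the hypothetical `ExecutionBoard` shape shown here (the guide does not define a machine-readable format), and that packets are listed in the order they will be implemented.

```typescript
// Hypothetical parsed form of an execution board.
interface BoardPacket {
  id: string;
  goal: string;
  steps: string[];
  files: string[];
  acceptanceCriteria: string[];
  involvesCalculation: boolean;
  knownAnswerTests: string[];
  dependsOn: string[]; // ids of packets that must land first
}

interface ExecutionBoard {
  packets: BoardPacket[];
  completionCriteria: string[];
}

// Returns a list of gaps; an empty list means the board is complete.
function findBoardGaps(board: ExecutionBoard): string[] {
  const gaps: string[] = [];
  const definedSoFar = new Set<string>();

  for (const p of board.packets) {
    if (p.goal.trim() === "") gaps.push(`${p.id}: missing goal`);
    if (p.steps.length === 0) gaps.push(`${p.id}: no steps`);
    if (p.files.length === 0) gaps.push(`${p.id}: no files listed`);
    if (p.acceptanceCriteria.length === 0) gaps.push(`${p.id}: no acceptance criteria`);

    if (p.involvesCalculation) {
      // A placeholder such as "TBD" does not count as a written-out test.
      const written = p.knownAnswerTests.filter(
        (t) => t.trim() !== "" && t.trim().toUpperCase() !== "TBD"
      );
      if (written.length === 0) gaps.push(`${p.id}: calculation without a known-answer test`);
    }

    // Dependencies must point at packets defined earlier in the board.
    for (const dep of p.dependsOn) {
      if (!definedSoFar.has(dep)) gaps.push(`${p.id}: depends on ${dep}, which is not defined earlier`);
    }
    definedSoFar.add(p.id);
  }

  if (board.completionCriteria.length === 0) gaps.push("stream: no completion criteria");
  return gaps;
}
```

Returning a list of gaps instead of a boolean keeps the output useful as a review checklist: the planner sees exactly which packet is underspecified.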

### What happens if you skip it:
- You discover mid-stream that Packet 3 needs something Packet 1 didn't build
- You commit calculation code with no ground-truth validation
- You have no clear definition of "done" for the stream
- The next agent session doesn't know what state the stream is in

---

## Known-Answer Tests: The Most Important Rule

For any stream that touches domain-specific calculations (financial math, scientific formulas, regulatory thresholds, physical constants), every calculation module **must** include at least one known-answer test citing an official source.

```typescript
// ✅ Correct: cites official source, tests exact value
test('CPP at 70 is exactly 42% more than at 65', () => {
  // Source: ESDC https://www.canada.ca/en/services/benefits/publicpensions/cpp/benefit-amount.html
  // Formula: +0.7% per month after 65 × 60 months = +42%
  expect(calculateCPPBenefitAtAge(1000, 70) / calculateCPPBenefitAtAge(1000, 65)).toBeCloseTo(1.42, 5);
});

// ❌ Wrong: no source, only checks direction, so many incorrect formulas still pass
test('CPP at 70 returns more than at 65', () => {
  expect(calculateCPPBenefitAtAge(1000, 70)).toBeGreaterThan(calculateCPPBenefitAtAge(1000, 65));
});
```

**Why this matters:** An agent can write a plausible-looking formula that's subtly wrong. Without a known-answer test from an authoritative source, you won't catch it until someone gets incorrect results in production. With known-answer tests, the error surfaces the first time the suite runs.
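
To make the failure mode concrete, here is a hedged sketch of two candidate implementations. Neither is the project's actual `calculateCPPBenefitAtAge`; the function names and the restriction to start ages 65 through 70 are assumptions for illustration. The first applies the deferral rule cited in the test above (0.7% of the age-65 amount per month, not compounded). The second compounds the monthly increase, which looks plausible but is wrong.

```typescript
// Sketch only: covers deferral between 65 and 70. Early take-up before 65
// is deliberately out of scope.
function monthsDeferred(startAge: number): number {
  if (startAge < 65 || startAge > 70) {
    throw new RangeError("sketch only covers start ages 65 through 70");
  }
  return Math.round((startAge - 65) * 12);
}

// Simple (non-compounding) increase: +0.7% of the age-65 amount per month.
function simpleDeferral(monthlyAt65: number, startAge: number): number {
  return monthlyAt65 * (1 + 0.007 * monthsDeferred(startAge));
}

// Plausible-looking bug: compounds the 0.7% each month instead of adding it.
function compoundedDeferral(monthlyAt65: number, startAge: number): number {
  return monthlyAt65 * Math.pow(1.007, monthsDeferred(startAge));
}

const correctRatio = simpleDeferral(1000, 70) / simpleDeferral(1000, 65);
const buggyRatio = compoundedDeferral(1000, 70) / compoundedDeferral(1000, 65);

// Both ratios exceed 1, so the weak "returns more" test passes for both.
// Only the known-answer check separates them. The tolerance mirrors
// toBeCloseTo(1.42, 5), which accepts a difference below 0.000005.
const passesKnownAnswer = (ratio: number): boolean => Math.abs(ratio - 1.42) < 5e-6;

console.log(passesKnownAnswer(correctRatio), passesKnownAnswer(buggyRatio)); // prints: true false
```

The compounded version comes out near 1.52 instead of 1.42, an error of about ten percentage points that the directional test would never surface.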

### What qualifies as a "known-answer source":
- Government publications (CRA, ESDC, IRS, HMRC, etc.)
- Official standards documents (ISO, RFC, IEEE)
- Published academic results
- Regulatory filings with specific numerical requirements
- Product specifications with exact values

### The financial accuracy eval pattern
For financial software, create a separate calibration test suite that lives outside the normal unit tests:

```
evals/
└── code-quality/
    └── financial-accuracy.test.ts   ← Run with: npm run eval:financial-accuracy
```

This suite contains ONLY known-answer tests from official sources. It grows over time as you add calculation modules. Run it independently to verify the app's financial accuracy hasn't drifted.

---

## EXECUTION_MASTER.md: The Project Dashboard

Every project using wave-based management should have a single coordination file, typically `EXECUTION_MASTER.md` or an equivalent, along these lines:

```markdown
# Project Execution Master

## Wave Status
| Wave | Description | Status |
|------|-------------|--------|
| Wave 1 | Core foundations | ✅ Complete |
| Wave 2 | Advisory layer | 🟡 In progress |
| Wave 3 | Infrastructure | ⏸️ Not started |

## Active Streams
| Stream | Branch | Status | Blocker |
|--------|--------|--------|---------|
| cpp-optimizer | feat/cpp-optimizer | ✅ Merged | - |
| rrsp-meltdown | feat/rrsp-meltdown | 🟡 In progress | - |
| estate-planning | feat/estate-planning | ⏸️ Planned | Needs rrsp-meltdown |

## Parallelism Rules
1. Max 2 active streams simultaneously
2. Shared schema changes are always sequential
3. Integration gate before any merge: full test suite must stay green
```

**Every agent session starts by reading this file.** It immediately knows:
- What wave is active
- Which streams are running
- What's blocked and why
- What can run in parallel
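
The first two parallelism rules and the Blocker column can be turned into a start check. This is a sketch, assuming the dashboard has been parsed into the hypothetical `StreamRow` shape below; the `touchesSharedSchema` flag is an assumption, since the example dashboard does not record it.

```typescript
type StreamStatus = "planned" | "in-progress" | "merged";

// Hypothetical parsed row of the "Active Streams" table.
interface StreamRow {
  name: string;
  status: StreamStatus;
  blockedBy: string[];          // streams that must merge first
  touchesSharedSchema: boolean; // assumption: tracked per stream
}

const MAX_ACTIVE_STREAMS = 2;

function canStart(candidate: string, rows: StreamRow[]): { ok: boolean; reason: string } {
  const row = rows.find((r) => r.name === candidate);
  if (!row) return { ok: false, reason: "unknown stream" };
  if (row.status !== "planned") return { ok: false, reason: "stream already started" };

  // Rule 1: at most two streams in progress at once.
  const active = rows.filter((r) => r.status === "in-progress");
  if (active.length >= MAX_ACTIVE_STREAMS) {
    return { ok: false, reason: `already ${active.length} active streams` };
  }

  // Blocker column: every prerequisite stream must be merged.
  const unmerged = row.blockedBy.filter(
    (dep) => rows.find((r) => r.name === dep)?.status !== "merged"
  );
  if (unmerged.length > 0) {
    return { ok: false, reason: `blocked by ${unmerged.join(", ")}` };
  }

  // Rule 2: shared schema changes never run side by side.
  if (row.touchesSharedSchema && active.some((r) => r.touchesSharedSchema)) {
    return { ok: false, reason: "another active stream is changing the shared schema" };
  }

  return { ok: true, reason: "no rule violated" };
}

// The example dashboard: estate-planning waits on rrsp-meltdown.
const dashboard: StreamRow[] = [
  { name: "cpp-optimizer", status: "merged", blockedBy: [], touchesSharedSchema: false },
  { name: "rrsp-meltdown", status: "in-progress", blockedBy: [], touchesSharedSchema: false },
  { name: "estate-planning", status: "planned", blockedBy: ["rrsp-meltdown"], touchesSharedSchema: false },
];
console.log(canStart("estate-planning", dashboard).reason); // prints: blocked by rrsp-meltdown
```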

---

## The Wave Gate

Before starting Wave N+1, verify:

```
[ ] All streams in Wave N merged to main
[ ] Full test suite green (count ≥ baseline)
[ ] Domain-specific accuracy suite passing (if applicable)
[ ] All regression baselines saved
[ ] Process evals written for all Wave N streams
[ ] process-eval-history.json updated
[ ] IMPLEMENTATION_PLAN: all Wave N tasks marked [x]
[ ] EXECUTION_MASTER: Wave N status updated to ✅
[ ] Human sign-off: outputs are producing correct/plausible results
```
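
Several checklist items reduce to simple values, so part of the gate can be scripted. The sketch below is an assumption-laden illustration: the `GateEvidence` shape and its field names are hypothetical, and the bookkeeping items (saved baselines, `process-eval-history.json`, plan and dashboard updates) are left out because they depend on project-specific file formats.

```typescript
// Hypothetical summary of what was collected at the end of a wave.
interface GateEvidence {
  unmergedStreams: string[];
  suiteGreen: boolean;
  testCount: number;
  baselineTestCount: number;
  accuracySuitePassing: boolean | null; // null when not applicable
  streamsMissingProcessEval: string[];
  humanSignOff: boolean;
}

// Returns the failed checks; an empty list means the next wave may start.
function waveGateFailures(e: GateEvidence): string[] {
  const failures: string[] = [];
  if (e.unmergedStreams.length > 0) {
    failures.push(`unmerged streams: ${e.unmergedStreams.join(", ")}`);
  }
  if (!e.suiteGreen) failures.push("full test suite is not green");
  if (e.testCount < e.baselineTestCount) {
    failures.push(`test count ${e.testCount} is below baseline ${e.baselineTestCount}`);
  }
  if (e.accuracySuitePassing === false) failures.push("accuracy suite is failing");
  if (e.streamsMissingProcessEval.length > 0) {
    failures.push(`missing process evals: ${e.streamsMissingProcessEval.join(", ")}`);
  }
  if (!e.humanSignOff) failures.push("no human sign-off");
  return failures;
}

// Example values only; the counts echo the figures quoted in this guide.
const failures = waveGateFailures({
  unmergedStreams: [],
  suiteGreen: true,
  testCount: 1597,
  baselineTestCount: 1254,
  accuracySuitePassing: true,
  streamsMissingProcessEval: [],
  humanSignOff: false,
});
console.log(failures); // prints: [ 'no human sign-off' ]
```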

The gate exists because Wave N+1 often builds on Wave N's outputs. If Wave N has silent bugs, they compound in Wave N+1. Catch them at the gate.

---

## File Organization

```
<project-root>/
├── AGENT.md                         ← Agent instructions (adapted from AGENT-INSTRUCTIONS.md)
├── IMPLEMENTATION_PLAN.md           ← Master backlog (tasks 1-N, all waves)
├── PROJECT-SPEC.md                  ← What to build (never changes)
├── DECISIONS.md                     ← Architecture Decision Records
└── .harness/
    ├── EXECUTION_MASTER.md          ← Wave/stream dashboard
    ├── EXECUTION-BOARD-TEMPLATE.md  ← Copy this for new streams
    ├── VALIDATION-TEMPLATE.md       ← Copy this for packet evidence
    ├── PROCESS-EVAL-TEMPLATE.md     ← Copy this for stream retrospectives
    ├── regression-baselines/        ← Deterministic output snapshots
    ├── <stream-A>/
    │   ├── execution-board.md       ← Written BEFORE implementation
    │   ├── process-eval.md          ← Written AFTER merge
    │   └── validation/
    │       ├── <XX-01>-validation.md
    │       └── <XX-02>-validation.md
    └── <stream-B>/
        └── ...
```
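
The per-stream part of this layout is regular enough to generate. As a small sketch, the hypothetical helper below lists the files a stream directory is expected to contain; the function name and the sample packet ids are assumptions, and the helper only builds path strings without touching the file system.

```typescript
// Lists the expected files for one stream under .harness/.
// packetIds stand in for the <XX-01>, <XX-02> placeholders in the tree above.
function streamFilePaths(stream: string, packetIds: string[]): string[] {
  const root = `.harness/${stream}`;
  return [
    `${root}/execution-board.md`, // written BEFORE implementation
    `${root}/process-eval.md`,    // written AFTER merge
    ...packetIds.map((id) => `${root}/validation/${id}-validation.md`),
  ];
}

// Hypothetical packet ids for illustration.
console.log(streamFilePaths("cpp-optimizer", ["CO-01", "CO-02"]).join("\n"));
```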

---

## Adapting for Your Project

### Projects WITHOUT domain-specific calculations
Skip the known-answer tests and financial accuracy eval. Keep everything else.

### Projects with a small scope (< 10 tasks)
Skip waves entirely and just use streams. One execution board per logical feature group.

### Projects with a single developer (no parallelism)
Streams are still valuable for planning discipline even if run sequentially.

### Non-TypeScript / non-test projects
Adapt the commit trailers. The key things to track are:
- **What model did the work** (for attribution and quality tracking)
- **Test counts** or equivalent quality metric
- **Build / type check status**
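
One way to keep the trailers adaptable is to treat the build-check key as a parameter. The sketch below uses the trailer keys this guide names (Agent, Tests, Tests-Added, TypeScript); the value formats, the `PacketCommitInfo` shape, and the `Go-Vet` example key are assumptions, not the harness's actual convention.

```typescript
// Hypothetical shape; only the trailer keys come from the guide.
interface PacketCommitInfo {
  agent: string;        // which model did the work
  testsTotal: number;   // suite size after this packet
  testsAdded: number;   // tests introduced by this packet
  checkName: string;    // "TypeScript" by default; swap for your build check
  checkPassed: boolean;
}

// Builds the trailer block to append to a packet's commit message.
// Keep checkName free of spaces so it reads as a single trailer key.
function formatTrailers(info: PacketCommitInfo): string {
  return [
    `Agent: ${info.agent}`,
    `Tests: ${info.testsTotal}`,
    `Tests-Added: ${info.testsAdded}`,
    `${info.checkName}: ${info.checkPassed ? "pass" : "fail"}`,
  ].join("\n");
}

// Example for a non-TypeScript project (all values are illustrative).
console.log(
  formatTrailers({
    agent: "example-model",
    testsTotal: 1597,
    testsAdded: 12,
    checkName: "Go-Vet",
    checkPassed: true,
  })
);
```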

---

## Quick Reference: The Discipline in One Page

```
BEFORE CODING:
  ✅ Write execution board for the entire stream
  ✅ Define known-answer tests for ALL calculation modules
  ✅ Make acceptance criteria programmatically verifiable

PER PACKET:
  ✅ Code + tests in same commit
  ✅ Full suite green before moving on
  ✅ Write validation evidence immediately after
  ✅ Commit trailer: Agent / Tests / Tests-Added / TypeScript

PER STREAM:
  ✅ Write process eval honestly
  ✅ Merge with --no-ff
  ✅ Update EXECUTION_MASTER

PER WAVE:
  ✅ Run wave gate checklist before starting next wave
  ✅ Human sign-off on outputs
```

---

## Why This Works

The wave-based approach solves three failure modes common in agent projects:

**1. Scope drift:** The execution board defines the stream's boundaries upfront. Because the plan is explicit, work that falls outside it is easy to spot and reject.

**2. Hidden inaccuracies:** Known-answer tests with official citations are written in the planning phase, before any implementation. This forces precision in the spec and gives every implementation a ground truth to be checked against.

**3. No definition of done:** The stream completion criteria (in the execution board) tell every agent, every session: "the stream is done when these boxes are checked." No ambiguity.

---

*This pattern was developed through practice on the Fintrove project (2026-03-31 → 2026-04-01): 4 waves, 11 streams, 44 tasks, 1,254 → 1,597 tests, zero regressions.*