agent-harness/WAVE-BASED-MANAGEMENT.md

# Wave-Based Project Management

> The biggest gap in most agentic projects: **planning only one task at a time.**
> This guide captures the wave-based approach — planning a full stream's worth of work
> before writing a single line of implementation code.
>
> Proven in practice: 44 tasks across 4 waves, 1,254 → 1,597 tests, zero regressions.

---

## The Core Insight: Plan the Stream, Not the Task

The basic harness has you plan one task at a time. This works for small projects.
For larger projects, it creates problems:

- **Scope drift:** Agent picks up the next task without understanding how it fits the stream
- **Missing dependencies:** Packet 3 turns out to need something Packet 1 should have built
- **Missing known-answer tests discovered too late:** Financial formulas validated by feel, not against published CRA/ESDC figures
- **No clear "done":** What does stream completion actually mean?

The solution: **write the entire execution board for a stream before implementing any of it.**

```
❌ Old approach:
  Plan task → Implement task → Plan next task → Implement → ...

✅ Wave approach:
  Plan ENTIRE stream → Review plan → Implement packet-by-packet → Close stream
```

---

## The Four Levels of Structure

```
Project
└── Waves (groups of streams, sequenced by dependency)
    └── Streams (a feature or module — has its own branch)
        └── Packets (atomic unit of work — one commit per packet)
            └── Tasks (sub-steps within a packet)
```

### Waves
A wave is a set of streams that logically belong together and can be started in parallel (or have light dependencies between them). Waves are gated — Wave N+1 doesn't start until Wave N is fully merged and green.

**Example:**
- Wave 1: Core data models + calculation engines (everything else depends on this)
- Wave 2: Advisory layer + specialized tools (uses Wave 1 outputs)
- Wave 3: Infrastructure + integrations (can be parallel with Wave 2)
- Wave 4: Future vision / stretch goals

### Streams
A stream is a feature branch with a defined scope. It has:
- One `execution-board.md` (written before any code)
- 2–6 packets
- One `process-eval.md` (written after merge)
- Validation evidence per packet

### Packets
A packet is the atomic unit — one focused chunk of work that produces a commit. It has:
- A clear goal (one sentence)
- Explicit steps
- Known-answer tests (mandatory for calculation work)
- Programmatically verifiable acceptance criteria
- One validation evidence file
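
The hierarchy above can be written down as types. The following is a minimal sketch, not a schema the harness prescribes: every type and field name here is an assumption chosen for illustration, and the only rule it checks is the 2–6 packets-per-stream guideline.

```typescript
// Hypothetical type names: the guide defines the hierarchy, not a schema.
interface Task {
  description: string;
  done: boolean;
}

interface Packet {
  id: string;                   // e.g. "XX-01"
  goal: string;                 // one sentence
  steps: string[];
  acceptanceCriteria: string[]; // programmatically verifiable
  validationFile: string;       // one evidence file per packet
  tasks: Task[];
}

interface Stream {
  name: string;
  branch: string;               // each stream has its own branch
  packets: Packet[];
}

interface Wave {
  number: number;
  streams: Stream[];
}

// Returns the names of streams whose packet count falls outside the 2-6 range.
function streamsOutsidePacketRange(wave: Wave): string[] {
  return wave.streams
    .filter((s) => s.packets.length < 2 || s.packets.length > 6)
    .map((s) => s.name);
}
```

A stream flagged by this check is usually either too small to need its own branch or large enough to split in two.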

---

## The Execution Board: Your Planning Artifact

The execution board lives at `.harness/<stream>/execution-board.md`.
Copy `EXECUTION-BOARD-TEMPLATE.md` and fill it in.

**The rule:** The board must be complete before you write a single line of implementation.

### What "complete" means:
- Every packet is defined with goal, steps, files, and acceptance criteria
- Known-answer tests are written out (not "TBD") for any calculation
- Dependency order between packets is explicit
- Stream completion criteria are listed
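
If the board is also kept in a structured form, most of this checklist can be verified mechanically. The sketch below assumes a hypothetical in-memory shape (`ExecutionBoard`, `BoardPacket`, and `isCalculation` are illustrative names, not part of the template) and reports what is still missing. It does not check dependency order.

```typescript
// Hypothetical shapes; the real board is a markdown file.
interface BoardPacket {
  id: string;
  goal: string;
  steps: string[];
  files: string[];
  acceptanceCriteria: string[];
  isCalculation: boolean;      // true if the packet touches domain calculations
  knownAnswerTests: string[];  // written-out cases with their sources
}

interface ExecutionBoard {
  stream: string;
  packets: BoardPacket[];
  completionCriteria: string[];
}

// Empty strings and "TBD" both count as "not written yet".
const isPlaceholder = (s: string): boolean => s.trim() === "" || /\bTBD\b/i.test(s);

function findBoardGaps(board: ExecutionBoard): string[] {
  const gaps: string[] = [];
  for (const p of board.packets) {
    if (isPlaceholder(p.goal)) gaps.push(`${p.id}: goal missing`);
    if (p.steps.length === 0) gaps.push(`${p.id}: steps missing`);
    if (p.files.length === 0) gaps.push(`${p.id}: files missing`);
    if (p.acceptanceCriteria.length === 0) gaps.push(`${p.id}: acceptance criteria missing`);
    const testsUnwritten = p.knownAnswerTests.length === 0 || p.knownAnswerTests.some(isPlaceholder);
    if (p.isCalculation && testsUnwritten) gaps.push(`${p.id}: known-answer tests not written out`);
  }
  if (board.completionCriteria.length === 0) gaps.push("stream completion criteria missing");
  return gaps;
}
```

An empty result means the board passes these mechanical checks; a human still has to judge whether the goals and criteria are the right ones.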

### What happens if you skip it:
- You discover mid-stream that Packet 3 needs something Packet 1 didn't build
- You commit calculation code with no ground-truth validation
- You have no clear definition of "done" for the stream
- The next agent session doesn't know what state the stream is in
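
The first failure in this list (a later packet needing something an earlier one never built) is the easiest to catch before coding starts. A minimal sketch, assuming each packet on the board lists the packets it depends on; `PlannedPacket` and `dependsOn` are hypothetical names:

```typescript
// Hypothetical shape: packets in the order the board schedules them.
interface PlannedPacket {
  id: string;
  dependsOn: string[];
}

// Reports dependencies that are missing from the board or scheduled too late.
function findOrderingProblems(packets: PlannedPacket[]): string[] {
  const problems: string[] = [];
  const onBoard: Record<string, boolean> = {};
  for (const p of packets) onBoard[p.id] = true;

  const alreadyScheduled: Record<string, boolean> = {};
  for (const p of packets) {
    for (const dep of p.dependsOn) {
      if (!onBoard[dep]) {
        problems.push(`${p.id} depends on ${dep}, which is not on the board`);
      } else if (!alreadyScheduled[dep]) {
        problems.push(`${p.id} is scheduled before its dependency ${dep}`);
      }
    }
    alreadyScheduled[p.id] = true;
  }
  return problems;
}
```

Because every dependency must appear earlier in the list, a board that passes this check also has no dependency cycles.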

---

## Known-Answer Tests: The Most Important Rule

For any stream that touches domain-specific calculations (financial math, scientific formulas, regulatory thresholds, physical constants), every calculation module **must** include at least one known-answer test citing an official source.

```typescript
// ✅ Correct: cites official source, tests exact value
test('CPP at 70 is exactly 42% more than at 65', () => {
  // Source: ESDC https://www.canada.ca/en/services/benefits/publicpensions/cpp/benefit-amount.html
  // Formula: +0.7% per month after 65 × 60 months = +42%
  expect(calculateCPPBenefitAtAge(1000, 70) / calculateCPPBenefitAtAge(1000, 65)).toBeCloseTo(1.42, 5);
});

// ❌ Wrong: no source, checks only the direction of the change, so a wrong rate still passes
test('CPP at 70 returns more than at 65', () => {
  expect(calculateCPPBenefitAtAge(1000, 70)).toBeGreaterThan(calculateCPPBenefitAtAge(1000, 65));
});
```

**Why this matters:** An agent can write a plausible-looking formula that's subtly wrong. Without a known-answer test from an authoritative source, you won't catch it until someone gets incorrect results in production. With known-answer tests, errors are caught immediately.
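
To make the arithmetic behind the good test concrete, here is one way `calculateCPPBenefitAtAge` could be written. This is a sketch under assumptions, not the project's implementation: the signature (monthly amount at 65, start age in years) is inferred from the test above, and only the deferral rule cited there (+0.7% per month after 65, up to age 70) is covered.

```typescript
// Rate cited in the test above: +0.7% per month of deferral after age 65.
const DEFERRAL_RATE_PER_MONTH = 0.007;

// Hypothetical implementation; the signature is inferred from the test, not from project code.
function calculateCPPBenefitAtAge(monthlyAt65: number, startAge: number): number {
  if (startAge < 65 || startAge > 70) {
    // Early-start reductions are deliberately left out rather than guessed at.
    throw new RangeError("sketch covers start ages 65 to 70 only");
  }
  const monthsDeferred = Math.round((startAge - 65) * 12);
  return monthlyAt65 * (1 + DEFERRAL_RATE_PER_MONTH * monthsDeferred);
}

// 60 months x 0.7% = 42%, so the ratio should be 1.42 up to floating-point error.
const ratio = calculateCPPBenefitAtAge(1000, 70) / calculateCPPBenefitAtAge(1000, 65);
```

Change the rate to `0.006` and the weak test still passes, because the benefit at 70 is still larger than at 65. Only the known-answer test fails.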

### What qualifies as a "known-answer source":
- Government publications (CRA, ESDC, IRS, HMRC, etc.)
- Official standards documents (ISO, RFC, IEEE)
- Published academic results
- Regulatory filings with specific numerical requirements
- Product specifications with exact values

### The financial accuracy eval pattern
For financial software, create a separate calibration test suite that lives outside the normal unit tests:

```
evals/
└── code-quality/
    └── financial-accuracy.test.ts   ← Run with: npm run eval:financial-accuracy
```

This suite contains ONLY known-answer tests from official sources. It grows over time as you add calculation modules. Run it independently to verify the app's financial accuracy hasn't drifted.
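
One possible shape for such a suite is a table of cases, each carrying its citation, plus a small runner. The sketch below is framework-free so it stays self-contained; in the real project the cases would likely be `test(...)` blocks in `financial-accuracy.test.ts`. `KnownAnswerCase`, `runKnownAnswerCases`, and `deferralMultiplier` are hypothetical names, and `deferralMultiplier` stands in for a real calculation module.

```typescript
interface KnownAnswerCase {
  name: string;
  source: string;        // citation for the expected value
  expected: number;
  tolerance: number;
  actual: () => number;  // calls into the module under test
}

// Returns one message per failing case; an empty array means no drift.
function runKnownAnswerCases(cases: KnownAnswerCase[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    if (c.source.trim() === "") {
      failures.push(`${c.name}: no source cited`);
      continue;
    }
    const got = c.actual();
    if (Math.abs(got - c.expected) > c.tolerance) {
      failures.push(`${c.name}: expected ${c.expected}, got ${got} (source: ${c.source})`);
    }
  }
  return failures;
}

// Stand-in for a real calculation module (assumption for this sketch).
const deferralMultiplier = (months: number): number => 1 + 0.007 * months;

const accuracyCases: KnownAnswerCase[] = [
  {
    name: "CPP deferral from 65 to 70",
    source: "ESDC https://www.canada.ca/en/services/benefits/publicpensions/cpp/benefit-amount.html",
    expected: 1.42,
    tolerance: 1e-9,
    actual: () => deferralMultiplier(60),
  },
];

const accuracyFailures = runKnownAnswerCases(accuracyCases);
```

Rejecting cases with an empty `source` keeps the suite honest: a case without a citation is an ordinary unit test and belongs elsewhere.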

---

## EXECUTION_MASTER.md: The Project Dashboard

Every project using wave-based management should have a single coordination file — typically `EXECUTION_MASTER.md` or equivalent — that shows:

```markdown
# Project Execution Master

## Wave Status
| Wave | Description | Status |
|------|-------------|--------|
| Wave 1 | Core foundations | ✅ Complete |
| Wave 2 | Advisory layer | 🟡 In progress |
| Wave 3 | Infrastructure | ⏸️ Not started |

## Active Streams
| Stream | Branch | Status | Blocker |
|--------|--------|--------|---------|
| cpp-optimizer | feat/cpp-optimizer | ✅ Merged | — |
| rrsp-meltdown | feat/rrsp-meltdown | 🟡 In progress | — |
| estate-planning | feat/estate-planning | ⏸️ Planned | Needs rrsp-meltdown |

## Parallelism Rules
1. Max 2 active streams simultaneously
2. Shared schema changes are always sequential
3. Integration gate before any merge: full test suite must stay green
```

**Every agent session starts by reading this file.** The agent then knows immediately:
- What wave is active
- Which streams are running
- What's blocked and why
- What can run in parallel
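
Because the dashboard is plain markdown, a session can also read it programmatically. The sketch below parses the four-column "Active Streams" table and checks the first parallelism rule (max 2 active streams). The function names are hypothetical, and the parser assumes the exact table layout of the sample above.

```typescript
interface StreamRow {
  stream: string;
  branch: string;
  status: string;
  blocker: string;
}

// Extracts data rows from the four-column "Active Streams" table.
function parseActiveStreams(markdown: string): StreamRow[] {
  const rows: StreamRow[] = [];
  for (const line of markdown.split("\n")) {
    // "| a | b | c | d |" splits into ["", "a", "b", "c", "d", ""]
    const cells = line.split("|").map((c) => c.trim());
    if (cells.length !== 6) continue; // other tables and prose are skipped
    const [, stream, branch, status, blocker] = cells;
    const isHeader = stream === "Stream";
    const isSeparator = /^-+$/.test(stream);
    if (!isHeader && !isSeparator) rows.push({ stream, branch, status, blocker });
  }
  return rows;
}

function summarizeDashboard(rows: StreamRow[], maxActive = 2) {
  const active = rows.filter((r) => /in progress/i.test(r.status)).map((r) => r.stream);
  // "\u2014" is the dash the sample dashboard uses for "no blocker".
  const blocked = rows
    .filter((r) => r.blocker !== "" && r.blocker !== "\u2014")
    .map((r) => r.stream);
  return { active, blocked, overLimit: active.length > maxActive };
}
```

For the sample dashboard this reports `rrsp-meltdown` as the only active stream and `estate-planning` as blocked, so a second stream could start without breaking the limit.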

---

## The Wave Gate

Before starting Wave N+1, verify:

```
[ ] All streams in Wave N merged to main
[ ] Full test suite green (count ≥ baseline)
[ ] Domain-specific accuracy suite passing (if applicable)
[ ] All regression baselines saved
[ ] Process evals written for all Wave N streams
[ ] process-eval-history.json updated
[ ] IMPLEMENTATION_PLAN: all Wave N tasks marked [x]
[ ] EXECUTION_MASTER: Wave N status updated to ✅
[ ] Human sign-off: outputs are producing correct/plausible results
```

The gate exists because Wave N+1 often builds on Wave N's outputs. If Wave N has silent bugs, they compound in Wave N+1. Catch them at the gate.
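
The checklist can be expressed as a function that returns whatever is still unmet. This is a sketch only: `WaveGateState` and its field names are assumptions, and each boolean would be filled in from wherever your project records that fact.

```typescript
// Hypothetical snapshot of the gate inputs for Wave N.
interface WaveGateState {
  allStreamsMerged: boolean;
  suiteGreen: boolean;
  testCount: number;
  baselineTestCount: number;
  accuracySuitePassing: boolean | null; // null = not applicable
  regressionBaselinesSaved: boolean;
  processEvalsWritten: boolean;
  evalHistoryUpdated: boolean;
  planTasksChecked: boolean;
  masterUpdated: boolean;
  humanSignOff: boolean;
}

// Returns the checklist items that still block Wave N+1.
function unmetGateItems(s: WaveGateState): string[] {
  const checks: Array<[boolean, string]> = [
    [s.allStreamsMerged, "All streams in Wave N merged to main"],
    [s.suiteGreen && s.testCount >= s.baselineTestCount, "Full test suite green (count >= baseline)"],
    [s.accuracySuitePassing !== false, "Domain-specific accuracy suite passing"],
    [s.regressionBaselinesSaved, "All regression baselines saved"],
    [s.processEvalsWritten, "Process evals written for all Wave N streams"],
    [s.evalHistoryUpdated, "process-eval-history.json updated"],
    [s.planTasksChecked, "IMPLEMENTATION_PLAN: all Wave N tasks marked [x]"],
    [s.masterUpdated, "EXECUTION_MASTER: Wave N status updated"],
    [s.humanSignOff, "Human sign-off"],
  ];
  return checks.filter(([ok]) => !ok).map(([, label]) => label);
}
```

Human sign-off stays a plain boolean on purpose: it is the one item no script should set to `true` on its own.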

---

## File Organization

```
<project-root>/
├── AGENT.md                          ← Agent instructions (adapted from AGENT-INSTRUCTIONS.md)
├── IMPLEMENTATION_PLAN.md            ← Master backlog (tasks 1-N, all waves)
├── PROJECT-SPEC.md                   ← What to build (never changes)
├── DECISIONS.md                      ← Architecture Decision Records
└── .harness/
    ├── EXECUTION_MASTER.md           ← Wave/stream dashboard
    ├── EXECUTION-BOARD-TEMPLATE.md   ← Copy this for new streams
    ├── VALIDATION-TEMPLATE.md        ← Copy this for packet evidence
    ├── PROCESS-EVAL-TEMPLATE.md      ← Copy this for stream retrospectives
    ├── regression-baselines/         ← Deterministic output snapshots
    ├── <stream-A>/
    │   ├── execution-board.md        ← Written BEFORE implementation
    │   ├── process-eval.md           ← Written AFTER merge
    │   └── validation/
    │       ├── <XX-01>-validation.md
    │       └── <XX-02>-validation.md
    └── <stream-B>/
        └── ...
```
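
Tooling that touches these files is easier to keep consistent if the per-stream paths are derived in one place. A minimal sketch that follows the tree above; `streamPaths` is a hypothetical helper, and the packet ID passed to it is whatever ID scheme your boards use.

```typescript
const HARNESS_DIR = ".harness";

// Derives the per-stream file locations shown in the tree above.
function streamPaths(stream: string) {
  const root = `${HARNESS_DIR}/${stream}`;
  return {
    executionBoard: `${root}/execution-board.md`, // written BEFORE implementation
    processEval: `${root}/process-eval.md`,       // written AFTER merge
    validation: (packetId: string): string => `${root}/validation/${packetId}-validation.md`,
  };
}

const cppPaths = streamPaths("cpp-optimizer");
```

Here `cppPaths.executionBoard` is `.harness/cpp-optimizer/execution-board.md`, matching the location given in the execution board section.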

---

## Adapting for Your Project

### Projects WITHOUT domain-specific calculations
Skip the known-answer tests and financial accuracy eval. Keep everything else.

### Projects with a small scope (< 10 tasks)
Skip waves entirely — just use streams. One execution board per logical feature group.

### Projects with a single developer (no parallelism)
Streams are still valuable for planning discipline even if run sequentially.

### Non-TypeScript / non-test projects
Adapt the commit trailers (Agent / Tests / Tests-Added / TypeScript) to your stack. Whatever the keys are called, they should record:
- **What model did the work** (for attribution and quality tracking)
- **Test counts** or equivalent quality metric
- **Build / type check status**

---

## Quick Reference: The Discipline in One Page

```
BEFORE CODING:
  ✅ Write execution board for the entire stream
  ✅ Define known-answer tests for ALL calculation modules
  ✅ Make every acceptance criterion programmatically verifiable

PER PACKET:
  ✅ Code + tests in same commit
  ✅ Full suite green before moving on
  ✅ Write validation evidence immediately after
  ✅ Commit trailer: Agent / Tests / Tests-Added / TypeScript

PER STREAM:
  ✅ Write process eval honestly
  ✅ Merge with --no-ff
  ✅ Update EXECUTION_MASTER

PER WAVE:
  ✅ Run wave gate checklist before starting next wave
  ✅ Human sign-off on outputs
```
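
The per-packet commit trailer is the one item here that benefits from being generated rather than typed. The guide names the four keys but not their value formats, so the formats below are assumptions, as are `PacketCommitInfo`, `formatTrailers`, and the example values. The lines use the `Key: value` form that Git treats as trailers when they sit at the end of a commit message.

```typescript
// Hypothetical input; the value formats are assumptions, only the keys come from the guide.
interface PacketCommitInfo {
  agent: string;           // which model did the work
  testsTotal: number;      // full-suite count after this packet
  testsAdded: number;      // tests added by this packet
  typeCheckClean: boolean; // result of the type check
}

function formatTrailers(info: PacketCommitInfo): string {
  return [
    `Agent: ${info.agent}`,
    `Tests: ${info.testsTotal}`,
    `Tests-Added: ${info.testsAdded}`,
    `TypeScript: ${info.typeCheckClean ? "clean" : "errors"}`,
  ].join("\n");
}

const trailerBlock = formatTrailers({
  agent: "example-model",
  testsTotal: 1597,
  testsAdded: 12,
  typeCheckClean: true,
});
```

For a non-TypeScript project, swap the last key for the equivalent build or lint status and keep the rest.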

---

## Why This Works

The wave-based approach solves three failure modes common in agent projects:

**1. Scope drift** — The execution board defines the stream's boundaries upfront. Agents can't drift into unrelated work because the plan is explicit.

**2. Hidden inaccuracies** — Known-answer tests with official citations are written in the planning phase, before any implementation. This forces precision in the spec, which translates directly into correct implementations.

**3. No definition of done** — The stream completion criteria (in the execution board) tell every agent, every session: "the stream is done when these boxes are checked." No ambiguity.

---

*This pattern was developed through practice on the Fintrove project (2026-03-31 → 2026-04-01): 4 waves, 11 streams, 44 tasks, 1,254 → 1,597 tests, zero regressions.*