# Agent Harness — Worked Examples
## How Much Context Does an Agent Need?
The key insight from both the Nate Jones approach and the Ralph Wiggum loop:
> **The agent needs enough context to work autonomously for ONE task,
> but the system needs enough structure to coordinate across MANY tasks.**
This means two layers of documentation:
### Layer 1: The Spec (written by you, read-only for agents)
- What you're building and why
- Technical constraints and decisions
- Acceptance criteria for every feature
- Data models and API shapes
### Layer 2: The Plan (created by agent, updated each iteration)
- Task decomposition with checkboxes
- Dependencies between tasks
- Current status
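For instance, a plan file of this shape might look like the following after one iteration (task names and statuses are illustrative):

```markdown
# Implementation Plan

## Phase 1: Data Foundation
- [x] Task 1: Project scaffolding (build, test, lint)   (depends on: nothing)
- [ ] Task 2: SQLite schema + migrations                (depends on: Task 1)
- [ ] Task 3: QFX/CSV import                            (depends on: Task 2)

## Status
Last iteration: completed Task 1, all tests passing.
Next up: Task 2.
```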
---
## The Three Approaches Compared
### Ezward's Approach (vibe-basic)
**Style:** Single sequential PRD — numbered steps, each building on the last.
**Strengths:**
- Very explicit about what to build in what order
- Each step includes "add unit tests" and "make sure it compiles"
- The "Generally" section at the end sets cross-cutting standards
- Language spec provided as a separate reference file
**Best for:** Well-understood problems where you know the implementation order.
**Key pattern:** The PRD *is* the implementation plan. Steps 1-14, do them in order.
### Ralph Wiggum Loop
**Style:** Spec + Plan separation. Agent creates its own plan from the spec.
**Strengths:**
- Fresh context each iteration (no context window overflow)
- Agent decomposes tasks itself (may find better ordering)
- Git history is the "memory" between iterations
- Simple bash loop — no complex orchestration
**Best for:** Larger projects where you want the agent to figure out task ordering.
**Key pattern:** `while :; do cat PROMPT.md | claude -p; done`
### Nate Jones / Task Decomposition
**Style:** Decompose → parallelize → verify → iterate.
**Strengths:**
- Multiple agents can work on different tasks simultaneously
- Verification step catches integration issues
- Iteration handles failures gracefully
**Best for:** Large projects with independent components that can be parallelized.
**Key pattern:** Orchestrator agent spawns worker agents for each task.
---
## Example: Personal Finance App (Fintrove-style)
Here's what a complete spec would look like for a Fintrove-like personal finance application. This is the document you'd give to a team of agents.
### PROJECT-SPEC.md
```markdown
# Project Specification: FinPlan — Personal Finance Dashboard
## 1. Project Overview
### What are we building?
A privacy-first personal finance dashboard that helps a retiree manage
their money. It imports transaction data, categorizes spending, projects
retirement income against expenses, and runs Monte Carlo simulations to
stress-test withdrawal strategies.
### Why does it matter?
Existing tools (Mint, YNAB) are cloud-based and sell your data. Quicken
is stagnating. We want a local-first tool that's actually useful for
retirement planning with Canadian tax rules (RRSP meltdown, CPP/OAS
optimization, pension integration).
### Success criteria
- [ ] Import Quicken QFX/CSV exports and categorize transactions
- [ ] Dashboard shows monthly spending by category (current month + trends)
- [ ] Retirement projection shows income vs expenses for 30 years
- [ ] Monte Carlo simulation with 1000+ runs using historical market data
- [ ] All data stays local (SQLite, no cloud)
- [ ] Runs in browser via local server
## 2. Technical Foundation
### Tech stack
- **Language:** TypeScript (Node.js backend, browser frontend)
- **Framework:** Express.js (API), vanilla HTML/CSS/JS (frontend)
- **Database:** SQLite via better-sqlite3
- **Build system:** esbuild for frontend bundling
- **Test framework:** Node.js built-in test runner
- **Package manager:** npm
### Project structure
project/
├── packages/
│ ├── server/ # Express API + SQLite
│ ├── client/ # Browser frontend
│ └── shared/ # Types, constants, utils
├── data/ # Sample data for testing
├── docs/ # Design docs
├── PROJECT-SPEC.md
├── IMPLEMENTATION_PLAN.md
└── AGENT.md
### Build & test commands
npm install
npm run build
npm test
npm run lint
### Coding standards
- TypeScript strict mode
- No `any` types except in test fixtures
- All public functions documented with JSDoc
- Error messages must be user-friendly (no stack traces in UI)
- SQL queries use parameterized statements (no string concatenation)
## 3. Requirements
### FR-001: Transaction Import
**Description:** Import financial transactions from QFX (OFX) and CSV files.
**Acceptance criteria:**
- [ ] Parse QFX files and extract: date, amount, payee, memo, type
- [ ] Parse CSV files with configurable column mapping
- [ ] Deduplicate transactions by date + amount + payee
- [ ] Store in SQLite with account association
- [ ] CLI command: `npm run import -- --file data/transactions.qfx`
### FR-002: Auto-Categorization
**Description:** Automatically categorize transactions based on payee patterns.
**Acceptance criteria:**
- [ ] Rule-based matching: payee contains "COSTCO" → Groceries
- [ ] Rules stored in SQLite, editable via API
- [ ] Uncategorized transactions flagged for manual review
- [ ] Bulk categorization: apply rule retroactively to past transactions
- [ ] At least 20 default rules for common Canadian merchants
### FR-003: Spending Dashboard
**Description:** Web dashboard showing spending breakdown and trends.
**Acceptance criteria:**
- [ ] Monthly spending by category (bar chart)
- [ ] 12-month trend line per category
- [ ] Total income vs total expenses per month
- [ ] Filter by date range and account
- [ ] Loads in < 500ms for 10,000 transactions
### FR-004: Retirement Projection
**Description:** Project income and expenses over a 30-year retirement.
**Acceptance criteria:**
- [ ] Input: current age, retirement age, life expectancy
- [ ] Income sources: pension (fixed), CPP (age-dependent), OAS (age-dependent)
- [ ] RRSP meltdown strategy: withdraw X/year for Y years before age 65
- [ ] Inflation adjustment (configurable rate, default 2.5%)
- [ ] Output: year-by-year table of income, expenses, portfolio balance
### FR-005: Monte Carlo Simulation
**Description:** Stress-test retirement plan against historical market returns.
**Acceptance criteria:**
- [ ] Use S&P 500 historical annual returns (1928-present)
- [ ] Run 1,000+ simulations with random return sequences
- [ ] Output: success rate (% of runs where money lasts)
- [ ] Visualization: fan chart showing percentile bands
- [ ] Compare strategies: 4% rule vs dynamic withdrawal
### NFR-001: Privacy
- [ ] All data stored locally in SQLite
- [ ] No network requests except to localhost
- [ ] No analytics, telemetry, or tracking
### NFR-002: Performance
- [ ] Dashboard loads in < 1 second
- [ ] Monte Carlo (1000 runs) completes in < 5 seconds
- [ ] Import 10,000 transactions in < 10 seconds
### NFR-003: Testing
- [ ] 80%+ code coverage
- [ ] Integration tests for API endpoints
- [ ] Unit tests for calculation functions
- [ ] Sample data fixtures for reproducible tests
## 4. Data Model
### Entities
Entity: Account
- id: INTEGER (primary key, auto-increment)
- name: TEXT (required, e.g. "RRSP", "TFSA", "Chequing")
- type: TEXT (checking | savings | investment | credit)
- institution: TEXT (optional)
Entity: Transaction
- id: INTEGER (primary key, auto-increment)
- account_id: INTEGER (foreign key Account)
- date: TEXT (ISO 8601 date)
- amount: REAL (positive = income, negative = expense)
- payee: TEXT
- memo: TEXT (optional)
- category_id: INTEGER (foreign key Category, nullable)
- import_hash: TEXT (unique, for deduplication)
Entity: Category
- id: INTEGER (primary key, auto-increment)
- name: TEXT (unique, e.g. "Groceries", "Utilities")
- type: TEXT (expense | income | transfer)
- budget: REAL (optional monthly budget)
Entity: CategoryRule
- id: INTEGER (primary key, auto-increment)
- pattern: TEXT (substring match on payee)
- category_id: INTEGER (foreign key Category)
- priority: INTEGER (higher = matched first)
Entity: RetirementProfile
- id: INTEGER (primary key, auto-increment)
- name: TEXT
- current_age: INTEGER
- retirement_age: INTEGER
- life_expectancy: INTEGER
- annual_expenses: REAL
- cpp_start_age: INTEGER (default 70)
- oas_start_age: INTEGER (default 70)
- pension_annual: REAL
- rrsp_balance: REAL
- tfsa_balance: REAL
- non_reg_balance: REAL
## 5. API Design
### REST Endpoints
GET /api/accounts
POST /api/accounts
GET /api/transactions?from=&to=&account=&category=
POST /api/import (multipart file upload)
GET /api/categories
POST /api/categories
GET /api/categories/rules
POST /api/categories/rules
GET /api/spending/monthly?from=&to=
GET /api/spending/trends?months=12
GET /api/retirement/projection/:profileId
POST /api/retirement/monte-carlo/:profileId
## 6. Architecture Decisions
### Constraints
- MUST: Use SQLite (no PostgreSQL, no cloud DB)
- MUST: Run entirely on localhost
- MUST: Work offline
- MUST NOT: Make any external network requests
- MUST NOT: Use React/Vue/Angular (vanilla JS + HTML templates)
- PREFER: Native ES modules over bundling where possible
### Known Challenges
- QFX/OFX parsing is XML-based with quirky formatting
- Canadian CPP/OAS calculations have complex age-dependent rules
- Monte Carlo needs to be fast; consider Web Workers to keep the UI responsive
## 7. Phasing
### Phase 1: Data Foundation (Tasks 1-5)
- [ ] Project scaffolding (monorepo, build, test)
- [ ] SQLite schema + migrations
- [ ] QFX/CSV import
- [ ] Category rules engine
- [ ] REST API for CRUD
### Phase 2: Dashboard (Tasks 6-8)
- [ ] Spending by category (API + chart)
- [ ] Trend lines
- [ ] Date/account filters
### Phase 3: Retirement Engine (Tasks 9-12)
- [ ] Income projection calculator
- [ ] RRSP meltdown logic
- [ ] CPP/OAS optimization
- [ ] Monte Carlo simulation
### Phase 4: Polish (Tasks 13-15)
- [ ] Error handling + user messages
- [ ] Performance optimization
- [ ] Documentation
## 8. Reference Materials
### External docs
- QFX/OFX spec: https://www.ofx.net/
- CPP benefits: https://www.canada.ca/en/services/benefits/publicpensions/cpp.html
- OAS benefits: https://www.canada.ca/en/services/benefits/publicpensions/old-age-security.html
- S&P 500 historical returns: included in data/sp500-returns.csv
### Anti-patterns
- Don't use localStorage for data; SQLite is the source of truth
- Don't try to parse bank-specific CSV formats; use configurable column mapping
- Don't calculate CPP/OAS inline; extract to a dedicated module with unit tests
```
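As a concrete illustration of FR-002, the priority-ordered substring matching could be sketched like this (the `CategoryRule` shape mirrors the data model above, but the function itself is a hypothetical sketch, not the spec's implementation):

```typescript
interface CategoryRule {
  pattern: string;    // substring match on payee (case-insensitive here)
  categoryId: number;
  priority: number;   // higher = matched first
}

// Return the category for a payee, or null to flag it for manual review.
function categorize(payee: string, rules: CategoryRule[]): number | null {
  const upper = payee.toUpperCase();
  // Sort a copy so higher-priority rules are tried first.
  const sorted = [...rules].sort((a, b) => b.priority - a.priority);
  for (const rule of sorted) {
    if (upper.includes(rule.pattern.toUpperCase())) return rule.categoryId;
  }
  return null; // uncategorized -> manual review queue
}
```

Returning `null` rather than a default category is what lets uncategorized transactions be flagged for review, per the third acceptance criterion.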
---
## What Makes a Good Spec?
Looking at what works across Ezward's PRD, Ralph Wiggum, and Nate Jones:
### 1. Be Specific About Acceptance Criteria
Bad: "Import transactions"
Good: "Parse QFX files and extract: date, amount, payee, memo, type. Deduplicate by date + amount + payee. Store in SQLite."
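Criteria this specific pin down an implementation. The dedup rule, for instance, reduces to a stable hash over the identifying fields (a sketch using Node's built-in crypto module; the helper name and normalization choices are ours, not the spec's):

```typescript
import { createHash } from "node:crypto";

// Stable hash over the fields the spec deduplicates on: date + amount + payee.
// Normalizing amount and payee keeps re-imports of the same file idempotent.
function importHash(date: string, amount: number, payee: string): string {
  return createHash("sha256")
    .update(`${date}|${amount.toFixed(2)}|${payee.trim().toUpperCase()}`)
    .digest("hex");
}
```

Stored in a UNIQUE column (the `import_hash` field on the Transaction entity), this makes duplicate detection a constraint the database enforces rather than logic the agent has to get right.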
### 2. Define the Tech Stack — Don't Let the Agent Choose
Bad: "Use a modern framework"
Good: "TypeScript, Express.js, SQLite via better-sqlite3, vanilla HTML/CSS/JS frontend"
### 3. Include Data Models
Agents that know the data model write better code. Define entities, relationships, and constraints explicitly.
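In the FinPlan spec, for example, the entity definitions translate almost mechanically into shared types (a sketch of what `packages/shared` might export; field names are camelCased from the SQL columns):

```typescript
// Mirrors the Transaction entity from the spec's data model.
export interface Transaction {
  id: number;
  accountId: number;
  date: string;        // ISO 8601 date
  amount: number;      // positive = income, negative = expense
  payee: string;
  memo?: string;
  categoryId?: number; // nullable until categorized
  importHash: string;  // unique, for deduplication
}

export type CategoryType = "expense" | "income" | "transfer";

// Mirrors the Category entity.
export interface Category {
  id: number;
  name: string;
  type: CategoryType;
  budget?: number;     // optional monthly budget
}
```

With these in a shared package, the server, client, and the agent's tests all agree on one definition of the data.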
### 4. Provide Build/Test Commands
The agent needs to verify its own work. If it can't run `npm test`, it can't iterate.
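For the FinPlan spec that means a `scripts` block the agent can run unmodified; one way it might look (illustrative — the exact build and lint invocations are assumptions, not part of the spec):

```json
{
  "scripts": {
    "build": "esbuild packages/client/src/main.ts --bundle --outdir=packages/client/dist",
    "test": "node --test",
    "lint": "tsc --noEmit"
  }
}
```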
### 5. List Anti-Patterns
Tell the agent what NOT to do. This prevents it from going down rabbit holes you've already explored.
### 6. Phase the Work
Large projects need phases. Each phase should be independently deployable. The agent can complete Phase 1 before touching Phase 2.
### 7. Include Sample Data
Agents test better when they have example inputs and expected outputs.
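FR-005 shows why: with a fixed pool of returns, the Monte Carlo success rate becomes a reproducible number the agent can assert on. A sketch (the real implementation would sample from `data/sp500-returns.csv`; these function names are illustrative):

```typescript
// One simulation: draw a random annual return each year, apply the
// withdrawal, and report whether the portfolio survives the horizon.
function simulate(
  balance: number,
  withdrawal: number,
  years: number,
  returns: number[],               // pool of historical annual returns
  rand: () => number = Math.random,
): boolean {
  for (let y = 0; y < years; y++) {
    const r = returns[Math.floor(rand() * returns.length)];
    balance = (balance - withdrawal) * (1 + r);
    if (balance <= 0) return false;
  }
  return true;
}

// Success rate across many runs (FR-005 asks for 1,000+).
function successRate(
  runs: number,
  ...args: [number, number, number, number[]]
): number {
  let ok = 0;
  for (let i = 0; i < runs; i++) if (simulate(...args)) ok++;
  return ok / runs;
}
```

A single-value return pool makes the whole run deterministic, which is exactly the kind of fixture that lets a test pin the expected output: $1M with $40k withdrawals (the 4% rule from the spec) survives 30 years of steady 5% returns and fails under steady 50% losses.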
---
## Running with OpenClaw
You can use OpenClaw's `sessions_spawn` to run the Ralph Wiggum pattern:
```bash
# Planning phase
sessions_spawn --task "Read PROJECT-SPEC.md in /path/to/project.
Decompose into tasks. Write IMPLEMENTATION_PLAN.md." \
--model opus
# Build iterations (spawn one at a time, or use cron)
sessions_spawn --task "Read AGENT.md in /path/to/project.
Follow the core loop. Pick ONE task, implement, test, commit." \
--model sonnet
```
Or use the bash loop directly with Claude Code:
```bash
cd /path/to/project
./ralph-loop.sh --agent claude --max 30
```