# Agent Harness — Worked Examples
## How Much Context Does an Agent Need?
The key insight from both the Nate Jones approach and the Ralph Wiggum loop:
> **The agent needs enough context to work autonomously for ONE task,
> but the system needs enough structure to coordinate across MANY tasks.**
This means two layers of documentation:
### Layer 1: The Spec (written by you, read-only for agents)
- What you're building and why
- Technical constraints and decisions
- Acceptance criteria for every feature
- Data models and API shapes
### Layer 2: The Plan (created by agent, updated each iteration)
- Task decomposition with checkboxes
- Dependencies between tasks
- Current status
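For instance, a plan file of this shape might look like the following after one iteration (task names and statuses are illustrative):

```markdown
# Implementation Plan

## Phase 1: Data Foundation
- [x] Task 1: Project scaffolding (build, test, lint)   (depends on: nothing)
- [ ] Task 2: SQLite schema + migrations                (depends on: Task 1)
- [ ] Task 3: QFX/CSV import                            (depends on: Task 2)

## Status
Last iteration: completed Task 1, all tests passing.
Next up: Task 2.
```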
---
## The Three Approaches Compared
### Ezward's Approach (vibe-basic)
**Style:** Single sequential PRD — numbered steps, each building on the last.
**Strengths:**
- Very explicit about what to build in what order
- Each step includes "add unit tests" and "make sure it compiles"
- The "Generally" section at the end sets cross-cutting standards
- Language spec provided as a separate reference file
**Best for:** Well-understood problems where you know the implementation order.
**Key pattern:** The PRD *is* the implementation plan. Steps 1-14, do them in order.
### Ralph Wiggum Loop
**Style:** Spec + Plan separation. Agent creates its own plan from the spec.
**Strengths:**
- Fresh context each iteration (no context window overflow)
- Agent decomposes tasks itself (may find better ordering)
- Git history is the "memory" between iterations
- Simple bash loop — no complex orchestration
**Best for:** Larger projects where you want the agent to figure out task ordering.
**Key pattern:** `while :; do cat PROMPT.md | claude -p; done`
### Nate Jones / Task Decomposition
**Style:** Decompose → parallelize → verify → iterate.
**Strengths:**
- Multiple agents can work on different tasks simultaneously
- Verification step catches integration issues
- Iteration handles failures gracefully
**Best for:** Large projects with independent components that can be parallelized.
**Key pattern:** Orchestrator agent spawns worker agents for each task.
---
## Example: Personal Finance App (Fintrove-style)
Here's what a complete spec would look like for a Fintrove-like personal finance application. This is the document you'd give to a team of agents.
### PROJECT-SPEC.md
```markdown
# Project Specification: FinPlan — Personal Finance Dashboard
## 1. Project Overview
### What are we building?
A privacy-first personal finance dashboard that helps a retiree manage
their money. It imports transaction data, categorizes spending, projects
retirement income against expenses, and runs Monte Carlo simulations to
stress-test withdrawal strategies.
### Why does it matter?
Existing tools (Mint, YNAB) are cloud-based and sell your data. Quicken
is stagnating. We want a local-first tool that's actually useful for
retirement planning with Canadian tax rules (RRSP meltdown, CPP/OAS
optimization, pension integration).
### Success criteria
- [ ] Import Quicken QFX/CSV exports and categorize transactions
- [ ] Dashboard shows monthly spending by category (current month + trends)
- [ ] Retirement projection shows income vs expenses for 30 years
- [ ] Monte Carlo simulation with 1000+ runs using historical market data
- [ ] All data stays local (SQLite, no cloud)
- [ ] Runs in browser via local server
## 2. Technical Foundation
### Tech stack
- **Language:** TypeScript (Node.js backend, browser frontend)
- **Framework:** Express.js (API), vanilla HTML/CSS/JS (frontend)
- **Database:** SQLite via better-sqlite3
- **Build system:** esbuild for frontend bundling
- **Test framework:** Node.js built-in test runner
- **Package manager:** npm
### Project structure
project/
├── packages/
│ ├── server/ # Express API + SQLite
│ ├── client/ # Browser frontend
│ └── shared/ # Types, constants, utils
├── data/ # Sample data for testing
├── docs/ # Design docs
├── PROJECT-SPEC.md
├── IMPLEMENTATION_PLAN.md
└── AGENT.md
### Build & test commands
npm install
npm run build
npm test
npm run lint
### Coding standards
- TypeScript strict mode
- No `any` types except in test fixtures
- All public functions documented with JSDoc
- Error messages must be user-friendly (no stack traces in UI)
- SQL queries use parameterized statements (no string concatenation)
## 3. Requirements
### FR-001: Transaction Import
**Description:** Import financial transactions from QFX (OFX) and CSV files.
**Acceptance criteria:**
- [ ] Parse QFX files and extract: date, amount, payee, memo, type
- [ ] Parse CSV files with configurable column mapping
- [ ] Deduplicate transactions by date + amount + payee
- [ ] Store in SQLite with account association
- [ ] CLI command: `npm run import -- --file data/transactions.qfx`
### FR-002: Auto-Categorization
**Description:** Automatically categorize transactions based on payee patterns.
**Acceptance criteria:**
- [ ] Rule-based matching: payee contains "COSTCO" → Groceries
- [ ] Rules stored in SQLite, editable via API
- [ ] Uncategorized transactions flagged for manual review
- [ ] Bulk categorization: apply rule retroactively to past transactions
- [ ] At least 20 default rules for common Canadian merchants
### FR-003: Spending Dashboard
**Description:** Web dashboard showing spending breakdown and trends.
**Acceptance criteria:**
- [ ] Monthly spending by category (bar chart)
- [ ] 12-month trend line per category
- [ ] Total income vs total expenses per month
- [ ] Filter by date range and account
- [ ] Loads in < 500ms for 10,000 transactions
### FR-004: Retirement Projection
**Description:** Project income and expenses over a 30-year retirement.
**Acceptance criteria:**
- [ ] Input: current age, retirement age, life expectancy
- [ ] Income sources: pension (fixed), CPP (age-dependent), OAS (age-dependent)
- [ ] RRSP meltdown strategy: withdraw X/year for Y years before age 65
- [ ] Inflation adjustment (configurable rate, default 2.5%)
- [ ] Output: year-by-year table of income, expenses, portfolio balance
### FR-005: Monte Carlo Simulation
**Description:** Stress-test retirement plan against historical market returns.
**Acceptance criteria:**
- [ ] Use S&P 500 historical annual returns (1928-present)
- [ ] Run 1,000+ simulations with random return sequences
- [ ] Output: success rate (% of runs where money lasts)
- [ ] Visualization: fan chart showing percentile bands
- [ ] Compare strategies: 4% rule vs dynamic withdrawal
### NFR-001: Privacy
- [ ] All data stored locally in SQLite
- [ ] No network requests except to localhost
- [ ] No analytics, telemetry, or tracking
### NFR-002: Performance
- [ ] Dashboard loads in < 1 second
- [ ] Monte Carlo (1000 runs) completes in < 5 seconds
- [ ] Import 10,000 transactions in < 10 seconds
### NFR-003: Testing
- [ ] 80%+ code coverage
- [ ] Integration tests for API endpoints
- [ ] Unit tests for calculation functions
- [ ] Sample data fixtures for reproducible tests
## 4. Data Model
### Entities
Entity: Account
- id: INTEGER (primary key, auto-increment)
- name: TEXT (required, e.g. "RRSP", "TFSA", "Chequing")
- type: TEXT (checking | savings | investment | credit)
- institution: TEXT (optional)
Entity: Transaction
- id: INTEGER (primary key, auto-increment)
- account_id: INTEGER (foreign key Account)
- date: TEXT (ISO 8601 date)
- amount: REAL (positive = income, negative = expense)
- payee: TEXT
- memo: TEXT (optional)
- category_id: INTEGER (foreign key Category, nullable)
- import_hash: TEXT (unique, for deduplication)
Entity: Category
- id: INTEGER (primary key, auto-increment)
- name: TEXT (unique, e.g. "Groceries", "Utilities")
- type: TEXT (expense | income | transfer)
- budget: REAL (optional monthly budget)
Entity: CategoryRule
- id: INTEGER (primary key, auto-increment)
- pattern: TEXT (substring match on payee)
- category_id: INTEGER (foreign key Category)
- priority: INTEGER (higher = matched first)
Entity: RetirementProfile
- id: INTEGER (primary key, auto-increment)
- name: TEXT
- current_age: INTEGER
- retirement_age: INTEGER
- life_expectancy: INTEGER
- annual_expenses: REAL
- cpp_start_age: INTEGER (default 70)
- oas_start_age: INTEGER (default 70)
- pension_annual: REAL
- rrsp_balance: REAL
- tfsa_balance: REAL
- non_reg_balance: REAL
## 5. API Design
### REST Endpoints
GET /api/accounts
POST /api/accounts
GET /api/transactions?from=&to=&account=&category=
POST /api/import (multipart file upload)
GET /api/categories
POST /api/categories
GET /api/categories/rules
POST /api/categories/rules
GET /api/spending/monthly?from=&to=
GET /api/spending/trends?months=12
GET /api/retirement/projection/:profileId
POST /api/retirement/monte-carlo/:profileId
## 6. Architecture Decisions
### Constraints
- MUST: Use SQLite (no PostgreSQL, no cloud DB)
- MUST: Run entirely on localhost
- MUST: Work offline
- MUST NOT: Make any external network requests
- MUST NOT: Use React/Vue/Angular (vanilla JS + HTML templates)
- PREFER: Native ES modules over bundling where possible
### Known Challenges
- QFX/OFX parsing is XML-based with quirky formatting
- Canadian CPP/OAS calculations have complex age-dependent rules
- Monte Carlo needs to be fast; consider Web Workers to keep the UI responsive
## 7. Phasing
### Phase 1: Data Foundation (Tasks 1-5)
- [ ] Project scaffolding (monorepo, build, test)
- [ ] SQLite schema + migrations
- [ ] QFX/CSV import
- [ ] Category rules engine
- [ ] REST API for CRUD
### Phase 2: Dashboard (Tasks 6-8)
- [ ] Spending by category (API + chart)
- [ ] Trend lines
- [ ] Date/account filters
### Phase 3: Retirement Engine (Tasks 9-12)
- [ ] Income projection calculator
- [ ] RRSP meltdown logic
- [ ] CPP/OAS optimization
- [ ] Monte Carlo simulation
### Phase 4: Polish (Tasks 13-15)
- [ ] Error handling + user messages
- [ ] Performance optimization
- [ ] Documentation
## 8. Reference Materials
### External docs
- QFX/OFX spec: https://www.ofx.net/
- CPP benefits: https://www.canada.ca/en/services/benefits/publicpensions/cpp.html
- OAS benefits: https://www.canada.ca/en/services/benefits/publicpensions/old-age-security.html
- S&P 500 historical returns: included in data/sp500-returns.csv
### Anti-patterns
- Don't use localStorage for data; SQLite is the source of truth
- Don't try to parse bank-specific CSV formats; use configurable column mapping
- Don't calculate CPP/OAS inline; extract to a dedicated module with unit tests
```
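As a concrete illustration of FR-002, the priority-ordered substring matching could be sketched like this (the `CategoryRule` shape mirrors the data model above, but the function itself is a hypothetical sketch, not the spec's implementation):

```typescript
interface CategoryRule {
  pattern: string;    // substring match on payee (case-insensitive here)
  categoryId: number;
  priority: number;   // higher = matched first
}

// Return the category for a payee, or null to flag it for manual review.
function categorize(payee: string, rules: CategoryRule[]): number | null {
  const upper = payee.toUpperCase();
  // Sort a copy so higher-priority rules are tried first.
  const sorted = [...rules].sort((a, b) => b.priority - a.priority);
  for (const rule of sorted) {
    if (upper.includes(rule.pattern.toUpperCase())) return rule.categoryId;
  }
  return null; // uncategorized -> manual review queue
}
```

Returning `null` rather than a default category is what lets uncategorized transactions be flagged for review, per the third acceptance criterion.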
---
## What Makes a Good Spec?
Looking at what works across Ezward's PRD, Ralph Wiggum, and Nate Jones:
### 1. Be Specific About Acceptance Criteria
Bad: "Import transactions"
Good: "Parse QFX files and extract: date, amount, payee, memo, type. Deduplicate by date + amount + payee. Store in SQLite."
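Criteria this specific pin down an implementation. The dedup rule, for instance, reduces to a stable hash over the identifying fields (a sketch using Node's built-in crypto module; the helper name and normalization choices are ours, not the spec's):

```typescript
import { createHash } from "node:crypto";

// Stable hash over the fields the spec deduplicates on: date + amount + payee.
// Normalizing amount and payee keeps re-imports of the same file idempotent.
function importHash(date: string, amount: number, payee: string): string {
  return createHash("sha256")
    .update(`${date}|${amount.toFixed(2)}|${payee.trim().toUpperCase()}`)
    .digest("hex");
}
```

Stored in a UNIQUE column (the `import_hash` field on the Transaction entity), this makes duplicate detection a constraint the database enforces rather than logic the agent has to get right.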
### 2. Define the Tech Stack — Don't Let the Agent Choose
Bad: "Use a modern framework"
Good: "TypeScript, Express.js, SQLite via better-sqlite3, vanilla HTML/CSS/JS frontend"
### 3. Include Data Models
Agents that know the data model write better code. Define entities, relationships, and constraints explicitly.
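In the FinPlan spec, for example, the entity definitions translate almost mechanically into shared types (a sketch of what `packages/shared` might export; field names are camelCased from the SQL columns):

```typescript
// Mirrors the Transaction entity from the spec's data model.
export interface Transaction {
  id: number;
  accountId: number;
  date: string;        // ISO 8601 date
  amount: number;      // positive = income, negative = expense
  payee: string;
  memo?: string;
  categoryId?: number; // nullable until categorized
  importHash: string;  // unique, for deduplication
}

export type CategoryType = "expense" | "income" | "transfer";

// Mirrors the Category entity.
export interface Category {
  id: number;
  name: string;
  type: CategoryType;
  budget?: number;     // optional monthly budget
}
```

With these in a shared package, the server, client, and the agent's tests all agree on one definition of the data.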
### 4. Provide Build/Test Commands
The agent needs to verify its own work. If it can't run `npm test`, it can't iterate.
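For the FinPlan spec that means a `scripts` block the agent can run unmodified; one way it might look (illustrative — the exact build and lint invocations are assumptions, not part of the spec):

```json
{
  "scripts": {
    "build": "esbuild packages/client/src/main.ts --bundle --outdir=packages/client/dist",
    "test": "node --test",
    "lint": "tsc --noEmit"
  }
}
```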
### 5. List Anti-Patterns
Tell the agent what NOT to do. This prevents it from going down rabbit holes you've already explored.
### 6. Phase the Work
Large projects need phases. Each phase should be independently deployable. The agent can complete Phase 1 before touching Phase 2.
### 7. Include Sample Data
Agents test better when they have example inputs and expected outputs.
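FR-005 shows why: with a fixed pool of returns, the Monte Carlo success rate becomes a reproducible number the agent can assert on. A sketch (the real implementation would sample from `data/sp500-returns.csv`; these function names are illustrative):

```typescript
// One simulation: draw a random annual return each year, apply the
// withdrawal, and report whether the portfolio survives the horizon.
function simulate(
  balance: number,
  withdrawal: number,
  years: number,
  returns: number[],               // pool of historical annual returns
  rand: () => number = Math.random,
): boolean {
  for (let y = 0; y < years; y++) {
    const r = returns[Math.floor(rand() * returns.length)];
    balance = (balance - withdrawal) * (1 + r);
    if (balance <= 0) return false;
  }
  return true;
}

// Success rate across many runs (FR-005 asks for 1,000+).
function successRate(
  runs: number,
  ...args: [number, number, number, number[]]
): number {
  let ok = 0;
  for (let i = 0; i < runs; i++) if (simulate(...args)) ok++;
  return ok / runs;
}
```

A single-value return pool makes the whole run deterministic, which is exactly the kind of fixture that lets a test pin the expected output: $1M with $40k withdrawals (the 4% rule from the spec) survives 30 years of steady 5% returns and fails under steady 50% losses.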
---
## Running with OpenClaw
You can use OpenClaw's `sessions_spawn` to run the Ralph Wiggum pattern:
```bash
# Planning phase
sessions_spawn --task "Read PROJECT-SPEC.md in /path/to/project.
Decompose into tasks. Write IMPLEMENTATION_PLAN.md." \
--model opus
# Build iterations (spawn one at a time, or use cron)
sessions_spawn --task "Read AGENT.md in /path/to/project.
Follow the core loop. Pick ONE task, implement, test, commit." \
--model sonnet
```
Or use the bash loop directly with Claude Code:
```bash
cd /path/to/project
./ralph-loop.sh --agent claude --max 30
```