# Agent Harness — Worked Examples

## How Much Context Does an Agent Need?

The key insight from both the Nate Jones approach and the Ralph Wiggum loop:

> **The agent needs enough context to work autonomously for ONE task,
> but the system needs enough structure to coordinate across MANY tasks.**

This means two layers of documentation:

### Layer 1: The Spec (written by you, read-only for agents)

- What you're building and why
- Technical constraints and decisions
- Acceptance criteria for every feature
- Data models and API shapes

### Layer 2: The Plan (created by agent, updated each iteration)

- Task decomposition with checkboxes
- Dependencies between tasks
- Current status

---

## The Three Approaches Compared

### Ezward's Approach (vibe-basic)

**Style:** Single sequential PRD — numbered steps, each building on the last.

**Strengths:**

- Very explicit about what to build in what order
- Each step includes "add unit tests" and "make sure it compiles"
- The "Generally" section at the end sets cross-cutting standards
- Language spec provided as a separate reference file

**Best for:** Well-understood problems where you know the implementation order.

**Key pattern:** The PRD *is* the implementation plan. Steps 1-14, do them in order.

### Ralph Wiggum Loop

**Style:** Spec + Plan separation. Agent creates its own plan from the spec.

**Strengths:**

- Fresh context each iteration (no context window overflow)
- Agent decomposes tasks itself (may find better ordering)
- Git history is the "memory" between iterations
- Simple bash loop — no complex orchestration

**Best for:** Larger projects where you want the agent to figure out task ordering.

**Key pattern:** `while :; do cat PROMPT.md | claude -p; done`

### Nate Jones / Task Decomposition

**Style:** Decompose → parallelize → verify → iterate.

**Strengths:**

- Multiple agents can work on different tasks simultaneously
- Verification step catches integration issues
- Iteration handles failures gracefully

**Best for:** Large projects with independent components that can be parallelized.

**Key pattern:** Orchestrator agent spawns worker agents for each task.

---

## Example: Personal Finance App (Fintrove-style)

Here's what a complete spec would look like for a Fintrove-like personal finance application. This is the document you'd give to a team of agents.

### PROJECT-SPEC.md

```markdown
# Project Specification: FinPlan — Personal Finance Dashboard

## 1. Project Overview

### What are we building?

A privacy-first personal finance dashboard that helps a retiree manage their money. It imports transaction data, categorizes spending, projects retirement income against expenses, and runs Monte Carlo simulations to stress-test withdrawal strategies.

### Why does it matter?

Existing tools (Mint, YNAB) are cloud-based and sell your data. Quicken is stagnating. We want a local-first tool that's actually useful for retirement planning with Canadian tax rules (RRSP meltdown, CPP/OAS optimization, pension integration).

### Success criteria

- [ ] Import Quicken QFX/CSV exports and categorize transactions
- [ ] Dashboard shows monthly spending by category (current month + trends)
- [ ] Retirement projection shows income vs expenses for 30 years
- [ ] Monte Carlo simulation with 1000+ runs using historical market data
- [ ] All data stays local (SQLite, no cloud)
- [ ] Runs in browser via local server

## 2. Technical Foundation

### Tech stack

- **Language:** TypeScript (Node.js backend, browser frontend)
- **Framework:** Express.js (API), vanilla HTML/CSS/JS (frontend)
- **Database:** SQLite via better-sqlite3
- **Build system:** esbuild for frontend bundling
- **Test framework:** Node.js built-in test runner
- **Package manager:** npm

### Project structure

    project/
    ├── packages/
    │   ├── server/    # Express API + SQLite
    │   ├── client/    # Browser frontend
    │   └── shared/    # Types, constants, utils
    ├── data/          # Sample data for testing
    ├── docs/          # Design docs
    ├── PROJECT-SPEC.md
    ├── IMPLEMENTATION_PLAN.md
    └── AGENT.md

### Build & test commands

    npm install
    npm run build
    npm test
    npm run lint

### Coding standards

- TypeScript strict mode
- No `any` types except in test fixtures
- All public functions documented with JSDoc
- Error messages must be user-friendly (no stack traces in UI)
- SQL queries use parameterized statements (no string concatenation)

## 3. Requirements

### FR-001: Transaction Import

**Description:** Import financial transactions from QFX (OFX) and CSV files.

**Acceptance criteria:**

- [ ] Parse QFX files and extract: date, amount, payee, memo, type
- [ ] Parse CSV files with configurable column mapping
- [ ] Deduplicate transactions by date + amount + payee
- [ ] Store in SQLite with account association
- [ ] CLI command: `npm run import -- --file data/transactions.qfx`

### FR-002: Auto-Categorization

**Description:** Automatically categorize transactions based on payee patterns.

**Acceptance criteria:**

- [ ] Rule-based matching: payee contains "COSTCO" → Groceries
- [ ] Rules stored in SQLite, editable via API
- [ ] Uncategorized transactions flagged for manual review
- [ ] Bulk categorization: apply rule retroactively to past transactions
- [ ] At least 20 default rules for common Canadian merchants

### FR-003: Spending Dashboard

**Description:** Web dashboard showing spending breakdown and trends.

**Acceptance criteria:**

- [ ] Monthly spending by category (bar chart)
- [ ] 12-month trend line per category
- [ ] Total income vs total expenses per month
- [ ] Filter by date range and account
- [ ] Loads in < 500ms for 10,000 transactions

### FR-004: Retirement Projection

**Description:** Project income and expenses over a 30-year retirement.

**Acceptance criteria:**

- [ ] Input: current age, retirement age, life expectancy
- [ ] Income sources: pension (fixed), CPP (age-dependent), OAS (age-dependent)
- [ ] RRSP meltdown strategy: withdraw X/year for Y years before age 65
- [ ] Inflation adjustment (configurable rate, default 2.5%)
- [ ] Output: year-by-year table of income, expenses, portfolio balance

### FR-005: Monte Carlo Simulation

**Description:** Stress-test retirement plan against historical market returns.

**Acceptance criteria:**

- [ ] Use S&P 500 historical annual returns (1928-present)
- [ ] Run 1,000+ simulations with random return sequences
- [ ] Output: success rate (% of runs where money lasts)
- [ ] Visualization: fan chart showing percentile bands
- [ ] Compare strategies: 4% rule vs dynamic withdrawal

### NFR-001: Privacy

- [ ] All data stored locally in SQLite
- [ ] No network requests except to localhost
- [ ] No analytics, telemetry, or tracking

### NFR-002: Performance

- [ ] Dashboard loads in < 1 second
- [ ] Monte Carlo (1000 runs) completes in < 5 seconds
- [ ] Import 10,000 transactions in < 10 seconds

### NFR-003: Testing

- [ ] 80%+ code coverage
- [ ] Integration tests for API endpoints
- [ ] Unit tests for calculation functions
- [ ] Sample data fixtures for reproducible tests

## 4. Data Model

### Entities

    Entity: Account
    - id: INTEGER (primary key, auto-increment)
    - name: TEXT (required, e.g. "RRSP", "TFSA", "Chequing")
    - type: TEXT (checking | savings | investment | credit)
    - institution: TEXT (optional)

    Entity: Transaction
    - id: INTEGER (primary key, auto-increment)
    - account_id: INTEGER (foreign key → Account)
    - date: TEXT (ISO 8601 date)
    - amount: REAL (positive = income, negative = expense)
    - payee: TEXT
    - memo: TEXT (optional)
    - category_id: INTEGER (foreign key → Category, nullable)
    - import_hash: TEXT (unique, for deduplication)

    Entity: Category
    - id: INTEGER (primary key, auto-increment)
    - name: TEXT (unique, e.g. "Groceries", "Utilities")
    - type: TEXT (expense | income | transfer)
    - budget: REAL (optional monthly budget)

    Entity: CategoryRule
    - id: INTEGER (primary key, auto-increment)
    - pattern: TEXT (substring match on payee)
    - category_id: INTEGER (foreign key → Category)
    - priority: INTEGER (higher = matched first)

    Entity: RetirementProfile
    - id: INTEGER (primary key, auto-increment)
    - name: TEXT
    - current_age: INTEGER
    - retirement_age: INTEGER
    - life_expectancy: INTEGER
    - annual_expenses: REAL
    - cpp_start_age: INTEGER (default 70)
    - oas_start_age: INTEGER (default 70)
    - pension_annual: REAL
    - rrsp_balance: REAL
    - tfsa_balance: REAL
    - non_reg_balance: REAL

## 5. API Design

### REST Endpoints

    GET  /api/accounts
    POST /api/accounts
    GET  /api/transactions?from=&to=&account=&category=
    POST /api/import            (multipart file upload)
    GET  /api/categories
    POST /api/categories
    GET  /api/categories/rules
    POST /api/categories/rules
    GET  /api/spending/monthly?from=&to=
    GET  /api/spending/trends?months=12
    GET  /api/retirement/projection/:profileId
    POST /api/retirement/monte-carlo/:profileId

## 6. Architecture Decisions

### Constraints

- MUST: Use SQLite (no PostgreSQL, no cloud DB)
- MUST: Run entirely on localhost
- MUST: Work offline
- MUST NOT: Make any external network requests
- MUST NOT: Use React/Vue/Angular (vanilla JS + HTML templates)
- PREFER: Native ES modules over bundling where possible

### Known Challenges

- QFX/OFX parsing is XML-based with quirky formatting
- Canadian CPP/OAS calculations have complex age-dependent rules
- Monte Carlo needs to be fast — consider Web Workers for UI

## 7. Phasing

### Phase 1: Data Foundation (Tasks 1-5)

- [ ] Project scaffolding (monorepo, build, test)
- [ ] SQLite schema + migrations
- [ ] QFX/CSV import
- [ ] Category rules engine
- [ ] REST API for CRUD

### Phase 2: Dashboard (Tasks 6-8)

- [ ] Spending by category (API + chart)
- [ ] Trend lines
- [ ] Date/account filters

### Phase 3: Retirement Engine (Tasks 9-12)

- [ ] Income projection calculator
- [ ] RRSP meltdown logic
- [ ] CPP/OAS optimization
- [ ] Monte Carlo simulation

### Phase 4: Polish (Tasks 13-15)

- [ ] Error handling + user messages
- [ ] Performance optimization
- [ ] Documentation

## 8. Reference Materials

### External docs

- QFX/OFX spec: https://www.ofx.net/
- CPP benefits: https://www.canada.ca/en/services/benefits/publicpensions/cpp.html
- OAS benefits: https://www.canada.ca/en/services/benefits/publicpensions/old-age-security.html
- S&P 500 historical returns: included in data/sp500-returns.csv

### Anti-patterns

- Don't use localStorage for data — SQLite is the source of truth
- Don't try to parse bank-specific CSV formats — use configurable column mapping
- Don't calculate CPP/OAS inline — extract to a dedicated module with unit tests
```

---

## What Makes a Good Spec?

Looking at what works across Ezward's PRD, Ralph Wiggum, and Nate Jones:

### 1. Be Specific About Acceptance Criteria

Bad: "Import transactions"

Good: "Parse QFX files and extract: date, amount, payee, memo, type. Deduplicate by date + amount + payee. Store in SQLite."

### 2. Define the Tech Stack — Don't Let the Agent Choose

Bad: "Use a modern framework"

Good: "TypeScript, Express.js, SQLite via better-sqlite3, vanilla HTML/CSS/JS frontend"

### 3. Include Data Models

Agents that know the data model write better code. Define entities, relationships, and constraints explicitly.

### 4. Provide Build/Test Commands

The agent needs to verify its own work. If it can't run `npm test`, it can't iterate.

### 5. List Anti-Patterns

Tell the agent what NOT to do. This prevents it from going down rabbit holes you've already explored.

### 6. Phase the Work

Large projects need phases. Each phase should be independently deployable. The agent can complete Phase 1 before touching Phase 2.

### 7. Include Sample Data

Agents test better when they have example inputs and expected outputs.

---

## Running with OpenClaw

You can use OpenClaw's `sessions_spawn` to run the Ralph Wiggum pattern:

```bash
# Planning phase
sessions_spawn --task "Read PROJECT-SPEC.md in /path/to/project. Decompose into tasks. Write IMPLEMENTATION_PLAN.md." \
  --model opus

# Build iterations (spawn one at a time, or use cron)
sessions_spawn --task "Read AGENT.md in /path/to/project. Follow the core loop. Pick ONE task, implement, test, commit." \
  --model sonnet
```

Or use the bash loop directly with Claude Code:

```bash
cd /path/to/project
./ralph-loop.sh --agent claude --max 30
```
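The `ralph-loop.sh` script invoked above is not shown in this document. A minimal sketch of what it could look like, assuming `--agent` names the agent CLI binary and `--max` caps the number of iterations (both flag names are inferred from the invocation, and `PROMPT.md` is the per-iteration prompt file from the Ralph Wiggum pattern):

```bash
#!/usr/bin/env bash
# ralph-loop.sh: sketch of the Ralph Wiggum loop. The --agent and --max flags
# are assumptions inferred from the invocation above, not a published interface.
set -euo pipefail

ralph_loop() {
  local agent="claude"   # CLI binary to invoke each iteration (assumed default)
  local max=30           # iteration cap (assumed default)

  while [[ $# -gt 0 ]]; do
    case "$1" in
      --agent) agent="$2"; shift 2 ;;
      --max)   max="$2";   shift 2 ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
  done

  local i
  for ((i = 1; i <= max; i++)); do
    echo "=== iteration $i/$max ==="
    # Fresh context every run: the agent re-reads the spec and plan from disk;
    # git history is the only memory carried between iterations.
    "$agent" -p < PROMPT.md || echo "iteration $i exited non-zero; continuing" >&2
  done
}

# Run the loop only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  ralph_loop "$@"
fi
```

A failed iteration is logged rather than aborting the loop: the next run starts with a clean context and can pick up any committed partial work.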