# Agent Harness — Worked Examples

## How Much Context Does an Agent Need?

The key insight from both the Nate Jones approach and the Ralph Wiggum loop:

> **The agent needs enough context to work autonomously for ONE task,
> but the system needs enough structure to coordinate across MANY tasks.**

This means two layers of documentation:

### Layer 1: The Spec (written by you, read-only for agents)
- What you're building and why
- Technical constraints and decisions
- Acceptance criteria for every feature
- Data models and API shapes

### Layer 2: The Plan (created by agent, updated each iteration)
- Task decomposition with checkboxes
- Dependencies between tasks
- Current status
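
As a rough illustration of how a harness might read Layer 2, the sketch below parses checkbox lines from a plan file and picks the next open task. The `PlanTask` shape and the `parsePlan` / `nextTask` names are hypothetical, and it assumes one task per `- [ ]` line with no dependency tracking.

```typescript
// Hypothetical shape for one checkbox line in IMPLEMENTATION_PLAN.md.
interface PlanTask {
  line: number;   // 1-based line number in the plan file
  title: string;  // text after the checkbox
  done: boolean;  // true for "- [x]", false for "- [ ]"
}

// Matches "- [ ] title" or "- [x] title", with optional leading indent.
const CHECKBOX = /^\s*- \[( |x|X)\] (.+)$/;

function parsePlan(markdown: string): PlanTask[] {
  const tasks: PlanTask[] = [];
  markdown.split(/\r?\n/).forEach((text, index) => {
    const match = CHECKBOX.exec(text);
    if (match) {
      tasks.push({
        line: index + 1,
        title: (match[2] ?? "").trim(),
        done: match[1] !== " ",
      });
    }
  });
  return tasks;
}

// Returns the first unchecked task, or undefined when the plan is complete.
function nextTask(tasks: PlanTask[]): PlanTask | undefined {
  return tasks.find((task) => !task.done);
}

const samplePlan = [
  "# Implementation Plan",
  "- [x] Project scaffolding",
  "- [ ] SQLite schema + migrations",
  "- [ ] QFX/CSV import",
].join("\n");

console.log(nextTask(parsePlan(samplePlan))?.title); // SQLite schema + migrations
```

Because the plan is plain Markdown, the same file stays readable for you and for the next agent iteration.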

---

## The Three Approaches Compared

### Ezward's Approach (vibe-basic)
**Style:** Single sequential PRD — numbered steps, each building on the last.

**Strengths:**
- Very explicit about what to build in what order
- Each step includes "add unit tests" and "make sure it compiles"
- The "Generally" section at the end sets cross-cutting standards
- Language spec provided as a separate reference file

**Best for:** Well-understood problems where you know the implementation order.

**Key pattern:** The PRD *is* the implementation plan. The agent works through steps 1-14 in order.

### Ralph Wiggum Loop
**Style:** Spec + Plan separation. Agent creates its own plan from the spec.

**Strengths:**
- Fresh context each iteration (no context window overflow)
- Agent decomposes tasks itself (and may find a better ordering)
- Git history is the "memory" between iterations
- Simple bash loop — no complex orchestration

**Best for:** Larger projects where you want the agent to figure out task ordering.

**Key pattern:** `while :; do cat PROMPT.md | claude -p; done`
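
The one-liner above runs forever. A bounded variant is sketched below in TypeScript so the stopping rule is explicit. `runLoop`, `LoopOptions`, `runClaudeOnce`, and `planIsComplete` are hypothetical names; the `claude -p` call simply mirrors the one-liner, and stopping on a failed run is an assumption rather than part of the original pattern.

```typescript
import { spawnSync } from "node:child_process";
import { readFileSync } from "node:fs";

// Hypothetical options for a bounded Ralph-style loop.
interface LoopOptions {
  maxIterations: number;                    // hard cap so the loop cannot run forever
  runOnce: (iteration: number) => boolean;  // returns false if the agent run failed
  isDone: () => boolean;                    // e.g. "no unchecked boxes left in the plan"
}

// Runs fresh agent iterations until the plan is done, a run fails, or the cap is hit.
// Returns the number of iterations that completed successfully.
function runLoop({ maxIterations, runOnce, isDone }: LoopOptions): number {
  let completed = 0;
  while (completed < maxIterations && !isDone()) {
    if (!runOnce(completed + 1)) break; // stop on failure rather than spin
    completed += 1;
  }
  return completed;
}

// Assumed agent invocation, equivalent to `cat PROMPT.md | claude -p`.
function runClaudeOnce(): boolean {
  const prompt = readFileSync("PROMPT.md", "utf8");
  const result = spawnSync("claude", ["-p"], {
    input: prompt,
    stdio: ["pipe", "inherit", "inherit"],
  });
  return result.status === 0;
}

// Assumed completion check: the plan has no "- [ ]" lines left.
function planIsComplete(): boolean {
  const plan = readFileSync("IMPLEMENTATION_PLAN.md", "utf8");
  return !/^\s*- \[ \]/m.test(plan);
}

// Example wiring (not executed here):
// runLoop({ maxIterations: 30, runOnce: runClaudeOnce, isDone: planIsComplete });
```

Each call to `runOnce` starts a new process, so every iteration still gets a fresh context; only the files on disk and the Git history carry state forward.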

### Nate Jones / Task Decomposition
**Style:** Decompose → parallelize → verify → iterate.

**Strengths:**
- Multiple agents can work on different tasks simultaneously
- Verification step catches integration issues
- Iteration handles failures gracefully

**Best for:** Large projects with independent components that can be parallelized.

**Key pattern:** Orchestrator agent spawns worker agents for each task.
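
A minimal sketch of the decompose, parallelize, verify, iterate cycle follows. The `AgentTask`, `AgentWorker`, `TaskVerifier`, and `orchestrate` names are hypothetical, and the workers here are plain async functions standing in for spawned agent sessions.

```typescript
// Hypothetical task and worker types; a real worker would spawn an agent session.
interface AgentTask { id: string; description: string; }
type AgentWorker = (task: AgentTask) => Promise<void>;
type TaskVerifier = (task: AgentTask) => Promise<boolean>;

// Runs all pending tasks in parallel, verifies each one, and retries failures
// for up to `maxRounds` rounds. Returns the ids that never passed verification.
async function orchestrate(
  tasks: AgentTask[],
  work: AgentWorker,
  verify: TaskVerifier,
  maxRounds = 3,
): Promise<string[]> {
  let pending = tasks;
  for (let round = 0; round < maxRounds && pending.length > 0; round += 1) {
    const results = await Promise.all(
      pending.map(async (task) => {
        try {
          await work(task);
          return { task, ok: await verify(task) };
        } catch {
          return { task, ok: false }; // a crashed worker counts as a failed task
        }
      }),
    );
    // Only tasks that failed verification go into the next round.
    pending = results.filter((result) => !result.ok).map((result) => result.task);
  }
  return pending.map((task) => task.id);
}
```

This only works as written when tasks are independent; tasks with dependencies would need to be grouped into ordered batches first.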

---

## Example: Personal Finance App (Fintrove-style)

Here's what a complete spec would look like for a Fintrove-like personal finance application. This is the document you'd give to a team of agents.

### PROJECT-SPEC.md

```markdown
# Project Specification: FinPlan — Personal Finance Dashboard

## 1. Project Overview

### What are we building?
A privacy-first personal finance dashboard that helps a retiree manage
their money. It imports transaction data, categorizes spending, projects
retirement income against expenses, and runs Monte Carlo simulations to
stress-test withdrawal strategies.

### Why does it matter?
Existing tools (Mint, YNAB) are cloud-based and sell your data. Quicken
is stagnating. We want a local-first tool that's actually useful for
retirement planning with Canadian tax rules (RRSP meltdown, CPP/OAS
optimization, pension integration).

### Success criteria
- [ ] Import Quicken QFX/CSV exports and categorize transactions
- [ ] Dashboard shows monthly spending by category (current month + trends)
- [ ] Retirement projection shows income vs expenses for 30 years
- [ ] Monte Carlo simulation with 1000+ runs using historical market data
- [ ] All data stays local (SQLite, no cloud)
- [ ] Runs in browser via local server

## 2. Technical Foundation

### Tech stack
- **Language:** TypeScript (Node.js backend, browser frontend)
- **Framework:** Express.js (API), vanilla HTML/CSS/JS (frontend)
- **Database:** SQLite via better-sqlite3
- **Build system:** esbuild for frontend bundling
- **Test framework:** Node.js built-in test runner
- **Package manager:** npm

### Project structure
project/
├── packages/
│   ├── server/     # Express API + SQLite
│   ├── client/     # Browser frontend
│   └── shared/     # Types, constants, utils
├── data/           # Sample data for testing
├── docs/           # Design docs
├── PROJECT-SPEC.md
├── IMPLEMENTATION_PLAN.md
└── AGENT.md

### Build & test commands
npm install
npm run build
npm test
npm run lint

### Coding standards
- TypeScript strict mode
- No `any` types except in test fixtures
- All public functions documented with JSDoc
- Error messages must be user-friendly (no stack traces in UI)
- SQL queries use parameterized statements (no string concatenation)

## 3. Requirements

### FR-001: Transaction Import
**Description:** Import financial transactions from QFX (OFX) and CSV files.
**Acceptance criteria:**
- [ ] Parse QFX files and extract: date, amount, payee, memo, type
- [ ] Parse CSV files with configurable column mapping
- [ ] Deduplicate transactions by date + amount + payee
- [ ] Store in SQLite with account association
- [ ] CLI command: `npm run import -- --file data/transactions.qfx`

### FR-002: Auto-Categorization
**Description:** Automatically categorize transactions based on payee patterns.
**Acceptance criteria:**
- [ ] Rule-based matching: payee contains "COSTCO" → Groceries
- [ ] Rules stored in SQLite, editable via API
- [ ] Uncategorized transactions flagged for manual review
- [ ] Bulk categorization: apply rule retroactively to past transactions
- [ ] At least 20 default rules for common Canadian merchants

### FR-003: Spending Dashboard
**Description:** Web dashboard showing spending breakdown and trends.
**Acceptance criteria:**
- [ ] Monthly spending by category (bar chart)
- [ ] 12-month trend line per category
- [ ] Total income vs total expenses per month
- [ ] Filter by date range and account
- [ ] Loads in < 500ms for 10,000 transactions

### FR-004: Retirement Projection
**Description:** Project income and expenses over a 30-year retirement.
**Acceptance criteria:**
- [ ] Input: current age, retirement age, life expectancy
- [ ] Income sources: pension (fixed), CPP (age-dependent), OAS (age-dependent)
- [ ] RRSP meltdown strategy: withdraw X/year for Y years before age 65
- [ ] Inflation adjustment (configurable rate, default 2.5%)
- [ ] Output: year-by-year table of income, expenses, portfolio balance

### FR-005: Monte Carlo Simulation
**Description:** Stress-test retirement plan against historical market returns.
**Acceptance criteria:**
- [ ] Use S&P 500 historical annual returns (1928-present)
- [ ] Run 1,000+ simulations with random return sequences
- [ ] Output: success rate (% of runs where money lasts)
- [ ] Visualization: fan chart showing percentile bands
- [ ] Compare strategies: 4% rule vs dynamic withdrawal

### NFR-001: Privacy
- [ ] All data stored locally in SQLite
- [ ] No network requests except to localhost
- [ ] No analytics, telemetry, or tracking

### NFR-002: Performance
- [ ] Dashboard loads in < 1 second
- [ ] Monte Carlo (1000 runs) completes in < 5 seconds
- [ ] Import 10,000 transactions in < 10 seconds

### NFR-003: Testing
- [ ] 80%+ code coverage
- [ ] Integration tests for API endpoints
- [ ] Unit tests for calculation functions
- [ ] Sample data fixtures for reproducible tests

## 4. Data Model

### Entities

Entity: Account
- id: INTEGER (primary key, auto-increment)
- name: TEXT (required, e.g. "RRSP", "TFSA", "Chequing")
- type: TEXT (checking | savings | investment | credit)
- institution: TEXT (optional)

Entity: Transaction
- id: INTEGER (primary key, auto-increment)
- account_id: INTEGER (foreign key → Account)
- date: TEXT (ISO 8601 date)
- amount: REAL (positive = income, negative = expense)
- payee: TEXT
- memo: TEXT (optional)
- category_id: INTEGER (foreign key → Category, nullable)
- import_hash: TEXT (unique, for deduplication)

Entity: Category
- id: INTEGER (primary key, auto-increment)
- name: TEXT (unique, e.g. "Groceries", "Utilities")
- type: TEXT (expense | income | transfer)
- budget: REAL (optional monthly budget)

Entity: CategoryRule
- id: INTEGER (primary key, auto-increment)
- pattern: TEXT (substring match on payee)
- category_id: INTEGER (foreign key → Category)
- priority: INTEGER (higher = matched first)

Entity: RetirementProfile
- id: INTEGER (primary key, auto-increment)
- name: TEXT
- current_age: INTEGER
- retirement_age: INTEGER
- life_expectancy: INTEGER
- annual_expenses: REAL
- cpp_start_age: INTEGER (default 70)
- oas_start_age: INTEGER (default 70)
- pension_annual: REAL
- rrsp_balance: REAL
- tfsa_balance: REAL
- non_reg_balance: REAL

## 5. API Design

### REST Endpoints

GET /api/accounts
POST /api/accounts
GET /api/transactions?from=&to=&account=&category=
POST /api/import (multipart file upload)
GET /api/categories
POST /api/categories
GET /api/categories/rules
POST /api/categories/rules
GET /api/spending/monthly?from=&to=
GET /api/spending/trends?months=12
GET /api/retirement/projection/:profileId
POST /api/retirement/monte-carlo/:profileId

## 6. Architecture Decisions

### Constraints
- MUST: Use SQLite (no PostgreSQL, no cloud DB)
- MUST: Run entirely on localhost
- MUST: Work offline
- MUST NOT: Make any external network requests
- MUST NOT: Use React/Vue/Angular (vanilla JS + HTML templates)
- PREFER: Native ES modules over bundling where possible

### Known Challenges
- QFX/OFX files are often OFX 1.x (SGML with unclosed tags) rather than well-formed XML (OFX 2.x), so a strict XML parser may reject them
- Canadian CPP/OAS calculations have complex age-dependent rules
- Monte Carlo needs to be fast — consider Web Workers for UI

## 7. Phasing

### Phase 1: Data Foundation (Tasks 1-5)
- [ ] Project scaffolding (monorepo, build, test)
- [ ] SQLite schema + migrations
- [ ] QFX/CSV import
- [ ] Category rules engine
- [ ] REST API for CRUD

### Phase 2: Dashboard (Tasks 6-8)
- [ ] Spending by category (API + chart)
- [ ] Trend lines
- [ ] Date/account filters

### Phase 3: Retirement Engine (Tasks 9-12)
- [ ] Income projection calculator
- [ ] RRSP meltdown logic
- [ ] CPP/OAS optimization
- [ ] Monte Carlo simulation

### Phase 4: Polish (Tasks 13-15)
- [ ] Error handling + user messages
- [ ] Performance optimization
- [ ] Documentation

## 8. Reference Materials

### External docs
- QFX/OFX spec: https://www.ofx.net/
- CPP benefits: https://www.canada.ca/en/services/benefits/publicpensions/cpp.html
- OAS benefits: https://www.canada.ca/en/services/benefits/publicpensions/old-age-security.html
- S&P 500 historical returns: included in data/sp500-returns.csv

### Anti-patterns
- Don't use localStorage for data — SQLite is the source of truth
- Don't try to parse bank-specific CSV formats — use configurable column mapping
- Don't calculate CPP/OAS inline — extract to a dedicated module with unit tests
```
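
To make FR-005 in the spec above more concrete, here is one way the simulation core could look: a bootstrap that samples annual returns with replacement and applies a fixed withdrawal each year. The `SimulationInput` type and `successRate` function are hypothetical, the return series is made up, and the fixed-withdrawal rule is a simplification (no inflation, taxes, CPP/OAS, or dynamic strategy). The random source is injectable so tests can be deterministic.

```typescript
// Hypothetical inputs for a single-strategy simulation.
interface SimulationInput {
  startBalance: number;      // portfolio value at retirement
  annualWithdrawal: number;  // fixed amount withdrawn at the start of each year
  years: number;             // e.g. 30
  runs: number;              // e.g. 1000
  returns: number[];         // historical annual returns as fractions, e.g. 0.07
}

// Returns the fraction of runs in which the portfolio covers every withdrawal.
function successRate(input: SimulationInput, random: () => number = Math.random): number {
  const { startBalance, annualWithdrawal, years, runs, returns } = input;
  if (returns.length === 0 || runs <= 0) {
    throw new Error("need at least one return and one run");
  }
  let successes = 0;
  for (let run = 0; run < runs; run += 1) {
    let balance = startBalance;
    let survived = true;
    for (let year = 0; year < years; year += 1) {
      balance -= annualWithdrawal;
      if (balance < 0) {
        survived = false;
        break;
      }
      // Sample one historical year with replacement.
      const sampled = returns[Math.floor(random() * returns.length)] ?? 0;
      balance *= 1 + sampled;
    }
    if (survived) successes += 1;
  }
  return successes / runs;
}

// 4% rule on a 1,000,000 portfolio, using a tiny made-up return series.
const rate = successRate({
  startBalance: 1_000_000,
  annualWithdrawal: 40_000,
  years: 30,
  runs: 1000,
  returns: [0.12, -0.08, 0.21, 0.03, -0.15, 0.09],
});
console.log(`success rate: ${(rate * 100).toFixed(1)}%`);
```

Keeping the calculation a pure function of its inputs also makes it easy to move into a Web Worker later, as the spec's Known Challenges section suggests.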

---

## What Makes a Good Spec?

Looking at what works across Ezward's PRD, the Ralph Wiggum loop, and Nate Jones's task decomposition, a few patterns recur:

### 1. Be Specific About Acceptance Criteria
Bad: "Import transactions"
Good: "Parse QFX files and extract: date, amount, payee, memo, type. Deduplicate by date + amount + payee. Store in SQLite."
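
A criterion this specific can be sketched almost directly. The example below derives a stable key from date + amount + payee, which could be stored in a unique column such as the spec's `import_hash`. The `ImportedRow` type, the normalization choices (trimmed, upper-cased payee; amount rounded to cents), and the use of SHA-256 are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one parsed row before it is stored.
interface ImportedRow {
  date: string;    // ISO 8601 date, e.g. "2024-03-01"
  amount: number;  // positive = income, negative = expense
  payee: string;
}

// Builds a deterministic hash from date + amount + payee.
function importHash(row: ImportedRow): string {
  const cents = Math.round(row.amount * 100);    // avoid float formatting drift
  const payee = row.payee.trim().toUpperCase();  // assumed normalization
  return createHash("sha256")
    .update(`${row.date}|${cents}|${payee}`)
    .digest("hex");
}

// Keeps the first occurrence of each hash and drops later duplicates.
function dedupe(rows: ImportedRow[]): ImportedRow[] {
  const seen = new Set<string>();
  return rows.filter((row) => {
    const hash = importHash(row);
    if (seen.has(hash)) return false;
    seen.add(hash);
    return true;
  });
}
```

One consequence worth stating in the spec: two genuinely separate purchases with the same date, amount, and payee collapse into one under this rule.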

### 2. Define the Tech Stack — Don't Let the Agent Choose
Bad: "Use a modern framework"
Good: "TypeScript, Express.js, SQLite via better-sqlite3, vanilla HTML/CSS/JS frontend"

### 3. Include Data Models
Agents that know the data model write better code. Define entities, relationships, and constraints explicitly.
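
For instance, a data model written down as types gives the agent something to compile against. The sketch below mirrors two entities from the example spec and shows the rule lookup they imply (substring match on payee, higher priority first). The interface and function names are hypothetical, and case-insensitive matching is an assumption the spec does not state.

```typescript
// Hypothetical TypeScript mirrors of the Category and CategoryRule entities.
interface SpendCategory {
  id: number;
  name: string;                             // unique, e.g. "Groceries"
  type: "expense" | "income" | "transfer";
  budget?: number;                          // optional monthly budget
}

interface SpendCategoryRule {
  id: number;
  pattern: string;     // substring to look for in the payee
  categoryId: number;  // references SpendCategory.id
  priority: number;    // higher = matched first
}

// Returns the category of the highest-priority matching rule,
// or null so the transaction can be flagged for manual review.
function categorize(
  payee: string,
  rules: SpendCategoryRule[],
  categories: SpendCategory[],
): SpendCategory | null {
  const haystack = payee.toUpperCase();
  const ordered = [...rules].sort((a, b) => b.priority - a.priority);
  const hit = ordered.find((rule) => haystack.includes(rule.pattern.toUpperCase()));
  if (!hit) return null;
  return categories.find((category) => category.id === hit.categoryId) ?? null;
}

const categories: SpendCategory[] = [
  { id: 10, name: "Groceries", type: "expense" },
  { id: 20, name: "Fuel", type: "expense" },
];
const rules: SpendCategoryRule[] = [
  { id: 1, pattern: "COSTCO", categoryId: 10, priority: 1 },
  { id: 2, pattern: "COSTCO GAS", categoryId: 20, priority: 5 },
];

console.log(categorize("Costco Gas #123", rules, categories)?.name);   // Fuel
console.log(categorize("COSTCO WHOLESALE", rules, categories)?.name);  // Groceries
console.log(categorize("Unknown Merchant", rules, categories));        // null
```

The `priority` field is what lets the more specific "COSTCO GAS" rule win over the general "COSTCO" rule.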

### 4. Provide Build/Test Commands
The agent needs to verify its own work. If it can't run `npm test`, it can't iterate.

### 5. List Anti-Patterns
Tell the agent what NOT to do. This prevents it from going down rabbit holes you've already explored.

### 6. Phase the Work
Large projects need phases. Each phase should be independently deployable, so the agent can complete and verify Phase 1 before touching Phase 2.

### 7. Include Sample Data
Agents test better when they have example inputs and expected outputs.
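
A small fixture paired with its expected result is often enough. The sketch below uses the example spec's configurable column mapping for CSV import; the `ColumnMapping` type, the `parseCsv` helper, and the sample rows are made up for illustration, and the parser deliberately ignores quoted fields and embedded commas.

```typescript
// Hypothetical mapping from our field names to the bank's CSV header names.
interface ColumnMapping { date: string; amount: string; payee: string; }
interface ParsedRow { date: string; amount: number; payee: string; }

// Naive CSV parser: header row required, no quoted fields or embedded commas.
function parseCsv(text: string, mapping: ColumnMapping): ParsedRow[] {
  const [headerLine, ...lines] = text.trim().split(/\r?\n/);
  const headers = (headerLine ?? "").split(",").map((header) => header.trim());
  const column = (name: string): number => {
    const index = headers.indexOf(name);
    if (index < 0) throw new Error(`missing column: ${name}`);
    return index;
  };
  const dateIdx = column(mapping.date);
  const amountIdx = column(mapping.amount);
  const payeeIdx = column(mapping.payee);
  return lines
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const cells = line.split(",").map((cell) => cell.trim());
      return {
        date: cells[dateIdx] ?? "",
        amount: Number(cells[amountIdx]),
        payee: cells[payeeIdx] ?? "",
      };
    });
}

// Fixture: sample input and the output a test would assert against.
const sampleCsv =
  "Posted,Description,Value\n" +
  "2024-03-01,COSTCO WHOLESALE,-45.10\n" +
  "2024-03-02,PAYROLL,2500.00\n";
const expected: ParsedRow[] = [
  { date: "2024-03-01", amount: -45.1, payee: "COSTCO WHOLESALE" },
  { date: "2024-03-02", amount: 2500, payee: "PAYROLL" },
];
const actual = parseCsv(sampleCsv, { date: "Posted", amount: "Value", payee: "Description" });
console.log(JSON.stringify(actual) === JSON.stringify(expected)); // true
```

Checking fixtures like this into `data/` gives every agent iteration the same ground truth to test against.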

---

## Running with OpenClaw

You can use OpenClaw's `sessions_spawn` to run the Ralph Wiggum pattern:

```bash
# Planning phase
sessions_spawn --task "Read PROJECT-SPEC.md in /path/to/project.
  Decompose into tasks. Write IMPLEMENTATION_PLAN.md." \
  --model opus

# Build iterations (spawn one at a time, or use cron)
sessions_spawn --task "Read AGENT.md in /path/to/project.
  Follow the core loop. Pick ONE task, implement, test, commit." \
  --model sonnet
```

Or use the bash loop directly with Claude Code:
```bash
cd /path/to/project
./ralph-loop.sh --agent claude --max 30
```