Agent Harness — Worked Examples
How Much Context Does an Agent Need?
The key insight from both the Nate Jones approach and the Ralph Wiggum loop:
The agent needs enough context to work autonomously for ONE task, but the system needs enough structure to coordinate across MANY tasks.
This means two layers of documentation:
Layer 1: The Spec (written by you, read-only for agents)
- What you're building and why
- Technical constraints and decisions
- Acceptance criteria for every feature
- Data models and API shapes
Layer 2: The Plan (created by agent, updated each iteration)
- Task decomposition with checkboxes
- Dependencies between tasks
- Current status
The Three Approaches Compared
Ezward's Approach (vibe-basic)
Style: Single sequential PRD — numbered steps, each building on the last.
Strengths:
- Very explicit about what to build in what order
- Each step includes "add unit tests" and "make sure it compiles"
- The "Generally" section at the end sets cross-cutting standards
- Language spec provided as a separate reference file
Best for: Well-understood problems where you know the implementation order.
Key pattern: The PRD is the implementation plan. Steps 1-14, do them in order.
Ralph Wiggum Loop
Style: Spec + Plan separation. Agent creates its own plan from the spec.
Strengths:
- Fresh context each iteration (no context window overflow)
- Agent decomposes tasks itself (may find better ordering)
- Git history is the "memory" between iterations
- Simple bash loop — no complex orchestration
Best for: Larger projects where you want the agent to figure out task ordering.
Key pattern: `while :; do cat PROMPT.md | claude -p; done`
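The same loop can be sketched in TypeScript. Here `runAgent` is a hypothetical stand-in for one `claude -p` invocation; the key property is that each iteration is a fresh invocation with fresh context, and only the repo (and plan file) persist between iterations.

```typescript
// Sketch of the Ralph Wiggum loop. runAgent is a hypothetical stand-in
// for `cat PROMPT.md | claude -p`; state lives in the repo, not the loop.
type IterationResult = { done: boolean; summary: string };

async function ralphLoop(
  runAgent: () => Promise<IterationResult>,
  maxIterations: number
): Promise<IterationResult[]> {
  const history: IterationResult[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const result = await runAgent(); // fresh invocation, fresh context
    history.push(result);
    if (result.done) break;          // agent reports the plan is complete
  }
  return history;
}
```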
Nate Jones / Task Decomposition
Style: Decompose → parallelize → verify → iterate.
Strengths:
- Multiple agents can work on different tasks simultaneously
- Verification step catches integration issues
- Iteration handles failures gracefully
Best for: Large projects with independent components that can be parallelized.
Key pattern: Orchestrator agent spawns worker agents for each task.
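The orchestrator pattern can be sketched as a small scheduler. `runWorker` is a hypothetical stand-in for spawning one agent per task; tasks whose dependencies have all succeeded run in parallel, which is the "decompose → parallelize" half of the loop.

```typescript
// Sketch of decompose → parallelize. runWorker is a hypothetical
// stand-in for spawning one worker agent per task.
type Task = { id: string; deps: string[] };
type TaskResult = { id: string; ok: boolean };

async function orchestrate(
  tasks: Task[],
  runWorker: (t: Task) => Promise<TaskResult>
): Promise<TaskResult[]> {
  const completed = new Map<string, TaskResult>();
  let remaining = [...tasks];
  while (remaining.length > 0) {
    // A task is ready when every dependency has already succeeded.
    const ready = remaining.filter((t) =>
      t.deps.every((d) => completed.get(d)?.ok)
    );
    if (ready.length === 0) break; // blocked: a failure or circular deps
    // Independent tasks run in parallel, one worker each.
    const results = await Promise.all(ready.map(runWorker));
    for (const r of results) completed.set(r.id, r);
    remaining = remaining.filter((t) => !completed.has(t.id));
  }
  return [...completed.values()];
}
```

Verification and iteration sit on top: after each wave, the orchestrator checks results and re-queues failed tasks.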
Example: Personal Finance App (Fintrove-style)
Here's what a complete spec would look like for a Fintrove-like personal finance application. This is the document you'd give to a team of agents.
PROJECT-SPEC.md
# Project Specification: FinPlan — Personal Finance Dashboard
## 1. Project Overview
### What are we building?
A privacy-first personal finance dashboard that helps a retiree manage
their money. It imports transaction data, categorizes spending, projects
retirement income against expenses, and runs Monte Carlo simulations to
stress-test withdrawal strategies.
### Why does it matter?
Existing tools (Mint, YNAB) are cloud-based and sell your data. Quicken
is stagnating. We want a local-first tool that's actually useful for
retirement planning with Canadian tax rules (RRSP meltdown, CPP/OAS
optimization, pension integration).
### Success criteria
- [ ] Import Quicken QFX/CSV exports and categorize transactions
- [ ] Dashboard shows monthly spending by category (current month + trends)
- [ ] Retirement projection shows income vs expenses for 30 years
- [ ] Monte Carlo simulation with 1000+ runs using historical market data
- [ ] All data stays local (SQLite, no cloud)
- [ ] Runs in browser via local server
## 2. Technical Foundation
### Tech stack
- **Language:** TypeScript (Node.js backend, browser frontend)
- **Framework:** Express.js (API), vanilla HTML/CSS/JS (frontend)
- **Database:** SQLite via better-sqlite3
- **Build system:** esbuild for frontend bundling
- **Test framework:** Node.js built-in test runner
- **Package manager:** npm
### Project structure
```
project/
├── packages/
│   ├── server/          # Express API + SQLite
│   ├── client/          # Browser frontend
│   └── shared/          # Types, constants, utils
├── data/                # Sample data for testing
├── docs/                # Design docs
├── PROJECT-SPEC.md
├── IMPLEMENTATION_PLAN.md
└── AGENT.md
```
### Build & test commands
```
npm install
npm run build
npm test
npm run lint
```
### Coding standards
- TypeScript strict mode
- No `any` types except in test fixtures
- All public functions documented with JSDoc
- Error messages must be user-friendly (no stack traces in UI)
- SQL queries use parameterized statements (no string concatenation)
## 3. Requirements
### FR-001: Transaction Import
**Description:** Import financial transactions from QFX (OFX) and CSV files.
**Acceptance criteria:**
- [ ] Parse QFX files and extract: date, amount, payee, memo, type
- [ ] Parse CSV files with configurable column mapping
- [ ] Deduplicate transactions by date + amount + payee
- [ ] Store in SQLite with account association
- [ ] CLI command: `npm run import -- --file data/transactions.qfx`
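A non-normative sketch of the deduplication criterion, assuming `import_hash` is a SHA-256 over the date + amount + payee triple (the hash algorithm and the payee normalization are illustrative choices, not spec requirements):

```typescript
import { createHash } from "node:crypto";

interface ImportedTxn {
  date: string;   // ISO 8601
  amount: number; // positive = income, negative = expense
  payee: string;
}

function importHash(t: ImportedTxn): string {
  // Normalize payee so "COSTCO #123" and "costco #123" collide.
  const key = `${t.date}|${t.amount.toFixed(2)}|${t.payee.trim().toLowerCase()}`;
  return createHash("sha256").update(key).digest("hex");
}

function dedupe(txns: ImportedTxn[]): ImportedTxn[] {
  const seen = new Set<string>();
  return txns.filter((t) => {
    const h = importHash(t);
    if (seen.has(h)) return false; // already imported
    seen.add(h);
    return true;
  });
}
```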
### FR-002: Auto-Categorization
**Description:** Automatically categorize transactions based on payee patterns.
**Acceptance criteria:**
- [ ] Rule-based matching: payee contains "COSTCO" → Groceries
- [ ] Rules stored in SQLite, editable via API
- [ ] Uncategorized transactions flagged for manual review
- [ ] Bulk categorization: apply rule retroactively to past transactions
- [ ] At least 20 default rules for common Canadian merchants
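A non-normative sketch of the rule matcher: substring match on payee, highest priority first, mirroring the CategoryRule entity in the data model (function name and shapes are illustrative):

```typescript
interface CategoryRule {
  pattern: string;   // substring match on payee
  categoryId: number;
  priority: number;  // higher = matched first
}

function categorize(payee: string, rules: CategoryRule[]): number | null {
  const upper = payee.toUpperCase();
  // Sort a copy so the caller's rule order is untouched.
  const sorted = [...rules].sort((a, b) => b.priority - a.priority);
  const match = sorted.find((r) => upper.includes(r.pattern.toUpperCase()));
  return match ? match.categoryId : null; // null → flag for manual review
}
```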
### FR-003: Spending Dashboard
**Description:** Web dashboard showing spending breakdown and trends.
**Acceptance criteria:**
- [ ] Monthly spending by category (bar chart)
- [ ] 12-month trend line per category
- [ ] Total income vs total expenses per month
- [ ] Filter by date range and account
- [ ] Loads in < 500ms for 10,000 transactions
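A non-normative sketch of the aggregation behind the monthly chart (shapes and names are illustrative; the real endpoint would run this as a SQLite query):

```typescript
interface Txn { date: string; amount: number; category: string }

// Returns { "2024-01": { Groceries: 412.5, ... }, ... } using expense
// magnitudes (negative amounts in the data model are expenses).
function monthlySpending(txns: Txn[]): Record<string, Record<string, number>> {
  const out: Record<string, Record<string, number>> = {};
  for (const t of txns) {
    if (t.amount >= 0) continue;       // skip income
    const month = t.date.slice(0, 7);  // "YYYY-MM" from an ISO date
    if (!out[month]) out[month] = {};
    out[month][t.category] = (out[month][t.category] ?? 0) + Math.abs(t.amount);
  }
  return out;
}
```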
### FR-004: Retirement Projection
**Description:** Project income and expenses over a 30-year retirement.
**Acceptance criteria:**
- [ ] Input: current age, retirement age, life expectancy
- [ ] Income sources: pension (fixed), CPP (age-dependent), OAS (age-dependent)
- [ ] RRSP meltdown strategy: withdraw X/year for Y years before age 65
- [ ] Inflation adjustment (configurable rate, default 2.5%)
- [ ] Output: year-by-year table of income, expenses, portfolio balance
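A non-normative sketch of the projection loop: age-gated CPP/OAS, a fixed pension, inflation-adjusted expenses, and a single combined portfolio drained to cover shortfalls. Taxes and the RRSP meltdown schedule are deliberately omitted here.

```typescript
interface ProjectionInput {
  currentAge: number;
  lifeExpectancy: number;
  annualExpenses: number; // in today's dollars
  pensionAnnual: number;
  cpp: { startAge: number; annual: number };
  oas: { startAge: number; annual: number };
  portfolio: number;      // combined RRSP/TFSA/non-reg for this sketch
  inflation: number;      // e.g. 0.025
  growth: number;         // nominal portfolio return, e.g. 0.05
}
interface YearRow { age: number; income: number; expenses: number; balance: number }

function project(p: ProjectionInput): YearRow[] {
  const rows: YearRow[] = [];
  let balance = p.portfolio;
  for (let age = p.currentAge; age <= p.lifeExpectancy; age++) {
    const yearsOut = age - p.currentAge;
    const expenses = p.annualExpenses * Math.pow(1 + p.inflation, yearsOut);
    const income =
      p.pensionAnnual +
      (age >= p.cpp.startAge ? p.cpp.annual : 0) +
      (age >= p.oas.startAge ? p.oas.annual : 0);
    // Any shortfall (or surplus) flows through the portfolio balance.
    balance = balance * (1 + p.growth) + income - expenses;
    rows.push({ age, income, expenses, balance });
  }
  return rows;
}
```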
### FR-005: Monte Carlo Simulation
**Description:** Stress-test retirement plan against historical market returns.
**Acceptance criteria:**
- [ ] Use S&P 500 historical annual returns (1928-present)
- [ ] Run 1,000+ simulations with random return sequences
- [ ] Output: success rate (% of runs where money lasts)
- [ ] Visualization: fan chart showing percentile bands
- [ ] Compare strategies: 4% rule vs dynamic withdrawal
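A non-normative sketch of the simulation core: sample annual returns uniformly at random from the historical series and count the runs where money lasts. The injectable random source keeps tests deterministic; in the real app the series would come from data/sp500-returns.csv.

```typescript
function monteCarloSuccessRate(
  returns: number[],        // historical annual returns, e.g. 0.07
  startBalance: number,
  annualWithdrawal: number,
  years: number,
  runs: number,
  rand: () => number = Math.random // injectable for deterministic tests
): number {
  let successes = 0;
  for (let run = 0; run < runs; run++) {
    let balance = startBalance;
    for (let y = 0; y < years && balance > 0; y++) {
      // Pick a random year's return from the historical series.
      const r = returns[Math.floor(rand() * returns.length)];
      balance = balance * (1 + r) - annualWithdrawal;
    }
    if (balance > 0) successes++;
  }
  return successes / runs; // e.g. 0.92 → plan survives 92% of runs
}
```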
### NFR-001: Privacy
- [ ] All data stored locally in SQLite
- [ ] No network requests except to localhost
- [ ] No analytics, telemetry, or tracking
### NFR-002: Performance
- [ ] Dashboard loads in < 1 second
- [ ] Monte Carlo (1000 runs) completes in < 5 seconds
- [ ] Import 10,000 transactions in < 10 seconds
### NFR-003: Testing
- [ ] 80%+ code coverage
- [ ] Integration tests for API endpoints
- [ ] Unit tests for calculation functions
- [ ] Sample data fixtures for reproducible tests
## 4. Data Model
### Entities
Entity: Account
- id: INTEGER (primary key, auto-increment)
- name: TEXT (required, e.g. "RRSP", "TFSA", "Chequing")
- type: TEXT (checking | savings | investment | credit)
- institution: TEXT (optional)
Entity: Transaction
- id: INTEGER (primary key, auto-increment)
- account_id: INTEGER (foreign key → Account)
- date: TEXT (ISO 8601 date)
- amount: REAL (positive = income, negative = expense)
- payee: TEXT
- memo: TEXT (optional)
- category_id: INTEGER (foreign key → Category, nullable)
- import_hash: TEXT (unique, for deduplication)
Entity: Category
- id: INTEGER (primary key, auto-increment)
- name: TEXT (unique, e.g. "Groceries", "Utilities")
- type: TEXT (expense | income | transfer)
- budget: REAL (optional monthly budget)
Entity: CategoryRule
- id: INTEGER (primary key, auto-increment)
- pattern: TEXT (substring match on payee)
- category_id: INTEGER (foreign key → Category)
- priority: INTEGER (higher = matched first)
Entity: RetirementProfile
- id: INTEGER (primary key, auto-increment)
- name: TEXT
- current_age: INTEGER
- retirement_age: INTEGER
- life_expectancy: INTEGER
- annual_expenses: REAL
- cpp_start_age: INTEGER (default 70)
- oas_start_age: INTEGER (default 70)
- pension_annual: REAL
- rrsp_balance: REAL
- tfsa_balance: REAL
- non_reg_balance: REAL
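A non-normative sketch of how these entities could map to shared TypeScript types in packages/shared (the camelCase field names are an assumption of this sketch; the columns above stay snake_case in SQLite):

```typescript
type AccountType = "checking" | "savings" | "investment" | "credit";
type CategoryType = "expense" | "income" | "transfer";

interface Account {
  id: number;
  name: string;        // e.g. "RRSP", "TFSA", "Chequing"
  type: AccountType;
  institution?: string;
}

interface Transaction {
  id: number;
  accountId: number;
  date: string;        // ISO 8601
  amount: number;      // positive = income, negative = expense
  payee: string;
  memo?: string;
  categoryId?: number; // unset until categorized
  importHash: string;  // unique, for deduplication
}

interface Category {
  id: number;
  name: string;        // unique, e.g. "Groceries"
  type: CategoryType;
  budget?: number;     // optional monthly budget
}
```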
## 5. API Design
### REST Endpoints
GET /api/accounts
POST /api/accounts
GET /api/transactions?from=&to=&account=&category=
POST /api/import (multipart file upload)
GET /api/categories
POST /api/categories
GET /api/categories/rules
POST /api/categories/rules
GET /api/spending/monthly?from=&to=
GET /api/spending/trends?months=12
GET /api/retirement/projection/:profileId
POST /api/retirement/monte-carlo/:profileId
## 6. Architecture Decisions
### Constraints
- MUST: Use SQLite (no PostgreSQL, no cloud DB)
- MUST: Run entirely on localhost
- MUST: Work offline
- MUST NOT: Make any external network requests
- MUST NOT: Use React/Vue/Angular (vanilla JS + HTML templates)
- PREFER: Native ES modules over bundling where possible
### Known Challenges
- QFX/OFX parsing is XML-based with quirky formatting
- Canadian CPP/OAS calculations have complex age-dependent rules
- Monte Carlo needs to be fast — consider Web Workers for UI
## 7. Phasing
### Phase 1: Data Foundation (Tasks 1-5)
- [ ] Project scaffolding (monorepo, build, test)
- [ ] SQLite schema + migrations
- [ ] QFX/CSV import
- [ ] Category rules engine
- [ ] REST API for CRUD
### Phase 2: Dashboard (Tasks 6-8)
- [ ] Spending by category (API + chart)
- [ ] Trend lines
- [ ] Date/account filters
### Phase 3: Retirement Engine (Tasks 9-12)
- [ ] Income projection calculator
- [ ] RRSP meltdown logic
- [ ] CPP/OAS optimization
- [ ] Monte Carlo simulation
### Phase 4: Polish (Tasks 13-15)
- [ ] Error handling + user messages
- [ ] Performance optimization
- [ ] Documentation
## 8. Reference Materials
### External docs
- QFX/OFX spec: https://www.ofx.net/
- CPP benefits: https://www.canada.ca/en/services/benefits/publicpensions/cpp.html
- OAS benefits: https://www.canada.ca/en/services/benefits/publicpensions/old-age-security.html
- S&P 500 historical returns: included in data/sp500-returns.csv
### Anti-patterns
- Don't use localStorage for data — SQLite is the source of truth
- Don't try to parse bank-specific CSV formats — use configurable column mapping
- Don't calculate CPP/OAS inline — extract to a dedicated module with unit tests
What Makes a Good Spec?
Looking at what works across Ezward's PRD, Ralph Wiggum, and Nate Jones:
1. Be Specific About Acceptance Criteria
Bad: "Import transactions"
Good: "Parse QFX files and extract: date, amount, payee, memo, type. Deduplicate by date + amount + payee. Store in SQLite."
2. Define the Tech Stack — Don't Let the Agent Choose
Bad: "Use a modern framework"
Good: "TypeScript, Express.js, SQLite via better-sqlite3, vanilla HTML/CSS/JS frontend"
3. Include Data Models
Agents that know the data model write better code. Define entities, relationships, and constraints explicitly.
4. Provide Build/Test Commands
The agent needs to verify its own work. If it can't run `npm test`, it can't iterate.
5. List Anti-Patterns
Tell the agent what NOT to do. This prevents it from going down rabbit holes you've already explored.
6. Phase the Work
Large projects need phases. Each phase should be independently deployable. The agent can complete Phase 1 before touching Phase 2.
7. Include Sample Data
Agents test better when they have example inputs and expected outputs.
Running with OpenClaw
You can use OpenClaw's `sessions_spawn` to run the Ralph Wiggum pattern:
```
# Planning phase
sessions_spawn --task "Read PROJECT-SPEC.md in /path/to/project.
  Decompose into tasks. Write IMPLEMENTATION_PLAN.md." \
  --model opus

# Build iterations (spawn one at a time, or use cron)
sessions_spawn --task "Read AGENT.md in /path/to/project.
  Follow the core loop. Pick ONE task, implement, test, commit." \
  --model sonnet
```
Or use the bash loop directly with Claude Code:
```
cd /path/to/project
./ralph-loop.sh --agent claude --max 30
```