Wave-based agentic project methodology — plan-then-implement, execution boards, known-answer tests. v2.0
Paul Huliganga 5db61dd321 feat: agent harness v2.0 — wave-based agentic project methodology
A complete system for running autonomous AI coding agents on complex projects.
Proven in practice on the Fintrove project: 4 waves, 11 streams, 44 tasks, tests grown from 1,254 to 1,597, and 0 regressions.

Core templates:
- AGENT-INSTRUCTIONS.md    — agent system prompt, core loop, commit attribution
- PROJECT-SPEC.md          — project definition template
- DECISIONS.md             — Architecture Decision Records
- EXECUTION-BOARD-TEMPLATE.md — stream planning artifact (write before coding)
- VALIDATION-TEMPLATE.md   — per-packet evidence
- PROCESS-EVAL-TEMPLATE.md — stream retrospective

Process guides:
- WAVE-BASED-MANAGEMENT.md — plan-then-implement discipline, wave gates, known-answer tests
- SPEC-CREATION-GUIDE.md   — interview protocol for building specs
- PLAN-MANAGEMENT.md       — living IMPLEMENTATION_PLAN.md
- REVIEW-AND-QA.md         — evaluating agent output
- PARALLEL-AGENTS.md       — running multiple agents simultaneously
- COST-OPTIMIZATION.md     — getting more work per dollar
- OPENCLAW-INTEGRATION.md  — sessions_spawn, cron, automation
- TROUBLESHOOTING.md       — five failure modes + recovery
- TUTORIAL.md              — 30-min hands-on walkthrough
- EXAMPLES.md              — real project examples

Tooling:
- ralph-loop.sh            — bash loop for Claude/Codex/Aider/Gemini
- model-report.ts          — per-model quality reporting from git trailers
2026-04-01 21:20:26 -04:00

Agent Harness Templates

A complete system for running autonomous AI coding agents on complex projects.

Files

Core Templates (copy into your project)

| File | Purpose |
|---|---|
| AGENT-INSTRUCTIONS.md | The agent's "system prompt", read at the start of every iteration. Defines the core loop, mandatory pre-commit checklist (tests + TypeScript), commit attribution format, Tests-Added rule, and known anti-patterns |
| PROJECT-SPEC.md | Template for defining your problem. Sections for: overview, tech stack, requirements with acceptance criteria, data model, API design, constraints, phasing, anti-patterns |
| DECISIONS.md | Architecture Decision Record (ADR) template for documenting non-obvious technical choices. Prevents agent drift by creating continuity across fresh contexts |
| EXECUTION-BOARD-TEMPLATE.md | New. Pre-implementation planning artifact for a stream. Defines ALL packets, known-answer tests, and acceptance criteria BEFORE any code is written. The core of the plan-then-implement discipline. |
| VALIDATION-TEMPLATE.md | New. Per-packet evidence file written after each packet completes. Records test counts, known-answer results, and acceptance criteria tick-off. |
| PROCESS-EVAL-TEMPLATE.md | New. Stream retrospective written after merge. Honest assessment of task sizing, test-first compliance, and model quality. |
| ralph-loop.sh | The Ralph Wiggum bash loop: spawns fresh agent instances, checks for completion signals, and restarts until done. Supports Claude, Codex, Aider, Gemini, and custom agents |
| model-report.ts | Parses the Agent: trailers in git log output to generate a per-model quality table (commits, tests added, TypeScript errors). Copy it to scripts/model-report.ts and add "model-report": "ts-node scripts/model-report.ts" to the "scripts" section of package.json |
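The sketch below shows the kind of aggregation model-report.ts performs. It assumes commit messages end with trailers such as Agent: <model> and Tests-Added: <n>. The TS-Errors trailer name is hypothetical, and the real script's parsing and output format may differ. The input is the output of git log --format=%B%x00, which prints each raw commit message followed by a NUL byte.

```typescript
// Sketch only: per-model stats from commit-message trailers.
// Trailer names are assumptions; "TS-Errors" in particular is hypothetical.
interface ModelStats {
  commits: number;
  testsAdded: number;
  tsErrors: number;
}

const TRAILER = /^(Agent|Tests-Added|TS-Errors):\s*(.+)$/;

// `log` is the output of `git log --format=%B%x00`:
// raw commit messages, each followed by a NUL byte.
function buildModelReport(log: string): Record<string, ModelStats> {
  const report: Record<string, ModelStats> = {};
  for (const message of log.split("\0")) {
    let agent: string | undefined;
    let testsAdded = 0;
    let tsErrors = 0;
    for (const line of message.split("\n")) {
      const match = TRAILER.exec(line.trim());
      if (!match) continue;
      const key = match[1];
      const value = match[2].trim();
      if (key === "Agent") agent = value;
      else if (key === "Tests-Added") testsAdded = parseInt(value, 10) || 0;
      else tsErrors = parseInt(value, 10) || 0; // TS-Errors
    }
    if (agent === undefined) continue; // unattributed commit: not counted
    const stats = report[agent] || { commits: 0, testsAdded: 0, tsErrors: 0 };
    stats.commits += 1;
    stats.testsAdded += testsAdded;
    stats.tsErrors += tsErrors;
    report[agent] = stats;
  }
  return report;
}

// One tab-separated row per model: name, commits, tests added, TS errors.
function formatModelReport(report: Record<string, ModelStats>): string[] {
  return Object.keys(report).map((model) => {
    const s = report[model];
    return `${model}\t${s.commits}\t${s.testsAdded}\t${s.tsErrors}`;
  });
}
```

In Node, the log text could be read with execSync("git log --format=%B%x00", { encoding: "utf8" }) from node:child_process. Commits without an Agent: trailer are skipped rather than guessed at.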

Process Guides (read before you start)

| File | Purpose |
|---|---|
| SPEC-CREATION-GUIDE.md | Start here. How to create a great spec through a structured interview: the interview protocol, domain knowledge extraction, and spec quality checklist |
| TUTORIAL.md | Best way to learn. Complete 30-minute walkthrough building a markdown link checker CLI tool from zero. Concrete, copy-pasteable example of the entire workflow |
| WAVE-BASED-MANAGEMENT.md | New. How to structure larger projects into waves, streams, and packets. The plan-then-implement discipline, execution boards, known-answer tests, and wave gates. Essential for projects with 10+ tasks. |
| PLAN-MANAGEMENT.md | How IMPLEMENTATION_PLAN.md works as the living document agents update. Task decomposition patterns, intervention strategies, progress tracking |
| REVIEW-AND-QA.md | How to evaluate agent output: when to review, what to look for, how to course-correct. Review checklist template including model attribution and TypeScript hygiene checks |
| COST-OPTIMIZATION.md | Getting more work per dollar. Request-based vs token-based billing, optimal strategies per provider, model selection guide, the hybrid strategy, anti-patterns |
| OPENCLAW-INTEGRATION.md | Running the harness in OpenClaw with sessions_spawn, cron jobs, and shell scripts. Model selection, monitoring, cost optimization |
| TROUBLESHOOTING.md | When things go wrong. The five failure modes (stuck loop, drift, overengineering, test theater, context overflow) and how to fix each |
| PARALLEL-AGENTS.md | Running multiple agents simultaneously on independent tasks. When to parallelize, how to split work, how to merge results, conflict resolution, OpenClaw patterns |
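Known-answer tests are easiest to see on a concrete calculation. The sketch below uses a hypothetical loan-payment function of the kind a Fintrove-style finance app might need. The expected values are worked out independently and recorded in the execution board before the function exists. For example, the standard amortization formula gives about $1,199.10 a month for $200,000 over 30 years at 6%. A failing test then points at the code rather than inviting an edit to the expected value.

```typescript
// Hypothetical example: monthly payment for a fixed-rate, fully amortizing loan.
// payment = P * r * (1 + r)^n / ((1 + r)^n - 1), where r = annual rate / 12.
function monthlyPayment(principal: number, annualRate: number, months: number): number {
  const r = annualRate / 12;
  if (r === 0) return principal / months; // zero-interest edge case
  const growth = Math.pow(1 + r, months);
  return (principal * r * growth) / (growth - 1);
}

// Known answers written down BEFORE implementation, from an independent source
// (hand calculation, spreadsheet, published table), never from running this code.
const KNOWN_ANSWERS = [
  { principal: 200000, annualRate: 0.06, months: 360, expected: 1199.1 },
  { principal: 12000, annualRate: 0, months: 12, expected: 1000 },
];

// Returns a description of every known answer the implementation misses.
function runKnownAnswerTests(): string[] {
  const failures: string[] = [];
  for (const k of KNOWN_ANSWERS) {
    const actual = monthlyPayment(k.principal, k.annualRate, k.months);
    // Compare to the nearest cent.
    if (Math.abs(actual - k.expected) >= 0.005) {
      failures.push(`${JSON.stringify(k)} -> got ${actual.toFixed(2)}`);
    }
  }
  return failures;
}

console.log(runKnownAnswerTests()); // []
```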

Examples & Reference

| File | Purpose |
|---|---|
| EXAMPLES.md | Worked example: Fintrove-style finance app spec plus a comparison of three approaches (Ezward, Ralph Wiggum, Nate Jones) |
| CHANGELOG.md | Version history and evolution of the agent harness project itself |

Quick Start

New to the Harness? (Start Here)

  1. Read TUTORIAL.md — 30-minute hands-on walkthrough building a real CLI tool
  2. Read SPEC-CREATION-GUIDE.md — learn the interview protocol
  3. Try it — build your own project using the workflow

Ready to Build? (Simple project, <10 tasks)

  1. Read COST-OPTIMIZATION.md — understand your billing model before you start burning budget
  2. Interview — work with your agent to create the spec (or do it solo)
  3. Fill out PROJECT-SPEC.md with your problem definition
  4. Copy PROJECT-SPEC.md, AGENT-INSTRUCTIONS.md, and DECISIONS.md into your project root
  5. Run ./ralph-loop.sh (CLI) or use OpenClaw sessions_spawn (see OPENCLAW-INTEGRATION.md)
  6. Review at phase boundaries using the REVIEW-AND-QA.md checklist
  7. Troubleshoot failures using TROUBLESHOOTING.md

Building Something Larger? (10+ tasks, multiple features)

  1. Read WAVE-BASED-MANAGEMENT.md — the plan-then-implement discipline
  2. Create your IMPLEMENTATION_PLAN.md with all tasks grouped into waves
  3. Create .harness/EXECUTION_MASTER.md — your wave/stream dashboard
  4. For each stream: copy EXECUTION-BOARD-TEMPLATE.md and fill in ALL packets before coding any of them
  5. After each packet: copy VALIDATION-TEMPLATE.md and fill it in
  6. After each stream: copy PROCESS-EVAL-TEMPLATE.md and write the retrospective
  7. At each wave boundary: run the wave gate checklist before starting the next wave
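The wave gate in step 7 can be partly automated from the validation and retrospective files. The sketch below assumes a hypothetical summary shape parsed from each packet's VALIDATION file. It treats a drop in test count as a regression signal, which is a simplification. The actual checklist in WAVE-BASED-MANAGEMENT.md may also contain items that a script cannot check.

```typescript
// Hypothetical summary of one packet's VALIDATION file.
interface PacketValidation {
  id: string;
  done: boolean;
  knownAnswersPassed: boolean;
  testsBefore: number;
  testsAfter: number;
}

// Hypothetical summary of one stream in the wave.
interface StreamSummary {
  name: string;
  packets: PacketValidation[];
  processEvalWritten: boolean; // PROCESS-EVAL retrospective exists
}

// Lists the reasons the gate is closed; an empty list means the next wave may start.
function waveGateBlockers(streams: StreamSummary[]): string[] {
  const blockers: string[] = [];
  for (const s of streams) {
    if (!s.processEvalWritten) blockers.push(`${s.name}: retrospective missing`);
    for (const p of s.packets) {
      if (!p.done) blockers.push(`${s.name}/${p.id}: not done`);
      if (!p.knownAnswersPassed) blockers.push(`${s.name}/${p.id}: known-answer test failing`);
      // Simplification: fewer tests than before the packet is treated as a regression.
      if (p.testsAfter < p.testsBefore) blockers.push(`${s.name}/${p.id}: test count dropped`);
    }
  }
  return blockers;
}
```

Returning every blocker, rather than stopping at the first one, lets the human see the whole gap in one review.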

The Core Insight

The successful agent approaches compared in EXAMPLES.md share the same loop:

Orient (read spec + plan) → Pick ONE task → Build → Test → Commit → Exit → Restart fresh

The spec defines WHAT. The plan tracks WHERE we are. Fresh context each iteration prevents drift. The human reviews and course-corrects.
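ralph-loop.sh drives this loop from outside the agent. The TypeScript sketch below shows the same control flow. It assumes a hypothetical runFreshAgent callback that starts an agent with no prior context and lets it finish one task. The callback then reports whether the agent committed and whether it emitted the completion signal. The caps are illustrative guards against the stuck-loop failure mode, not values taken from the script.

```typescript
// Result of one fresh agent iteration (hypothetical shape).
interface IterationResult {
  committed: boolean;    // did the agent commit a finished task?
  allTasksDone: boolean; // did it emit the completion signal?
}

// Restart a fresh agent until it signals completion or a guard trips.
// runFreshAgent keeps no memory between calls; continuity lives in the
// spec, the plan file, and git history.
function runLoop(
  runFreshAgent: (iteration: number) => IterationResult,
  maxIterations = 50,
  maxIdleIterations = 3,
): { iterations: number; finished: boolean } {
  let idle = 0;
  for (let i = 1; i <= maxIterations; i++) {
    const result = runFreshAgent(i);
    if (result.allTasksDone) return { iterations: i, finished: true };
    idle = result.committed ? 0 : idle + 1;
    // Several iterations without a commit suggests a stuck loop: stop for human review.
    if (idle >= maxIdleIterations) return { iterations: i, finished: false };
  }
  return { iterations: maxIterations, finished: false };
}

// Simulated plan with three tasks; each fresh start completes exactly one.
let remaining = 3;
const outcome = runLoop(() => {
  remaining -= 1;
  return { committed: true, allTasksDone: remaining === 0 };
});
console.log(outcome); // { iterations: 3, finished: true }
```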

See each file for detailed instructions.

When to Use Which Guide

┌──────────────────────────────────────────────────┐
│           "Which guide do I need?"               │
├──────────────────────────────────────────────────┤
│                                                  │
│  Just starting?                                  │
│    → TUTORIAL.md (hands-on learning)             │
│                                                  │
│  Creating a spec?                                │
│    → SPEC-CREATION-GUIDE.md (interview)          │
│                                                  │
│  Project with 10+ tasks?                         │
│    → WAVE-BASED-MANAGEMENT.md (waves, gates)     │
│                                                  │
│  Agent is stuck?                                 │
│    → TROUBLESHOOTING.md (failure modes)          │
│                                                  │
│  Reviewing agent output?                         │
│    → REVIEW-AND-QA.md (what to check)            │
│                                                  │
│  Worried about cost?                             │
│    → COST-OPTIMIZATION.md (billing models)       │
│                                                  │
│  Multiple independent features?                  │
│    → PARALLEL-AGENTS.md (coordination)           │
│                                                  │
│  Using OpenClaw?                                 │
│    → OPENCLAW-INTEGRATION.md (sessions_spawn)    │
│                                                  │
│  Agent keeps changing past decisions?            │
│    → DECISIONS.md (ADR template)                 │
│                                                  │
│  Want to see it in action?                       │
│    → EXAMPLES.md (real project example)          │
│                                                  │
└──────────────────────────────────────────────────┘

Philosophy

Fresh Context > Long Context

Each iteration starts with a fresh agent. No accumulated confusion, no stale reasoning. The git history and plan file provide continuity.

One Task > Many Tasks

Agents that try to do everything in one session produce spaghetti. Agents that focus on ONE task produce clean commits.
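Picking ONE task can be made mechanical when the plan marks progress explicitly. The sketch below assumes that IMPLEMENTATION_PLAN.md uses markdown checkboxes, with "- [ ]" for open tasks and "- [x]" for done ones. The format described in PLAN-MANAGEMENT.md may differ. The sample tasks are hypothetical ones for the tutorial's link checker.

```typescript
// Return the first open task in a markdown checklist, or null when all are done.
// Assumes "- [ ] task" (or "* [ ] task") marks open work.
function nextTask(plan: string): string | null {
  for (const line of plan.split("\n")) {
    const match = /^\s*[-*]\s+\[ \]\s+(.+)$/.exec(line);
    if (match) return match[1].trim();
  }
  return null;
}

const plan = [
  "## Phase 1",
  "- [x] Parse markdown links",
  "- [ ] Check HTTP status of each link",
  "- [ ] Print a summary report",
].join("\n");

console.log(nextTask(plan)); // Check HTTP status of each link
```

Returning only the first open item keeps each fresh iteration scoped to a single task. A null result corresponds to the completion signal the loop waits for.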

Spec Quality > Agent Quality

A great spec with a mediocre agent beats a vague spec with a great agent. The spec is your leverage point.

Review > Repair

It's easier to review and guide than to debug and fix. Catch drift early through periodic reviews.

Explicit > Implicit

Agents can't read your mind. Write down constraints, anti-patterns, and decisions. What's obvious to you is invisible to the agent.

Contributing

This harness is a living system. If you:

  • Discover new failure modes
  • Develop better patterns
  • Find gaps in the guides
  • Create examples for other project types

Document them and contribute back. The harness improves as we learn what works.

Version

Current version: 2.0.0 (see CHANGELOG.md for history)

License

Public domain. Use it, modify it, share it. No attribution required.


The harness doesn't write code. It creates conditions where agents can write code reliably.