# Agent Harness Templates

A complete system for running autonomous AI coding agents on complex projects.
## Files

### Core Templates (copy into your project)

| File | Purpose |
|---|---|
| `AGENT-INSTRUCTIONS.md` | The agent's "system prompt" — read every iteration. Defines the core loop, the mandatory pre-commit checklist (tests + TypeScript), the commit attribution format, the Tests-Added rule, and known anti-patterns. |
| `PROJECT-SPEC.md` | Template for defining your problem. Sections for overview, tech stack, requirements with acceptance criteria, data model, API design, constraints, phasing, and anti-patterns. |
| `DECISIONS.md` | Architecture Decision Record (ADR) template for documenting non-obvious technical choices. Prevents agent drift by creating continuity across fresh contexts. |
| `EXECUTION-BOARD-TEMPLATE.md` | ⭐ New. Pre-implementation planning artifact for a stream. Defines ALL packets, known-answer tests, and acceptance criteria BEFORE any code is written. The core of the plan-then-implement discipline. |
| `VALIDATION-TEMPLATE.md` | ⭐ New. Per-packet evidence file written after each packet completes. Records test counts, known-answer results, and acceptance-criteria tick-off. |
| `PROCESS-EVAL-TEMPLATE.md` | ⭐ New. Stream retrospective written after merge. An honest assessment of task sizing, test-first compliance, and model quality. |
| `ralph-loop.sh` | The Ralph Wiggum bash loop — spawns fresh agent instances, checks for completion signals, and restarts until done. Supports Claude, Codex, Aider, Gemini, and custom agents. |
| `model-report.ts` | Parses `Agent:` trailers in the git log to generate a per-model quality table (commits, tests added, TypeScript errors). Copy to `scripts/model-report.ts` and add `"model-report": "ts-node scripts/model-report.ts"` to `package.json`. |
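You can get a rough, shell-only version of what `model-report.ts` does. The sketch below only assumes the `Agent: <model>` trailer format described above; the function name and the model names in the example are hypothetical.

```shell
# Count commits per model from `Agent:` trailers on stdin.
# A shell approximation of model-report.ts's commit count column.
commits_per_agent() {
  awk '/^ *Agent:/ { counts[$2]++ }
       END { for (a in counts) print a, counts[a] }' | sort
}
```

Against a real repo you would pipe the log in, e.g. `git log --format='%(trailers:key=Agent)' | commits_per_agent` (the `%(trailers:key=Agent)` pretty-format placeholder requires a reasonably recent git).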
### Process Guides (read before you start)

| File | Purpose |
|---|---|
| `SPEC-CREATION-GUIDE.md` | Start here. How to create a great spec through a structured interview. The interview protocol, domain knowledge extraction, and a spec quality checklist. |
| `TUTORIAL.md` | Best way to learn. A complete 30-minute walkthrough building a markdown link checker CLI tool from zero. A concrete, copy-pasteable example of the entire workflow. |
| `WAVE-BASED-MANAGEMENT.md` | ⭐ New. How to structure larger projects into waves, streams, and packets. The plan-then-implement discipline, execution boards, known-answer tests, and wave gates. Essential for projects with 10+ tasks. |
| `PLAN-MANAGEMENT.md` | How `IMPLEMENTATION_PLAN.md` works — the living document agents update. Task decomposition patterns, intervention strategies, progress tracking. |
| `REVIEW-AND-QA.md` | How to evaluate agent output. When to review, what to look for, how to course-correct. Review checklist template including model attribution and TypeScript hygiene checks. |
| `COST-OPTIMIZATION.md` | Getting more work per dollar. Request-based vs token-based billing, optimal strategies per provider, a model selection guide, the hybrid strategy, anti-patterns. |
| `OPENCLAW-INTEGRATION.md` | Running the harness in OpenClaw with `sessions_spawn`, cron jobs, and shell scripts. Model selection, monitoring, cost optimization. |
| `TROUBLESHOOTING.md` | When things go wrong. The five failure modes (stuck loop, drift, overengineering, test theater, context overflow) and how to fix each. |
| `PARALLEL-AGENTS.md` | Running multiple agents simultaneously on independent tasks. When to parallelize, how to split work, how to merge results, conflict resolution, OpenClaw patterns. |
### Examples & Reference

| File | Purpose |
|---|---|
| `EXAMPLES.md` | Worked example: a Fintrove-style finance app spec plus a comparison of three approaches (Ezward, Ralph Wiggum, Nate Jones). |
| `CHANGELOG.md` | Version history and evolution of the agent harness project itself. |
## Quick Start

### New to the Harness? (Start Here)

- Read `TUTORIAL.md` — 30-minute hands-on walkthrough building a real CLI tool
- Read `SPEC-CREATION-GUIDE.md` — learn the interview protocol
- Try it — build your own project using the workflow
### Ready to Build? (Simple project, <10 tasks)

- Read `COST-OPTIMIZATION.md` — understand your billing model before you start burning budget
- Interview — work with your agent to create the spec (or do it solo)
- Fill out `PROJECT-SPEC.md` with your problem definition
- Copy `PROJECT-SPEC.md`, `AGENT-INSTRUCTIONS.md`, and `DECISIONS.md` into your project root
- Run `./ralph-loop.sh` (CLI) or use OpenClaw `sessions_spawn` (see `OPENCLAW-INTEGRATION.md`)
- Review at phase boundaries using the `REVIEW-AND-QA.md` checklist
- Troubleshoot failures using `TROUBLESHOOTING.md`
### Building Something Larger? (10+ tasks, multiple features)

- Read `WAVE-BASED-MANAGEMENT.md` — the plan-then-implement discipline
- Create your `IMPLEMENTATION_PLAN.md` with all tasks grouped into waves
- Create `.harness/EXECUTION_MASTER.md` — your wave/stream dashboard
- For each stream: copy `EXECUTION-BOARD-TEMPLATE.md` and fill ALL packets before coding any
- After each packet: copy `VALIDATION-TEMPLATE.md` and fill it in
- After each stream: copy `PROCESS-EVAL-TEMPLATE.md` and write the retrospective
- At each wave boundary: run the wave gate checklist before starting the next wave
## The Core Insight

All successful agent approaches share the same loop:

> Orient (read spec + plan) → Pick ONE task → Build → Test → Commit → Exit → Restart fresh

The spec defines WHAT. The plan tracks WHERE we are. Fresh context each iteration prevents drift. The human reviews and course-corrects.
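The loop can be sketched in a few lines of bash. This is a simplification, not `ralph-loop.sh` itself: the `run_agent_once` hook and the `.harness/DONE` completion signal are hypothetical stand-ins for the real agent spawning and agent-specific completion checks.

```shell
# Skeleton of the orient -> one task -> commit -> restart-fresh loop.
# run_agent_once stands in for spawning a fresh agent instance;
# .harness/DONE is a hypothetical completion signal.
run_loop() {
  local max_iters=$1 i=0
  while [ "$i" -lt "$max_iters" ]; do
    if [ -f .harness/DONE ]; then   # agent signalled the plan is complete
      echo "complete after $i iterations"
      return 0
    fi
    run_agent_once                  # fresh context: orient, ONE task, test, commit, exit
    i=$((i + 1))
  done
  echo "iteration cap reached; review the plan"
  return 1
}
```

The iteration cap is the human's safety valve: a stuck loop burns budget until someone reads `TROUBLESHOOTING.md`.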
See each file for detailed instructions.
## When to Use Which Guide

| "Which guide do I need?" | Read |
|---|---|
| Just starting? | `TUTORIAL.md` (hands-on learning) |
| Creating a spec? | `SPEC-CREATION-GUIDE.md` (interview) |
| Agent is stuck? | `TROUBLESHOOTING.md` (failure modes) |
| Reviewing agent output? | `REVIEW-AND-QA.md` (what to check) |
| Worried about cost? | `COST-OPTIMIZATION.md` (billing models) |
| Multiple independent features? | `PARALLEL-AGENTS.md` (coordination) |
| Using OpenClaw? | `OPENCLAW-INTEGRATION.md` (`sessions_spawn`) |
| Agent keeps changing past decisions? | `DECISIONS.md` (ADR template) |
| Want to see it in action? | `EXAMPLES.md` (real project example) |
## Philosophy

### Fresh Context > Long Context

Each iteration starts with a fresh agent. No accumulated confusion, no stale reasoning. The git history and plan file provide continuity.

### One Task > Many Tasks

Agents that try to do everything in one session produce spaghetti. Agents that focus on ONE task produce clean commits.

### Spec Quality > Agent Quality

A great spec with a mediocre agent beats a vague spec with a great agent. The spec is your leverage point.

### Review > Repair

It's easier to review and guide than to debug and fix. Catch drift early through periodic reviews.

### Explicit > Implicit

Agents can't read your mind. Write down constraints, anti-patterns, and decisions. What's obvious to you is invisible to the agent.
## Contributing

This harness is a living system. If you:

- Discover new failure modes
- Develop better patterns
- Find gaps in the guides
- Create examples for other project types

document them and contribute back. The harness improves as we learn what works.
## Version

Current version: 2.0.0 (see `CHANGELOG.md` for history)
## License

Public domain. Use it, modify it, share it. No attribution required.

*The harness doesn't write code. It creates conditions where agents can write code reliably.*