# Changelog

All notable changes to the Agent Harness project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---

## [2.0.0] - 2026-04-01

### The Wave-Based Management Release

Patterns developed during the Fintrove project (2026-03-31 → 2026-04-01): 4 waves, 11 streams, 44 tasks, 1,254 → 1,597 tests, zero regressions.

The key insight: **the harness was missing a planning artifact between "the spec" and "the task."** The execution board fills that gap — a stream-level plan written entirely before any code is written.

### Added

#### New Templates

- **EXECUTION-BOARD-TEMPLATE.md** — Pre-implementation planning artifact for a stream. Defines ALL packets (goal, steps, files, known-answer tests, acceptance criteria) before any code is written. The board is the contract.
- **VALIDATION-TEMPLATE.md** — Per-packet evidence file, written immediately after each packet completes. Records the test count delta, known-answer test results, and acceptance-criteria pass/fail.
- **PROCESS-EVAL-TEMPLATE.md** — Stream retrospective written after merge. Covers task sizing accuracy, test-first compliance, known-answer coverage, architecture integrity, and model attribution.

#### New Guide

- **WAVE-BASED-MANAGEMENT.md** — Complete guide to the wave/stream/packet hierarchy. Covers the plan-then-implement discipline, execution boards, known-answer tests, the EXECUTION_MASTER.md pattern, wave gates, and file organization.

### New Patterns Documented

#### The Plan-Then-Implement Discipline

Before writing any implementation code for a stream:

1. Write the execution board (all packets, all acceptance criteria, known-answer tests).
2. Only then, start coding.

#### Known-Answer Tests

For domain-specific calculations, every module must include at least one test that cites an official source. For example, ESDC states that CPP payments increase by 0.7% for each month the start is delayed past 65, up to age 70, so starting at 70 pays 60 × 0.7% = 42% more than starting at 65:

```typescript
import { cppMonthlyBenefit } from './cpp'; // hypothetical module under test

test('CPP at 70 is exactly 42% more than at 65', () => {
  // Source: ESDC https://www.canada.ca/en/services/benefits/publicpensions/cpp/benefit-amount.html
  const at65 = cppMonthlyBenefit({ startAge: 65 });
  const at70 = cppMonthlyBenefit({ startAge: 70 });
  expect(at70 / at65).toBeCloseTo(1.42, 5);
});
```

#### Wave Gates

An explicit checklist must pass before Wave N+1 starts:

- All streams merged
- Domain accuracy suite passing
- Process evals written
- Human sign-off

#### EXECUTION_MASTER.md Pattern

A project-level dashboard tracking wave status, active streams, blockers, and parallelism rules.

### Metrics (Fintrove, 2026-04-01)

- Waves: 4 | Streams: 11 | Tasks: 44/44
- Test growth: 1,254 → 1,597 (+343) | Regressions: 0

---

## [1.0.0] - 2024-03-18

### Added

#### Core Templates

- **AGENT-INSTRUCTIONS.md** — The agent's system prompt defining the core loop: Orient → Plan → Pick ONE task → Implement → Verify → Commit → Exit
- **PROJECT-SPEC.md** — Comprehensive template for defining projects with sections for overview, tech stack, requirements with acceptance criteria, data models, API design, constraints, phasing, and anti-patterns
- **DECISIONS.md** — Architecture Decision Record (ADR) template for documenting non-obvious technical choices and preventing agent drift
- **ralph-loop.sh** — Bash script implementing the Ralph Wiggum loop pattern: spawns fresh agent instances, checks for completion signals, and restarts until done

#### Process Guides

- **SPEC-CREATION-GUIDE.md** — Complete interview protocol for creating high-quality specifications through structured conversation between human and agent. Covers vision, requirements extraction, technical discovery, constraint mapping, and spec assembly
- **PLAN-MANAGEMENT.md** — Guide for managing IMPLEMENTATION_PLAN.md as a living document. Covers task decomposition patterns, intervention strategies, progress tracking, and plan anti-patterns
- **REVIEW-AND-QA.md** — Framework for evaluating agent output.
  Includes review timing, quality checklists, drift detection, course-correction strategies, and review templates
- **COST-OPTIMIZATION.md** — Comprehensive guide to model billing (request-based vs token-based), optimal strategies per provider, model selection, context management, and the hybrid approach
- **OPENCLAW-INTEGRATION.md** — Running the harness in OpenClaw with sessions_spawn, cron jobs, and shell scripts. Covers model selection, monitoring, and OpenClaw-specific agent instructions
- **TROUBLESHOOTING.md** — Failure taxonomy covering five common failure modes (stuck loop, drift, overengineering, test theater, context overflow) with root causes and recovery steps
- **TUTORIAL.md** — Complete 30-minute walkthrough that builds a Markdown link-checker CLI tool from scratch using the harness. A concrete, copy-pasteable example demonstrating the entire workflow

#### Examples & Documentation

- **EXAMPLES.md** — Worked example of a Fintrove-style personal finance app with complete PROJECT-SPEC.md.
  Compares three approaches (Ezward, Ralph Wiggum, Nate Jones) and provides best practices
- **README.md** — Project overview with file index, quick start guide, and core insights
- **PARALLEL-AGENTS.md** — Guide for running multiple agents simultaneously on independent tasks, covering parallelization strategies, work splitting, result merging, and conflict resolution

### Features

#### The Core Loop Pattern

- Stateless iteration model: each agent starts fresh with clean context
- Orient phase: agent reads spec, plan, and git history
- Single-task focus: agents complete ONE task per iteration
- Mandatory verification: build and test must pass before commit
- Promise-based signaling: `PLANNED|DONE|STUCK|ERROR`

#### Interview Protocol

- Five-phase structured interview for spec creation
- Domain knowledge extraction techniques
- Technical discovery patterns
- Constraint mapping (MUST/MUST NOT/PREFER)
- Spec quality checklist

#### Plan Management Patterns

- Scaffold-first pattern
- Vertical slice pattern
- Test-first pattern
- Dependency chain pattern
- Human intervention mechanisms (notes, task splitting, reprioritization)

#### Cost Optimization Strategies

- Request-based optimization (batch tasks, compound requests)
- Token-based optimization (fresh sub-agents, minimal context)
- Model selection by task complexity
- Hybrid strategy using multiple subscriptions
- Usage monitoring and budget allocation

#### OpenClaw Integration

- Manual orchestration via sessions_spawn
- Cron-based automation for overnight work
- Shell script orchestration
- Model selection per iteration
- Sub-agent monitoring and session history

#### Troubleshooting Framework

- Stuck loop detection and resolution
- Architecture drift prevention with ADRs
- Overengineering constraints
- Test quality validation
- Context overflow mitigation

### Documentation Quality Standards

- Comprehensive examples with real code
- Anti-pattern documentation
- Copy-pasteable templates
- Concrete acceptance criteria
- Decision record patterns

### Supported Agents

- Claude CLI (via ralph-loop.sh)
- OpenAI Codex CLI (via ralph-loop.sh)
- OpenClaw sessions_spawn (any model)
- Extensible to other agent frameworks

### Supported Workflows

- CLI loop (ralph-loop.sh)
- OpenClaw manual orchestration
- OpenClaw cron automation
- Hybrid approaches

---

## [Unreleased]

### Planned

- Additional language-specific examples (Python, Go, Rust)
- Integration templates for common CI/CD systems
- Cost calculator tool (estimate iterations × model cost)
- Spec validator (check completeness before starting)
- Template variations for different project types (API, CLI, library, web app)

---

## Version History Summary

- **2.0.0** (2026-04-01) — Wave-based management release: execution board, validation, and process-eval templates, plus the WAVE-BASED-MANAGEMENT.md guide
- **1.0.0** (2024-03-18) — Initial release with complete harness system: core templates, process guides, examples, and multi-platform support

---

## Contributing

This harness is a living system. Contributions are welcome if you:

- Discover new failure modes
- Develop better patterns
- Find gaps in the guides
- Create examples for other project types

Please document your findings and contribute them back. The harness improves as we learn what works.

---

## License

This project is released into the public domain. Use it, modify it, share it. No attribution required.

---

_The harness is 1.0 because it works. It's not 2.0 yet because we're still learning how to use it better._