# Changelog

All notable changes to the Agent Harness project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---
## [2.0.0] - 2026-04-01

### The Wave-Based Management Release

Patterns developed during the Fintrove project (2026-03-31 → 2026-04-01):
4 waves, 11 streams, 44 tasks, 1,254 → 1,597 tests, zero regressions.

The key insight: **the harness was missing a planning artifact between "the spec" and "the task."**
The execution board fills that gap: a stream-level plan completed in full before any implementation code is written.
### Added

#### New Templates
- **EXECUTION-BOARD-TEMPLATE.md** — Pre-implementation planning artifact for a stream. Defines ALL packets (goal, steps, files, known-answer tests, acceptance criteria) before any code is written. The board is the contract.
- **VALIDATION-TEMPLATE.md** — Per-packet evidence file, written immediately after each packet completes. Records the test count delta, known-answer test results, and acceptance criteria pass/fail.
- **PROCESS-EVAL-TEMPLATE.md** — Stream retrospective written after merge. Covers task sizing accuracy, test-first compliance, known-answer coverage, architecture integrity, and model attribution.

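
The three templates above amount to a small data model: a board that fixes every packet up front, and a validation record filled in per packet. The TypeScript sketch below is one way to express that; the type names, field names, and the `recordValidation` helper are hypothetical and only mirror the fields listed in the template descriptions.

```typescript
// Hypothetical shapes mirroring EXECUTION-BOARD-TEMPLATE.md and
// VALIDATION-TEMPLATE.md; field names are illustrative, not the templates' own.
interface Packet {
  id: string;
  goal: string;
  steps: string[];
  files: string[];
  knownAnswerTests: string[];
  acceptanceCriteria: string[];
}

interface ExecutionBoard {
  stream: string;
  packets: Packet[]; // every packet is defined before any code is written
}

interface ValidationRecord {
  packetId: string;
  testCountDelta: number;
  knownAnswerResults: Record<string, boolean>;
  criteriaResults: Record<string, boolean>;
  passed: boolean;
}

// Build the per-packet evidence record right after a packet completes.
function recordValidation(
  packet: Packet,
  testsBefore: number,
  testsAfter: number,
  knownAnswerResults: Record<string, boolean>,
  criteriaResults: Record<string, boolean>,
): ValidationRecord {
  // A packet passes only if every criterion and known-answer test listed on the
  // board has a recorded result and that result is true.
  const allCriteria = packet.acceptanceCriteria.every((c) => criteriaResults[c] === true);
  const allKnownAnswers = packet.knownAnswerTests.every((t) => knownAnswerResults[t] === true);
  return {
    packetId: packet.id,
    testCountDelta: testsAfter - testsBefore,
    knownAnswerResults,
    criteriaResults,
    passed: allCriteria && allKnownAnswers,
  };
}
```
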
#### New Guide
- **WAVE-BASED-MANAGEMENT.md** — Complete guide to the wave/stream/packet hierarchy, covering the plan-then-implement discipline, execution boards, known-answer tests, the EXECUTION_MASTER.md pattern, wave gates, and file organization.

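
Because the wave/stream/packet hierarchy is a plain tree, project-level figures such as the Fintrove totals quoted above can be rolled up mechanically. A minimal sketch, assuming hypothetical `Wave`, `WorkStream`, and `PacketStatus` shapes with a `done` flag per packet:

```typescript
// Hypothetical model of the hierarchy: a wave holds streams, a stream holds packets.
interface PacketStatus { id: string; done: boolean; }
interface WorkStream { name: string; packets: PacketStatus[]; }
interface Wave { number: number; streams: WorkStream[]; }

interface Rollup { waves: number; streams: number; packets: number; donePackets: number; }

// Count every level of the tree in one pass.
function rollUp(waves: Wave[]): Rollup {
  let streams = 0;
  let packets = 0;
  let donePackets = 0;
  for (const wave of waves) {
    streams += wave.streams.length;
    for (const stream of wave.streams) {
      packets += stream.packets.length;
      donePackets += stream.packets.filter((p) => p.done).length;
    }
  }
  return { waves: waves.length, streams, packets, donePackets };
}
```
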
### New Patterns Documented

#### The Plan-Then-Implement Discipline
Before writing any implementation code for a stream:
1. Write the execution board (all packets, all acceptance criteria, all known-answer tests).
2. Only then, start coding.

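
The "only then" step can be enforced with a pre-flight check that refuses to start a stream while its board has gaps. This is a hypothetical sketch: the `BoardPacket` shape and the `domainCalculation` flag are assumptions, with the flag used to require known-answer tests only where a domain calculation is involved, in line with the known-answer rule for domain-specific modules.

```typescript
// Hypothetical minimal packet shape; a real board carries more fields.
interface BoardPacket {
  id: string;
  acceptanceCriteria: string[];
  knownAnswerTests: string[];
  domainCalculation: boolean; // assumed flag: does this packet implement a domain formula?
}

// Everything that must be filled in before implementation starts.
// An empty result means the board is complete and coding may begin.
function boardGaps(packets: BoardPacket[]): string[] {
  const gaps: string[] = [];
  if (packets.length === 0) gaps.push("board has no packets");
  for (const p of packets) {
    if (p.acceptanceCriteria.length === 0) {
      gaps.push(`${p.id}: no acceptance criteria`);
    }
    if (p.domainCalculation && p.knownAnswerTests.length === 0) {
      gaps.push(`${p.id}: no known-answer test`);
    }
  }
  return gaps;
}
```
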
#### Known-Answer Tests
For domain-specific calculations, every module must include at least one test citing an official source:
```typescript
import { cppMonthlyBenefit } from './cpp'; // hypothetical module and signature under test

test('CPP at 70 is exactly 42% more than at 65', () => {
  // Source: ESDC https://www.canada.ca/en/services/benefits/publicpensions/cpp/benefit-amount.html
  const at65 = cppMonthlyBenefit({ startAge: 65 });
  const at70 = cppMonthlyBenefit({ startAge: 70 });
  expect(at70 / at65).toBeCloseTo(1.42, 5);
});
```

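
For context, the function behind such a test might look like the sketch below. It assumes the CPP adjustment rates published by ESDC (0.6% less per month of starting before 65, from age 60, and 0.7% more per month of deferral after 65, up to age 70); `cppAdjustmentFactor` is a hypothetical name, and the rates should be re-checked against the cited page before relying on them. Sixty months of deferral gives 1 + 60 × 0.007 = 1.42, the known answer the test pins.

```typescript
// Hypothetical helper: multiplier applied to the age-65 CPP amount.
// Assumed ESDC rates: 0.6% less per month of starting before 65 (earliest 60),
// 0.7% more per month of deferral after 65 (no further increase after 70).
const EARLY_REDUCTION_PER_MONTH = 0.006;
const DEFERRAL_INCREASE_PER_MONTH = 0.007;

function cppAdjustmentFactor(monthsFrom65: number): number {
  if (Math.floor(monthsFrom65) !== monthsFrom65 || monthsFrom65 < -60 || monthsFrom65 > 60) {
    throw new RangeError("start must be a whole number of months between age 60 and age 70");
  }
  const rate = monthsFrom65 < 0 ? EARLY_REDUCTION_PER_MONTH : DEFERRAL_INCREASE_PER_MONTH;
  return 1 + monthsFrom65 * rate;
}

// Same known-answer check as the test, with an arbitrary base amount.
const at65 = 1000 * cppAdjustmentFactor(0);
const at70 = 1000 * cppAdjustmentFactor(60);
console.log((at70 / at65).toFixed(5)); // prints 1.42000
```
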
#### Wave Gates
Explicit checklist before Wave N+1 can start: all streams merged, domain accuracy suite passing, process evals written, and human sign-off.

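
A wave gate is a conjunction of the checklist items, so it can be computed rather than eyeballed, with human sign-off recorded as one more input. A sketch assuming a hypothetical `WaveState` record; it returns the unmet items so the blocker list is explicit:

```typescript
// Hypothetical record of one wave's state; field names mirror the gate checklist.
interface WaveState {
  streams: { name: string; merged: boolean; processEvalWritten: boolean }[];
  domainAccuracySuitePassing: boolean;
  humanSignOff: boolean;
}

// Return the unmet gate items; Wave N+1 may start only when this is empty.
function unmetGateItems(wave: WaveState): string[] {
  const unmet: string[] = [];
  for (const s of wave.streams) {
    if (!s.merged) unmet.push(`stream ${s.name} not merged`);
    if (!s.processEvalWritten) unmet.push(`stream ${s.name} has no process eval`);
  }
  if (!wave.domainAccuracySuitePassing) unmet.push("domain accuracy suite failing");
  if (!wave.humanSignOff) unmet.push("no human sign-off");
  return unmet;
}
```
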
#### EXECUTION_MASTER.md Pattern
Project-level dashboard covering wave status, active streams, blockers, and parallelism rules.

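
Since the dashboard is plain Markdown, it can be regenerated from project state instead of edited by hand. A sketch under assumptions: the `MasterState` shape is hypothetical, and the section headings simply follow the four items named above rather than any fixed layout of the real file.

```typescript
// Hypothetical project state behind the dashboard.
interface MasterState {
  waves: { number: number; status: string }[];
  activeStreams: string[];
  blockers: string[];
  parallelismRules: string[];
}

// Render a Markdown bullet list, writing "none" for an empty section.
function bullets(items: string[]): string {
  return items.length > 0 ? items.map((i) => `- ${i}`).join("\n") : "- none";
}

function renderExecutionMaster(state: MasterState): string {
  return [
    "# EXECUTION_MASTER",
    "",
    "## Wave status",
    bullets(state.waves.map((w) => `Wave ${w.number}: ${w.status}`)),
    "",
    "## Active streams",
    bullets(state.activeStreams),
    "",
    "## Blockers",
    bullets(state.blockers),
    "",
    "## Parallelism rules",
    bullets(state.parallelismRules),
  ].join("\n");
}
```

Writing the result to disk is left to the caller, so the same function can also feed a review or a chat summary.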
### Metrics (Fintrove, 2026-04-01)
- Waves: 4 | Streams: 11 | Tasks: 44/44
- Test growth: 1,254 → 1,597 (+343) | Regressions: 0

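
Both metrics come from comparing two test runs: growth is the difference in test counts (1,597 - 1,254 = 343), and a regression is taken here to mean a test that passed in the baseline and no longer passes. A minimal sketch, assuming a hypothetical snapshot format that maps each test name to pass/fail:

```typescript
// Hypothetical snapshot format: test name -> whether it passed in that run.
type Snapshot = Record<string, boolean>;

// Net change in the number of tests between two runs.
function testGrowth(before: Snapshot, after: Snapshot): number {
  return Object.keys(after).length - Object.keys(before).length;
}

// Tests that passed in the baseline but now fail or have disappeared.
function regressions(before: Snapshot, after: Snapshot): string[] {
  return Object.keys(before).filter((name) => before[name] && after[name] !== true);
}
```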
---

## [1.0.0] - 2024-03-18

### Added

#### Core Templates
- **AGENT-INSTRUCTIONS.md** — The agent's system prompt defining the core loop: Orient → Plan → Pick ONE task → Implement → Verify → Commit → Exit
- **PROJECT-SPEC.md** — Comprehensive template for defining projects with sections for overview, tech stack, requirements with acceptance criteria, data models, API design, constraints, phasing, and anti-patterns
- **DECISIONS.md** — Architecture Decision Record (ADR) template for documenting non-obvious technical choices and preventing agent drift
- **ralph-loop.sh** — Bash script implementing the Ralph Wiggum loop pattern: spawns fresh agent instances, checks for completion signals, restarts until done
#### Process Guides
- **SPEC-CREATION-GUIDE.md** — Complete interview protocol for creating high-quality specifications through structured conversation between human and agent. Covers vision, requirements extraction, technical discovery, constraint mapping, and spec assembly
- **PLAN-MANAGEMENT.md** — Guide for managing IMPLEMENTATION_PLAN.md as a living document. Covers task decomposition patterns, intervention strategies, progress tracking, and plan anti-patterns
- **REVIEW-AND-QA.md** — Framework for evaluating agent output. Includes review timing, quality checklists, drift detection, course-correction strategies, and review templates
- **COST-OPTIMIZATION.md** — Comprehensive guide to model billing (request-based vs token-based), optimal strategies per provider, model selection, context management, and the hybrid approach
- **OPENCLAW-INTEGRATION.md** — Running the harness in OpenClaw with sessions_spawn, cron jobs, and shell scripts. Covers model selection, monitoring, and OpenClaw-specific agent instructions
- **TROUBLESHOOTING.md** — Failure taxonomy covering five common failure modes (stuck loop, drift, overengineering, test theater, context overflow) with root causes and recovery steps
- **TUTORIAL.md** — Complete 30-minute walkthrough building a Markdown link checker CLI tool from zero using the harness. Concrete, copy-pasteable example demonstrating the entire workflow
#### Examples & Documentation
- **EXAMPLES.md** — Worked example of a Fintrove-style personal finance app with complete PROJECT-SPEC.md. Compares three approaches (Ezward, Ralph Wiggum, Nate Jones) and provides best practices
- **README.md** — Project overview with file index, quick start guide, and core insights
- **PARALLEL-AGENTS.md** — Guide for running multiple agents simultaneously on independent tasks, covering parallelization strategies, work splitting, result merging, and conflict resolution

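
One conservative reading of "independent tasks" is tasks whose declared file sets do not overlap. The sketch below greedily groups tasks into batches that could be handed to separate agents at once; the `ParallelTask` shape is hypothetical, and disjoint files are a necessary rather than sufficient condition, since two tasks can still conflict through shared interfaces.

```typescript
// Hypothetical task shape: each task declares the files it expects to touch.
interface ParallelTask { id: string; files: string[]; }

// Greedily group tasks into batches whose declared file sets never overlap,
// so each batch can be split across agents running at the same time.
function parallelBatches(tasks: ParallelTask[]): ParallelTask[][] {
  const batches: { tasks: ParallelTask[]; files: { [path: string]: boolean } }[] = [];
  for (const task of tasks) {
    let placed = false;
    for (const batch of batches) {
      if (task.files.every((f) => !batch.files[f])) {
        batch.tasks.push(task);
        for (const f of task.files) batch.files[f] = true;
        placed = true;
        break;
      }
    }
    if (!placed) {
      const files: { [path: string]: boolean } = {};
      for (const f of task.files) files[f] = true;
      batches.push({ tasks: [task], files });
    }
  }
  return batches.map((b) => b.tasks);
}
```
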
### Features

#### The Core Loop Pattern
- Stateless iteration model: each agent starts fresh with clean context
- Orient phase: agent reads spec, plan, and git history
- Single-task focus: agents complete ONE task per iteration
- Mandatory verification: build and test must pass before commit
- Promise-based signaling: `<promise>PLANNED|DONE|STUCK|ERROR</promise>`

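
The signaling convention keeps the outer loop simple: it only has to find the promise tag in an agent's output and decide whether to start another fresh iteration. A hedged TypeScript sketch of that control flow follows. `runFreshAgent` is a hypothetical stand-in for launching a new agent (ralph-loop.sh does this in Bash); reading the last tag, treating DONE, STUCK, and ERROR as terminal, and the `maxIterations` cap are all assumptions.

```typescript
type Signal = "PLANNED" | "DONE" | "STUCK" | "ERROR";

// Return the last promise tag in the output, or null if the agent emitted none.
function parseSignal(output: string): Signal | null {
  const re = /<promise>(PLANNED|DONE|STUCK|ERROR)<\/promise>/g;
  let last: Signal | null = null;
  let m: RegExpExecArray | null;
  while ((m = re.exec(output)) !== null) last = m[1] as Signal;
  return last;
}

// Assumption: these signals end the loop; PLANNED (or no tag) means "run again".
const TERMINAL: Signal[] = ["DONE", "STUCK", "ERROR"];

function runLoop(
  runFreshAgent: (iteration: number) => string, // hypothetical: one stateless agent run
  maxIterations = 50, // illustrative safety cap
): { iterations: number; signal: Signal | null } {
  for (let i = 1; i <= maxIterations; i++) {
    const signal = parseSignal(runFreshAgent(i)); // each call starts with clean context
    if (signal !== null && TERMINAL.indexOf(signal) !== -1) {
      return { iterations: i, signal };
    }
  }
  return { iterations: maxIterations, signal: null }; // cap reached without a terminal signal
}
```
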
#### Interview Protocol
- Five-phase structured interview for spec creation
- Domain knowledge extraction techniques
- Technical discovery patterns
- Constraint mapping (MUST/MUST NOT/PREFER)
- Spec quality checklist

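
Constraint mapping is easiest to check later if every constraint carries exactly one keyword. A sketch that sorts interview notes into buckets, assuming a hypothetical `MUST:` / `MUST NOT:` / `PREFER:` line-prefix convention; lines without a keyword land in `untagged` as prompts for follow-up questions.

```typescript
interface ConstraintMap {
  must: string[];
  mustNot: string[];
  prefer: string[];
  untagged: string[]; // lines with no recognized keyword, kept for follow-up
}

// Assumed line convention: "MUST: ...", "MUST NOT: ...", "PREFER: ...".
const PREFIXES: [string, "must" | "mustNot" | "prefer"][] = [
  ["MUST NOT:", "mustNot"],
  ["MUST:", "must"],
  ["PREFER:", "prefer"],
];

function mapConstraints(lines: string[]): ConstraintMap {
  const map: ConstraintMap = { must: [], mustNot: [], prefer: [], untagged: [] };
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "") continue;
    let matched = false;
    for (const [prefix, bucket] of PREFIXES) {
      if (line.indexOf(prefix) === 0) {
        map[bucket].push(line.slice(prefix.length).trim());
        matched = true;
        break;
      }
    }
    if (!matched) map.untagged.push(line);
  }
  return map;
}
```
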
#### Plan Management Patterns
- Scaffold-first pattern
- Vertical slice pattern
- Test-first pattern
- Dependency chain pattern
- Human intervention mechanisms (notes, task splitting, reprioritization)

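
Under the dependency chain pattern, picking ONE task reduces to choosing the first unfinished task whose prerequisites are all complete. A sketch over a hypothetical in-memory task list; parsing IMPLEMENTATION_PLAN.md into this shape is left out:

```typescript
// Hypothetical plan entry: dependsOn lists the ids that must be done first.
interface PlanTask { id: string; done: boolean; dependsOn: string[]; }

// First unfinished task (in plan order) whose dependencies are all done, or null
// when nothing is unblocked (plan finished, or blocked by missing/cyclic deps).
function nextTask(tasks: PlanTask[]): PlanTask | null {
  const doneIds: { [id: string]: boolean } = {};
  for (const t of tasks) if (t.done) doneIds[t.id] = true;
  for (const t of tasks) {
    if (!t.done && t.dependsOn.every((d) => doneIds[d] === true)) return t;
  }
  return null;
}
```

A `null` result with unfinished tasks remaining is a signal for human intervention rather than something the agent should resolve on its own.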
#### Cost Optimization Strategies
- Request-based optimization (batch tasks, compound requests)
- Token-based optimization (fresh sub-agents, minimal context)
- Model selection by task complexity
- Hybrid strategy using multiple subscriptions
- Usage monitoring and budget allocation

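
Model selection by task complexity can be framed as choosing the cheapest model rated for the task that still fits the remaining budget. The sketch assumes a caller-supplied tier table; tier names, the complexity scale, and costs are placeholders, not real provider pricing.

```typescript
// Hypothetical tier description supplied by the caller.
interface ModelTier {
  name: string;             // placeholder label
  maxComplexity: number;    // highest task complexity this tier is trusted with
  costPerIteration: number; // in whatever unit the budget uses
}

// Cheapest tier rated for the task that still fits the remaining budget, or null.
function pickModel(tiers: ModelTier[], complexity: number, remainingBudget: number): ModelTier | null {
  const eligible = tiers
    .filter((t) => t.maxComplexity >= complexity && t.costPerIteration <= remainingBudget)
    .sort((a, b) => a.costPerIteration - b.costPerIteration); // filter() copies, so input order is untouched
  return eligible.length > 0 ? eligible[0] : null;
}
```

Returning `null` instead of silently falling back to a pricier model leaves the budget decision with the human.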
#### OpenClaw Integration
- Manual orchestration via sessions_spawn
- Cron-based automation for overnight work
- Shell script orchestration
- Model selection per iteration
- Sub-agent monitoring and session history
#### Troubleshooting Framework
- Stuck loop detection and resolution
- Architecture drift prevention with ADRs
- Overengineering constraints
- Test quality validation
- Context overflow mitigation

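
A stuck loop usually shows up in the iteration history before it shows up in the code: the same task is picked repeatedly and nothing is committed. A heuristic sketch, assuming a hypothetical log entry per iteration and an arbitrary default threshold of three:

```typescript
// Hypothetical per-iteration log entry.
interface IterationLog { taskId: string; committed: boolean; }

// True when the last `threshold` iterations all picked the same task and none committed.
function isStuck(history: IterationLog[], threshold = 3): boolean {
  if (history.length < threshold) return false;
  const recent = history.slice(-threshold);
  const task = recent[0].taskId;
  return recent.every((it) => it.taskId === task && !it.committed);
}
```
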
### Documentation Quality Standards
- Comprehensive examples with real code
- Anti-pattern documentation
- Copy-pasteable templates
- Concrete acceptance criteria
- Decision record patterns

### Supported Agents
- Claude CLI (via ralph-loop.sh)
- OpenAI Codex CLI (via ralph-loop.sh)
- OpenClaw sessions_spawn (any model)
- Extensible to other agent frameworks

### Supported Workflows
- CLI loop (ralph-loop.sh)
- OpenClaw manual orchestration
- OpenClaw cron automation
- Hybrid approaches

---
## [Unreleased]

### Planned
- Additional language-specific examples (Python, Go, Rust)
- Integration templates for common CI/CD systems
- Cost calculator tool (estimate iterations × model cost)
- Spec validator (check completeness before starting)
- Template variations for different project types (API, CLI, library, web app)

---
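## Version History Summary

- **2.0.0** (2026-04-01): Wave-based management release adding execution board, validation, and process-eval templates plus the WAVE-BASED-MANAGEMENT.md guide
- **1.0.0** (2024-03-18) — Initial release with complete harness system: core templates, process guides, examples, and multi-platform support

---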
## Contributing

This harness is a living system. If you:
- Discover new failure modes
- Develop better patterns
- Find gaps in the guides
- Create examples for other project types

Please document them and contribute back. The harness improves as we learn what works.

---
## License

This project is released into the public domain. Use it, modify it, share it. No attribution required.

---

_The harness is 1.0 because it works. It's not 2.0 yet because we're still learning how to use it better._