# Project Specification: Agent Harness System

## 1. Project Overview

### What are we building?

The Agent Harness System is a collection of templates, scripts, and best practices for running autonomous AI-powered coding agents on complex software projects. It provides a structured framework to decompose large projects into manageable tasks, execute them iteratively with fresh agent contexts, and maintain high-quality code through mandatory testing and verification.

### Why does it matter?

Traditional AI coding assistants struggle with large, multi-step projects due to context window limitations and the need for iterative refinement. The Agent Harness addresses this by providing a "Ralph Wiggum Loop" mechanism that spawns fresh agents for each task iteration, preventing context drift while maintaining project coherence through structured documentation and git-based memory.

### Success criteria

- [ ] Agents can autonomously decompose complex project specs into testable tasks
- [ ] Fresh agent iterations prevent context overflow and stale reasoning
- [ ] Mandatory build/test cycles ensure code quality
- [ ] Git history serves as reliable inter-iteration memory
- [ ] System works with multiple AI agents (Claude, Codex, etc.)
- [ ] Clear signals for completion, stuck states, and errors
- [ ] Comprehensive documentation enables easy adoption

---

## 2. Technical Foundation

### Tech stack

- **Language:** Bash (for the loop script), Markdown (for templates)
- **Tools:** Git, shell commands, AI agent CLIs (claude, codex)
- **Build system:** N/A (templates for various project types)
- **Test framework:** Project-specific (agents run their own tests)
- **Package manager:** N/A

### Project structure

```
docs/agent-harness/
├── README.md               # Quick overview and file purposes
├── AGENT-INSTRUCTIONS.md   # Template for agent system prompts
├── PROJECT-SPEC.md         # Template for project specifications
├── ralph-loop.sh           # The loop execution script
└── EXAMPLES.md             # Worked examples and best practices
```

### Build & test commands

The harness itself doesn't have build/test commands, but agents using it must define them in their PROJECT-SPEC.md.

### Coding standards

- Markdown files use consistent formatting with headers, lists, code blocks
- Bash scripts use `set -euo pipefail` for error handling
- Templates include clear placeholders and examples
- Documentation focuses on actionable, specific guidance

---

## 3. Requirements

### Functional Requirements

#### FR-001: Project Specification Template

**Description:** A comprehensive template that captures all necessary project details for autonomous agent work.

**Acceptance criteria:**
- [ ] Covers project overview, technical foundation, requirements, data models
- [ ] Includes phasing for large projects
- [ ] Provides reference materials and anti-patterns
- [ ] Enables agents to work without human intervention

#### FR-002: Agent Instructions Template

**Description:** System prompt template that defines agent behavior, the core loop, and rules.
**Acceptance criteria:**
- [ ] Defines senior engineer role with full codebase access
- [ ] Specifies exact sequence: orient → plan → pick task → implement → verify → commit → exit
- [ ] Includes output signals for loop control (the PLANNED/DONE/STUCK/ERROR tags)
- [ ] Enforces one-task-per-iteration rule

#### FR-003: Ralph Wiggum Loop Script

**Description:** Bash script that orchestrates agent iterations with fresh contexts.

**Acceptance criteria:**
- [ ] Spawns a fresh agent process each iteration
- [ ] Supports planning mode and build mode
- [ ] Monitors output signals for completion/stuck/error states
- [ ] Logs all iterations for debugging
- [ ] Configurable max iterations and agent type

#### FR-004: Implementation Plan Management

**Description:** Dynamic task decomposition and tracking system.

**Acceptance criteria:**
- [ ] Agents create IMPLEMENTATION_PLAN.md from the project spec
- [ ] Tasks ordered by dependency, with checkboxes
- [ ] Plan updated after each completed task
- [ ] Git commits preserve plan history

#### FR-005: Quality Assurance Integration

**Description:** Mandatory build and test verification in each iteration.

**Acceptance criteria:**
- [ ] Agents run project-specific build commands
- [ ] All tests must pass before committing
- [ ] Build failures prevent progression
- [ ] Linting enforced if configured

### Non-Functional Requirements

#### NFR-001: Simplicity
- [ ] No complex dependencies or frameworks
- [ ] Works with standard shell and git
- [ ] Templates are easy to copy into any project
- [ ] Minimal setup required

#### NFR-002: Reliability
- [ ] Fresh contexts prevent reasoning drift
- [ ] Git history provides an audit trail
- [ ] Clear error signals for human intervention
- [ ] Handles agent failures gracefully

#### NFR-003: Flexibility
- [ ] Supports multiple AI agents (Claude, Codex, etc.)
- [ ] Works with various project types and tech stacks
- [ ] Configurable iteration limits and modes
- [ ] Extensible for custom workflows

---

## 4. Data Model

The Agent Harness is documentation-focused, not data-focused: the "data" is the project files themselves.

### Entities

Entity: Project Spec
- Overview: what/why/success criteria
- Technical foundation: stack, structure, commands
- Requirements: functional/non-functional
- Data model: project-specific entities
- Architecture: constraints, decisions
- Phasing: optional breakdown
- References: docs, examples, anti-patterns

Entity: Implementation Plan
- Tasks: discrete, testable, dependency-ordered
- Status: checkbox per task
- Notes: agent comments on stuck tasks
- History: git commits track plan evolution

Entity: Agent Iteration
- Context: fresh read of spec + plan + git log
- Task: one unchecked item from the plan
- Changes: code modifications + tests
- Verification: build + test results
- Commit: descriptive message + plan update

### Relationships

- Project Spec → Implementation Plan (agent creates the plan from the spec)
- Implementation Plan → Agent Iterations (one task per iteration)
- Agent Iterations → Git Commits (each iteration commits its changes)

---

## 5. API / Interface Design

The harness provides command-line interfaces:

### ralph-loop.sh Commands

```bash
./ralph-loop.sh                 # Build mode (default)
./ralph-loop.sh plan            # Planning mode
./ralph-loop.sh --max 20        # Limit iterations
./ralph-loop.sh --agent claude  # Specify agent
```

### Template Files

- PROJECT-SPEC.md: Fill in with project details
- AGENT.md: Copy from AGENT-INSTRUCTIONS.md
- IMPLEMENTATION_PLAN.md: Generated by the agent

### Output Signals

Agents output special tags that the loop monitors:

- `PLANNED`: Plan created
- `DONE`: All tasks complete
- `STUCK`: Needs human help
- `ERROR`: Unrecoverable error

---

## 6. Architecture Decisions

### Constraints

- MUST: Use fresh agent contexts each iteration
- MUST: One task per agent iteration
- MUST: Mandatory build/test verification
- MUST NOT: Allow context compaction or memory accumulation
- PREFER: Git as the coordination mechanism
- PREFER: Simple bash orchestration over complex frameworks

### Dependencies

- Git (version control)
- AI agent CLI (claude, codex, etc.)
- Shell environment (bash)
- Project-specific build tools (npm, etc.)

### Known Challenges

- Context window limitations of AI agents
- Maintaining coherence across iterations
- Handling agent failures or stuck states
- Balancing specificity vs. flexibility in templates

---

## 7. Phasing (Optional)

The harness itself is complete in one phase, but projects using it should phase their work.

### Phase 1: Foundation
- [ ] Copy templates into the project
- [ ] Fill out PROJECT-SPEC.md
- [ ] Run planning mode to create IMPLEMENTATION_PLAN.md

### Phase 2: Execution
- [ ] Run build iterations until completion
- [ ] Monitor for stuck/error signals
- [ ] Intervene as needed

### Phase 3: Refinement
- [ ] Review the final codebase
- [ ] Update templates based on lessons learned
- [ ] Document improvements for future use

---

## 8. Reference Materials

### External docs

- Geoffrey Huntley's Ralph Wiggum approach
- Nate Jones's task decomposition method
- Ezward's sequential PRD style
- OpenClaw sessions_spawn documentation

### Existing code to learn from

- ralph-loop.sh: Clean bash scripting with error handling
- Templates: Structured markdown with clear sections
- Examples: Real-world project specifications

### Anti-patterns

- Don't try to pass context between iterations
- Don't let agents work on multiple tasks simultaneously
- Don't skip build/test verification
- Don't use complex orchestration when a bash loop suffices
- Don't make templates too rigid — they should be adapted per project

---

## All Template Files and Their Roles

### AGENT-INSTRUCTIONS.md

**Role:** System prompt template for the AI agent. Defines the senior engineer role, core workflow loop, strict rules, and output signals. Agents read this each iteration to understand their behavior.

**Key Sections:**
- Role definition and capabilities
- Core loop: orient → plan/pick → implement → verify → commit → exit
- Rules: one task per iteration, mandatory testing, no over-engineering
- Output signals: tags for loop control
- Context management: fresh starts with git as memory

### PROJECT-SPEC.md

**Role:** Comprehensive project definition template. The single source of truth that agents read every iteration. Captures all requirements, constraints, and context needed for autonomous work.

**Key Sections:**
- Project overview (what, why, success criteria)
- Technical foundation (stack, structure, commands)
- Detailed requirements (functional + non-functional)
- Data models and API design
- Architecture decisions and constraints
- Phasing and reference materials

### ralph-loop.sh

**Role:** Bash script implementing the Ralph Wiggum Loop mechanism. Orchestrates agent iterations, monitors completion signals, handles errors, and maintains logs.
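As a rough illustration of that role, here is a minimal sketch of such a loop script. This is not the real ralph-loop.sh: the `AGENT_CMD` variable and the assumption that the agent CLI reads AGENT.md on stdin and prints one of the PLANNED/DONE/STUCK/ERROR signals are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Illustrative sketch only -- the real ralph-loop.sh may differ.
# Assumes AGENT_CMD names any agent CLI (claude, codex, ...) that reads
# AGENT.md on stdin and prints one of the PLANNED/DONE/STUCK/ERROR signals.
set -euo pipefail

# Classify one iteration's output by the signal it contains.
classify() {
  case "$1" in
    *DONE*)  echo done ;;
    *STUCK*) echo stuck ;;
    *ERROR*) echo error ;;
    *)       echo continue ;;
  esac
}

# Spawn a fresh agent process per iteration until a terminal signal
# arrives or the iteration cap is reached; log every run for debugging.
run_loop() {
  local max="${1:-50}" log_dir=".ralph-logs" i out
  mkdir -p "$log_dir"
  for i in $(seq 1 "$max"); do
    # Fresh context: a brand-new process each time, no carried state.
    out="$("$AGENT_CMD" < AGENT.md | tee "$log_dir/iter-$i.log")" || true
    case "$(classify "$out")" in
      done)  echo "all tasks complete"; return 0 ;;
      stuck) echo "agent stuck: human help needed"; return 1 ;;
      error) echo "unrecoverable error: see $log_dir/iter-$i.log"; return 1 ;;
    esac
  done
  echo "hit max iterations without DONE"
  return 1
}
```

The real script additionally supports a `plan` mode and the `--agent`/`--max` flags described in section 5.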
**Key Features:**
- Fresh agent spawning each iteration
- Planning mode vs. build mode
- Signal monitoring (PLANNED/DONE/STUCK/ERROR tags)
- Configurable agents and iteration limits
- Comprehensive logging

### EXAMPLES.md

**Role:** Worked examples, comparisons of approaches, and best practices. Shows how to write good specs, compares different methodologies, and provides integration examples.

**Key Content:**
- Comparison of the Ezward/Ralph/Nate approaches
- Complete FinPlan project spec example
- Best practices for spec writing
- OpenClaw integration examples

---

## The Ralph Wiggum Loop Mechanism

The Ralph Wiggum Loop is named after the Simpsons character known for forgetting everything immediately, forcing fresh starts. This is the core innovation:

### How It Works

1. **Fresh Context Each Time:** Every iteration spawns a completely new agent process with no accumulated context from previous runs.
2. **Read-Only Memory:** Agents rely on:
   - PROJECT-SPEC.md (static requirements)
   - IMPLEMENTATION_PLAN.md (current task status)
   - Git log (recent changes)
   - Codebase state
   - Test results
3. **One Task Per Iteration:** Agents pick exactly one unchecked task, implement it completely, verify with build/tests, commit, and exit.
4. **Signal-Based Control:** Agents output tags that the bash loop monitors to determine the next action.
5. **Git as Coordination:** Each iteration's changes are committed, creating an audit trail and allowing the next agent to see what was done.
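To make the "read-only memory" above concrete, here is what an IMPLEMENTATION_PLAN.md might look like midway through a run. The task names are hypothetical, loosely borrowed from the FinPlan example:

```
# Implementation Plan

## Phase 1: Foundation
- [x] Set up TypeScript/Express project scaffolding
- [x] Define SQLite schema for transactions
- [ ] Parse QFX files and extract: date, amount, payee, memo, type
- [ ] Wire the parser into the import endpoint

## Notes
- None yet; agents add comments here when a task gets stuck.
```

The next fresh agent picks the first unchecked task, implements it, and checks it off in the same commit.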
### Benefits

- Prevents context window overflow
- Eliminates stale reasoning problems
- Enables indefinite project scaling
- Provides clear intervention points
- Maintains code quality through iteration

### Flow Diagram

```
Start Loop
├── Read PROJECT-SPEC.md
├── Run Agent with Fresh Context
├── Agent: Orient (read plan, git log)
├── Agent: Pick ONE Task
├── Agent: Implement + Verify
├── Agent: Commit + Mark Done
├── Check Output Signals
├── If DONE: Exit Success
├── If STUCK/ERROR: Exit with Warning
└── Else: Loop Again
```

---

## How to Use for Autonomous Coding Workflows

### Quick Start

1. Copy the templates into your project root
2. Fill out PROJECT-SPEC.md with complete project details
3. Run `./ralph-loop.sh plan` to generate IMPLEMENTATION_PLAN.md
4. Run `./ralph-loop.sh` to start autonomous building
5. Monitor progress; intervene if the agent gets stuck

### Detailed Workflow

1. **Preparation:**
   - Choose a project directory
   - Copy all 4 template files
   - Customize PROJECT-SPEC.md with your requirements
   - Ensure build/test commands work
2. **Planning Phase:**
   - Run `./ralph-loop.sh plan`
   - The agent reads the spec and creates a task decomposition
   - Review IMPLEMENTATION_PLAN.md for completeness
3. **Build Iterations:**
   - Run `./ralph-loop.sh --max 50` (or your preferred limit)
   - Each iteration: fresh agent → one task → verify → commit
   - The loop continues until DONE or max iterations
4. **Monitoring:**
   - Check `.ralph-logs/` for iteration details
   - Look for STUCK/ERROR signals requiring intervention
   - Review the git log for progress
5. **Intervention:**
   - If stuck: update IMPLEMENTATION_PLAN.md with notes
   - If error: fix the issue and restart the loop
   - If the plan needs changes: edit it and restart

### Configuration Options

- `--max N`: Limit iterations (default 50)
- `--agent claude|codex`: Choose the AI agent
- `plan` mode: Just create the implementation plan

---

## Examples and Use Cases

### Personal Finance App (FinPlan)

The complete example in EXAMPLES.md shows:

- Privacy-first local finance dashboard
- Transaction import, categorization, projections
- Monte Carlo retirement simulations
- Tech stack: TypeScript, Express, SQLite, vanilla JS
- 15+ features decomposed into phases

### Key Patterns from Examples

- **Be Specific:** Acceptance criteria like "Parse QFX files and extract: date, amount, payee, memo, type"
- **Define the Tech Stack:** Don't let agents choose — specify "TypeScript, Express.js, SQLite"
- **Include Data Models:** Explicit entity definitions with constraints
- **Phase Large Projects:** Independently deployable phases
- **Anti-Patterns:** "Don't use localStorage — SQLite is the source of truth"

### Use Cases

- **Complex Web Apps:** Multi-feature applications with databases
- **Libraries/Frameworks:** API design and implementation
- **Data Processing:** ETL pipelines, analysis tools
- **CLI Tools:** Command-line utilities with multiple commands
- **Prototypes to Production:** Start with a working prototype, iterate to a full product

---

## Integration with OpenClaw sessions_spawn

OpenClaw provides `sessions_spawn` for agent orchestration, offering an alternative to the bash loop.

### Basic Usage

```bash
# Planning phase
sessions_spawn --task "Read PROJECT-SPEC.md. Decompose into tasks. Write IMPLEMENTATION_PLAN.md." --model opus

# Build iterations
sessions_spawn --task "Read AGENT.md. Follow core loop: pick one task, implement, test, commit." --model sonnet
```

### Advanced Integration

- **Parallel Tasks:** Spawn multiple agents for independent tasks
- **Different Models:** Use opus for planning, sonnet for coding
- **Cron Scheduling:** Automate iterations with cron jobs
- **Channel Output:** Direct results to specific channels

### Benefits Over the Bash Loop

- Model selection per task type
- Parallel execution for independent work
- Integration with OpenClaw's session management
- Richer output formatting and notifications

### When to Use Each

- **Ralph Loop:** Simple sequential projects, bash environments
- **OpenClaw:** Complex projects, parallel work, advanced features

---

## Best Practices for Agent-Driven Development

### Writing Project Specs

1. **Be Exhaustively Specific:** Include exact acceptance criteria, not vague requirements
2. **Define Everything:** Tech stack, directory structure, build commands, coding standards
3. **Provide Examples:** Sample data, API responses, UI mockups
4. **Phase Appropriately:** Break large projects into independent phases
5. **Document Constraints:** What the system MUST and MUST NOT do, plus preferences
6. **Include Anti-Patterns:** Lessons from previous attempts

### Agent Instructions

1. **Role Definition:** Clear capabilities and limitations
2. **Strict Rules:** One task per iteration, mandatory testing, no refactoring unrelated code
3. **Clear Signals:** Use the PLANNED/DONE/STUCK/ERROR tags for loop control
4. **Context Boundaries:** Fresh start each time; rely on files and git

### Loop Management

1. **Monitor Logs:** Check `.ralph-logs/` for issues
2. **Set Reasonable Limits:** `--max` 20-50 iterations depending on project size
3. **Plan Reviews:** Always review IMPLEMENTATION_PLAN.md after the planning phase
4. **Intervention Ready:** Be prepared to help when agents get stuck

### Quality Assurance

1. **Test Everything:** Unit, integration, end-to-end tests
2. **Build Verification:** Every iteration must pass the build
3. **Code Standards:** Lint, format, and document consistently
4. **Manual Reviews:** Spot-check critical functionality

### Scaling Up

1. **Phase Work:** Complete foundations before features
2. **Parallel Execution:** Use OpenClaw for independent tasks
3. **Iterative Refinement:** Start with a working prototype, enhance gradually
4. **Documentation Updates:** Improve templates based on lessons learned

### Common Pitfalls

- **Vague Specs:** Lead to agent confusion and poor decomposition
- **Missing Build/Test:** Code quality suffers without verification
- **Context Sharing:** Don't try to pass state between iterations
- **Over-Parallelization:** Dependencies must be respected
- **Ignoring Signals:** STUCK/ERROR states need attention

This system transforms AI coding assistants from helpful sidekicks into autonomous development partners capable of delivering complete, tested software projects.