Project Specification: Agent Harness System
1. Project Overview
What are we building?
The Agent Harness System is a collection of templates, scripts, and best practices for running autonomous AI-powered coding agents on complex software projects. It provides a structured framework to decompose large projects into manageable tasks, execute them iteratively with fresh agent contexts, and maintain high-quality code through mandatory testing and verification.
Why does it matter?
Traditional AI coding assistants struggle with large, multi-step projects due to context window limitations and the need for iterative refinement. The Agent Harness addresses this by providing a "Ralph Wiggum Loop" mechanism that spawns fresh agents for each task iteration, preventing context drift while maintaining project coherence through structured documentation and git-based memory.
Success criteria
- Agents can autonomously decompose complex project specs into testable tasks
- Fresh agent iterations prevent context overflow and stale reasoning
- Mandatory build/test cycles ensure code quality
- Git history serves as reliable inter-iteration memory
- System works with multiple AI agents (Claude, Codex, etc.)
- Clear signals for completion, stuck states, and errors
- Comprehensive documentation enables easy adoption
2. Technical Foundation
Tech stack
- Language: Bash (for the loop script), Markdown (for templates)
- Tools: Git, shell commands, AI agent CLIs (claude, codex)
- Build system: N/A (templates for various project types)
- Test framework: Project-specific (agents run their own tests)
- Package manager: N/A
Project structure
docs/agent-harness/
├── README.md # Quick overview and file purposes
├── AGENT-INSTRUCTIONS.md # Template for agent system prompts
├── PROJECT-SPEC.md # Template for project specifications
├── ralph-loop.sh # The loop execution script
└── EXAMPLES.md # Worked examples and best practices
Build & test commands
The harness itself doesn't have build/test commands, but agents using it must define them in their PROJECT-SPEC.md.
Coding standards
- Markdown files use consistent formatting with headers, lists, code blocks
- Bash scripts use set -euo pipefail for error handling
- Templates include clear placeholders and examples
- Documentation focuses on actionable, specific guidance
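As a quick demonstration of what the `set -euo pipefail` standard buys (a minimal sketch, not part of the harness itself):

```shell
#!/usr/bin/env bash
# Demonstrates the three guarantees of `set -euo pipefail`:
#   -e           abort on any failing command
#   -u           treat unset variables as errors
#   -o pipefail  a pipeline fails if ANY stage fails, not just the last
set -euo pipefail

if false | true; then
  echo "pipeline passed"            # would print WITHOUT pipefail
else
  echo "pipeline failed as expected"
fi
```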
3. Requirements
Functional Requirements
FR-001: Project Specification Template
Description: A comprehensive template that captures all necessary project details for autonomous agent work.
Acceptance criteria:
- Covers project overview, technical foundation, requirements, data models
- Includes phasing for large projects
- Provides reference materials and anti-patterns
- Enables agents to work without human intervention
FR-002: Agent Instructions Template
Description: System prompt template that defines agent behavior, the core loop, and rules.
Acceptance criteria:
- Defines senior engineer role with full codebase access
- Specifies exact sequence: orient → plan → pick task → implement → verify → commit → exit
- Includes output signals for loop control (<promise> tags)
- Enforces one-task-per-iteration rule
FR-003: Ralph Wiggum Loop Script
Description: Bash script that orchestrates agent iterations with fresh contexts.
Acceptance criteria:
- Spawns fresh agent processes each iteration
- Supports planning mode and build mode
- Monitors output signals for completion/stuck/error states
- Logs all iterations for debugging
- Configurable max iterations and agent type
FR-004: Implementation Plan Management
Description: Dynamic task decomposition and tracking system.
Acceptance criteria:
- Agents create IMPLEMENTATION_PLAN.md from project spec
- Tasks ordered by dependency with checkboxes
- Plan updated after each completed task
- Git commits preserve plan history
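A minimal sketch of how the plan's checkbox tasks might be read and updated, assuming the common `- [ ]` / `- [x]` markdown convention (the helper names are illustrative, and `mark_done` relies on GNU sed):

```shell
#!/usr/bin/env bash
# Sketch: work with checkbox tasks in IMPLEMENTATION_PLAN.md.
# Assumes the convention "- [ ] task" (open) / "- [x] task" (done).
set -euo pipefail

next_task() {
  # -m1: stop at the first unchecked task; sed strips the checkbox prefix
  grep -m1 '^- \[ \]' "$1" | sed 's/^- \[ \] //'
}

mark_done() {
  # Check off the first exactly-matching open task, in place (GNU sed)
  local plan="$1" task="$2"
  sed -i "0,/^- \[ \] ${task}\$/s//- [x] ${task}/" "$plan"
}
```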
FR-005: Quality Assurance Integration
Description: Mandatory build and test verification in each iteration.
Acceptance criteria:
- Agents run project-specific build commands
- All tests must pass before committing
- Build failures prevent progression
- Linting enforced if configured
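The gate described above can be sketched as a small guard function; `BUILD_CMD` and `TEST_CMD` are placeholders for whatever project-specific commands PROJECT-SPEC.md declares:

```shell
#!/usr/bin/env bash
# Sketch: refuse to reach the commit step unless build and tests both pass.
# BUILD_CMD and TEST_CMD are placeholders for the project-specific commands
# declared in PROJECT-SPEC.md (e.g. "npm run build", "npm test").
set -euo pipefail

verify() {
  if ! eval "$BUILD_CMD"; then
    echo "build failed: refusing to commit" >&2
    return 1
  fi
  if ! eval "$TEST_CMD"; then
    echo "tests failed: refusing to commit" >&2
    return 1
  fi
}
```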
Non-Functional Requirements
NFR-001: Simplicity
- No complex dependencies or frameworks
- Works with standard shell and git
- Easy to copy templates into any project
- Minimal setup required
NFR-002: Reliability
- Fresh contexts prevent reasoning drift
- Git history provides audit trail
- Clear error signals for human intervention
- Handles agent failures gracefully
NFR-003: Flexibility
- Supports multiple AI agents (Claude, Codex, etc.)
- Works with various project types and tech stacks
- Configurable iteration limits and modes
- Extensible for custom workflows
4. Data Model
The Agent Harness is documentation-focused, not data-focused. The "data" is the project files themselves.
Entities
Entity: Project Spec
- Overview: what/why/success criteria
- Technical foundation: stack, structure, commands
- Requirements: functional/non-functional
- Data model: project-specific entities
- Architecture: constraints, decisions
- Phasing: optional breakdown
- References: docs, examples, anti-patterns
Entity: Implementation Plan
- Tasks: discrete, testable, dependency-ordered
- Status: checkbox per task
- Notes: agent comments on stuck tasks
- History: git commits track plan evolution
Entity: Agent Iteration
- Context: fresh read of spec + plan + git log
- Task: one unchecked item from plan
- Changes: code modifications + tests
- Verification: build + test results
- Commit: descriptive message + plan update
Relationships
- Project Spec → Implementation Plan (agent creates from spec)
- Implementation Plan → Agent Iterations (one task per iteration)
- Agent Iterations → Git Commits (each iteration commits changes)
5. API / Interface Design
The harness provides command-line interfaces:
ralph-loop.sh Commands
./ralph-loop.sh # Build mode (default)
./ralph-loop.sh plan # Planning mode
./ralph-loop.sh --max 20 # Limit iterations
./ralph-loop.sh --agent claude # Specify agent
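One way the script might parse these options (a sketch; the actual parsing in ralph-loop.sh may differ):

```shell
#!/usr/bin/env bash
# Sketch of CLI parsing for the options shown above.
set -euo pipefail

MODE=build      # default: build mode
MAX_ITER=50     # default iteration cap
AGENT=claude    # default agent CLI

parse_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      plan)    MODE=plan ;;
      --max)   MAX_ITER="$2"; shift ;;
      --agent) AGENT="$2"; shift ;;
      *)       echo "unknown option: $1" >&2; return 1 ;;
    esac
    shift
  done
}
```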
Template Files
- PROJECT-SPEC.md: Fill with project details
- AGENT.md: Copy from AGENT-INSTRUCTIONS.md
- IMPLEMENTATION_PLAN.md: Generated by agent
Output Signals
Agents output special tags that the loop monitors:
- <promise>PLANNED</promise>: Plan created
- <promise>DONE</promise>: All tasks complete
- <promise>STUCK</promise>: Needs human help
- <promise>ERROR</promise>: Unrecoverable error
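A sketch of how the loop might extract the most recent signal, assuming agent output is captured to a log file:

```shell
#!/usr/bin/env bash
# Sketch: pull the contents of the last <promise>…</promise> tag
# from a captured agent output file.
set -euo pipefail

extract_signal() {
  sed -n 's/.*<promise>\([A-Z]*\)<\/promise>.*/\1/p' "$1" | tail -n1
}
```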
6. Architecture Decisions
Constraints
- MUST: Use fresh agent contexts each iteration
- MUST: One task per agent iteration
- MUST: Mandatory build/test verification
- MUST NOT: Allow context compaction or memory accumulation
- PREFER: Git as the coordination mechanism
- PREFER: Simple bash orchestration over complex frameworks
Dependencies
- Git (version control)
- AI agent CLI (claude, codex, etc.)
- Shell environment (bash)
- Project-specific build tools (npm, etc.)
Known Challenges
- Context window limitations of AI agents
- Maintaining coherence across iterations
- Handling agent failures or stuck states
- Balancing specificity vs flexibility in templates
7. Phasing (Optional)
The harness itself is complete in one phase, but projects using it should phase their work.
Phase 1: Foundation
- Copy templates into project
- Fill PROJECT-SPEC.md
- Run planning mode to create IMPLEMENTATION_PLAN.md
Phase 2: Execution
- Run build iterations until completion
- Monitor for stuck/error signals
- Intervene as needed
Phase 3: Refinement
- Review final codebase
- Update templates based on lessons learned
- Document improvements for future use
8. Reference Materials
External docs
- Geoffrey Huntley's Ralph Wiggum approach
- Nate Jones's task decomposition method
- Ezward's sequential PRD style
- OpenClaw sessions_spawn documentation
Existing code to learn from
- ralph-loop.sh: Clean bash scripting with error handling
- Templates: Structured markdown with clear sections
- Examples: Real-world project specifications
Anti-patterns
- Don't try to pass context between iterations
- Don't let agents work on multiple tasks simultaneously
- Don't skip build/test verification
- Don't use complex orchestration when bash loop suffices
- Don't make templates too rigid — they should be adapted per project
All Template Files and Their Roles
AGENT-INSTRUCTIONS.md
Role: System prompt template for the AI agent. Defines the senior engineer role, core workflow loop, strict rules, and output signals. Agents read this each iteration to understand their behavior.
Key Sections:
- Role definition and capabilities
- Core loop: orient → plan/pick → implement → verify → commit → exit
- Rules: one task per iteration, mandatory testing, no over-engineering
- Output signals: <promise> tags for loop control
- Context management: fresh starts with git as memory
PROJECT-SPEC.md
Role: Comprehensive project definition template. The single source of truth that agents read every iteration. Captures all requirements, constraints, and context needed for autonomous work.
Key Sections:
- Project overview (what, why, success criteria)
- Technical foundation (stack, structure, commands)
- Detailed requirements (functional + non-functional)
- Data models and API design
- Architecture decisions and constraints
- Phasing and reference materials
ralph-loop.sh
Role: Bash script implementing the Ralph Wiggum Loop mechanism. Orchestrates agent iterations, monitors completion signals, handles errors, and maintains logs.
Key Features:
- Fresh agent spawning each iteration
- Planning mode vs build mode
- Signal monitoring (<promise> tags)
- Configurable agents and iteration limits
- Comprehensive logging
EXAMPLES.md
Role: Worked examples, comparisons of approaches, and best practices. Shows how to write good specs, compares different methodologies, and provides integration examples.
Key Content:
- Comparison of Ezward/Ralph/Nate approaches
- Complete FinPlan project spec example
- Best practices for spec writing
- OpenClaw integration examples
The Ralph Wiggum Loop Mechanism
The Ralph Wiggum Loop is named after the Simpsons character famous for forgetting everything immediately; in the same spirit, every iteration starts with no memory of the last. This is the core innovation:
How It Works
1. Fresh Context Each Time: Every iteration spawns a completely new agent process with no accumulated context from previous runs.
2. Read-Only Memory: Agents rely on:
   - PROJECT-SPEC.md (static requirements)
   - IMPLEMENTATION_PLAN.md (current task status)
   - Git log (recent changes)
   - Codebase state
   - Test results
3. One Task Per Iteration: Agents pick exactly one unchecked task, implement it completely, verify with build/tests, commit, and exit.
4. Signal-Based Control: Agents output <promise> tags that the bash loop monitors to determine the next action.
5. Git as Coordination: Each iteration's changes are committed, creating an audit trail and allowing the next agent to see what was done.
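Putting those steps together, the loop's skeleton might look like this sketch, where `agent_cmd` is a hypothetical stand-in for invoking the real agent CLI with a fresh context:

```shell
#!/usr/bin/env bash
# Sketch of the Ralph Wiggum Loop skeleton. `agent_cmd` is a hypothetical
# function that runs one fresh agent process and prints its output.
set -euo pipefail

run_loop() {
  local max_iter="$1" i signal
  for i in $(seq 1 "$max_iter"); do
    # Fresh context: nothing survives between iterations except files and git
    signal=$(agent_cmd | sed -n 's/.*<promise>\([A-Z]*\)<\/promise>.*/\1/p' | tail -n1)
    case "$signal" in
      DONE)        echo "complete after $i iteration(s)"; return 0 ;;
      STUCK|ERROR) echo "halted: $signal";                return 1 ;;
    esac
  done
  echo "reached max iterations"
  return 1
}
```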
Benefits
- Prevents context window overflow
- Eliminates stale reasoning problems
- Enables indefinite project scaling
- Provides clear intervention points
- Maintains code quality through iteration
Flow Diagram
Start Loop
├── Read PROJECT-SPEC.md
├── Run Agent with Fresh Context
├── Agent: Orient (read plan, git log)
├── Agent: Pick ONE Task
├── Agent: Implement + Verify
├── Agent: Commit + Mark Done
├── Check Output Signals
├── If DONE: Exit Success
├── If STUCK/ERROR: Exit with Warning
└── Else: Loop Again
How to Use for Autonomous Coding Workflows
Quick Start
- Copy templates into your project root
- Fill out PROJECT-SPEC.md with complete project details
- Run ./ralph-loop.sh plan to generate IMPLEMENTATION_PLAN.md
- Run ./ralph-loop.sh to start autonomous building
- Monitor progress; intervene if agent gets stuck
Detailed Workflow
1. Preparation:
   - Choose project directory
   - Copy all 4 template files
   - Customize PROJECT-SPEC.md with your requirements
   - Ensure build/test commands work
2. Planning Phase:
   - Run ./ralph-loop.sh plan
   - Agent reads spec and creates task decomposition
   - Review IMPLEMENTATION_PLAN.md for completeness
3. Build Iterations:
   - Run ./ralph-loop.sh --max 50 (or your preferred limit)
   - Each iteration: fresh agent → one task → verify → commit
   - Loop continues until DONE or max iterations
4. Monitoring:
   - Check .ralph-logs/ for iteration details
   - Look for STUCK/ERROR signals requiring intervention
   - Review git log for progress
5. Intervention:
   - If stuck: update IMPLEMENTATION_PLAN.md with notes
   - If error: fix the issue and restart loop
   - If plan needs changes: edit and restart
Configuration Options
- --max N: Limit iterations (default 50)
- --agent claude|codex: Choose AI agent
- plan mode: Just create implementation plan
Examples and Use Cases
Personal Finance App (FinPlan)
Complete example in EXAMPLES.md showing:
- Privacy-first local finance dashboard
- Transaction import, categorization, projections
- Monte Carlo retirement simulations
- Tech stack: TypeScript, Express, SQLite, vanilla JS
- 15+ features decomposed into phases
Key Patterns from Examples
- Be Specific: Acceptance criteria like "Parse QFX files and extract: date, amount, payee, memo, type"
- Define Tech Stack: Don't let agents choose — specify "TypeScript, Express.js, SQLite"
- Include Data Models: Explicit entity definitions with constraints
- Phase Large Projects: Independent deployable phases
- Anti-Patterns: "Don't use localStorage — SQLite is source of truth"
Use Cases
- Complex Web Apps: Multi-feature applications with databases
- Libraries/Frameworks: API design and implementation
- Data Processing: ETL pipelines, analysis tools
- CLI Tools: Command-line utilities with multiple commands
- Prototypes to Production: Start with working prototype, iterate to full product
Integration with OpenClaw sessions_spawn
OpenClaw provides sessions_spawn for agent orchestration, offering an alternative to the bash loop.
Basic Usage
# Planning phase
sessions_spawn --task "Read PROJECT-SPEC.md. Decompose into tasks. Write IMPLEMENTATION_PLAN.md." --model opus
# Build iterations
sessions_spawn --task "Read AGENT.md. Follow core loop: pick one task, implement, test, commit." --model sonnet
Advanced Integration
- Parallel Tasks: Spawn multiple agents for independent tasks
- Different Models: Use opus for planning, sonnet for coding
- Cron Scheduling: Automate iterations with cron jobs
- Channel Output: Direct results to specific channels
Benefits Over Bash Loop
- Model selection per task type
- Parallel execution for independent work
- Integration with OpenClaw's session management
- Richer output formatting and notifications
When to Use Each
- Ralph Loop: Simple sequential projects, bash environments
- OpenClaw: Complex projects, parallel work, advanced features
Best Practices for Agent-Driven Development
Writing Project Specs
- Be Exhaustively Specific: Include exact acceptance criteria, not vague requirements
- Define Everything: Tech stack, directory structure, build commands, coding standards
- Provide Examples: Sample data, API responses, UI mockups
- Phase Appropriately: Break large projects into independent phases
- Document Constraints: What MUST/MUST NOT do, plus preferences
- Include Anti-Patterns: Lessons from previous attempts
Agent Instructions
- Role Definition: Clear capabilities and limitations
- Strict Rules: One task per iteration, mandatory testing, no refactoring unrelated code
- Clear Signals: Use <promise> tags for loop control
- Context Boundaries: Fresh start each time, rely on files/git
Loop Management
- Monitor Logs: Check .ralph-logs/ for issues
- Set Reasonable Limits: --max 20-50 iterations depending on project size
- Plan Reviews: Always review IMPLEMENTATION_PLAN.md after planning phase
- Intervention Ready: Be prepared to help when agents get stuck
Quality Assurance
- Test Everything: Unit, integration, end-to-end tests
- Build Verification: Every iteration must pass build
- Code Standards: Lint, format, document consistently
- Manual Reviews: Spot-check critical functionality
Scaling Up
- Phase Work: Complete foundations before features
- Parallel Execution: Use OpenClaw for independent tasks
- Iterative Refinement: Start with working prototype, enhance gradually
- Documentation Updates: Improve templates based on lessons learned
Common Pitfalls
- Vague Specs: Leads to agent confusion and poor decomposition
- Missing Build/Test: Code quality suffers without verification
- Context Sharing: Don't try to pass state between iterations
- Over-Parallelization: Dependencies must be respected
- Ignoring Signals: STUCK/ERROR states need attention
This system transforms AI coding assistants from helpful sidekicks into autonomous development partners capable of delivering complete, tested software projects.