
Project Specification: Agent Harness System

1. Project Overview

What are we building?

The Agent Harness System is a collection of templates, scripts, and best practices for running autonomous AI-powered coding agents on complex software projects. It provides a structured framework to decompose large projects into manageable tasks, execute them iteratively with fresh agent contexts, and maintain high-quality code through mandatory testing and verification.

Why does it matter?

Traditional AI coding assistants struggle with large, multi-step projects due to context window limitations and the need for iterative refinement. The Agent Harness addresses this by providing a "Ralph Wiggum Loop" mechanism that spawns fresh agents for each task iteration, preventing context drift while maintaining project coherence through structured documentation and git-based memory.

Success criteria

  • Agents can autonomously decompose complex project specs into testable tasks
  • Fresh agent iterations prevent context overflow and stale reasoning
  • Mandatory build/test cycles ensure code quality
  • Git history serves as reliable inter-iteration memory
  • System works with multiple AI agents (Claude, Codex, etc.)
  • Clear signals for completion, stuck states, and errors
  • Comprehensive documentation enables easy adoption

2. Technical Foundation

Tech stack

  • Language: Bash (for the loop script), Markdown (for templates)
  • Tools: Git, shell commands, AI agent CLIs (claude, codex)
  • Build system: N/A (templates for various project types)
  • Test framework: Project-specific (agents run their own tests)
  • Package manager: N/A

Project structure

docs/agent-harness/
├── README.md              # Quick overview and file purposes
├── AGENT-INSTRUCTIONS.md  # Template for agent system prompts
├── PROJECT-SPEC.md        # Template for project specifications
├── ralph-loop.sh          # The loop execution script
└── EXAMPLES.md            # Worked examples and best practices

Build & test commands

The harness itself doesn't have build/test commands, but agents using it must define them in their PROJECT-SPEC.md.

Coding standards

  • Markdown files use consistent formatting with headers, lists, code blocks
  • Bash scripts use set -euo pipefail for error handling
  • Templates include clear placeholders and examples
  • Documentation focuses on actionable, specific guidance
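The set -euo pipefail convention above can be shown as a minimal script preamble; the ERR trap is an illustrative addition for debugging, not something the spec mandates:

```shell
#!/usr/bin/env bash
# Strict mode: -e exits on any command failure, -u errors on unset
# variables, and pipefail makes a pipeline fail if any stage fails.
set -euo pipefail

# Illustrative addition: report the failing line before exiting.
trap 'echo "Error on line $LINENO" >&2' ERR

echo "strict mode enabled"
```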

3. Requirements

Functional Requirements

FR-001: Project Specification Template

Description: A comprehensive template that captures all necessary project details for autonomous agent work.
Acceptance criteria:

  • Covers project overview, technical foundation, requirements, data models
  • Includes phasing for large projects
  • Provides reference materials and anti-patterns
  • Enables agents to work without human intervention

FR-002: Agent Instructions Template

Description: System prompt template that defines agent behavior, the core loop, and rules.
Acceptance criteria:

  • Defines senior engineer role with full codebase access
  • Specifies exact sequence: orient → plan → pick task → implement → verify → commit → exit
  • Includes output signals for loop control (<promise> tags)
  • Enforces one-task-per-iteration rule

FR-003: Ralph Wiggum Loop Script

Description: Bash script that orchestrates agent iterations with fresh contexts.
Acceptance criteria:

  • Spawns fresh agent processes each iteration
  • Supports planning mode and build mode
  • Monitors output signals for completion/stuck/error states
  • Logs all iterations for debugging
  • Configurable max iterations and agent type

FR-004: Implementation Plan Management

Description: Dynamic task decomposition and tracking system.
Acceptance criteria:

  • Agents create IMPLEMENTATION_PLAN.md from project spec
  • Tasks ordered by dependency with checkboxes
  • Plan updated after each completed task
  • Git commits preserve plan history
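An IMPLEMENTATION_PLAN.md satisfying these criteria might look like the following excerpt; the phase and task names are hypothetical, loosely borrowed from the FinPlan example later in this document:

```markdown
# IMPLEMENTATION_PLAN.md

## Phase 1: Foundation
- [x] Set up project scaffolding and build configuration
- [x] Define database schema and migrations
- [ ] Implement QFX transaction import

## Phase 2: Features
- [ ] Add categorization rules engine

## Notes
- QFX import: parser chosen in iteration 3; see git log for rationale.
```

Each completed iteration flips one checkbox and commits the updated plan, so git history doubles as the plan's changelog.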

FR-005: Quality Assurance Integration

Description: Mandatory build and test verification in each iteration.
Acceptance criteria:

  • Agents run project-specific build commands
  • All tests must pass before committing
  • Build failures prevent progression
  • Linting enforced if configured

Non-Functional Requirements

NFR-001: Simplicity

  • No complex dependencies or frameworks
  • Works with standard shell and git
  • Easy to copy templates into any project
  • Minimal setup required

NFR-002: Reliability

  • Fresh contexts prevent reasoning drift
  • Git history provides audit trail
  • Clear error signals for human intervention
  • Handles agent failures gracefully

NFR-003: Flexibility

  • Supports multiple AI agents (Claude, Codex, etc.)
  • Works with various project types and tech stacks
  • Configurable iteration limits and modes
  • Extensible for custom workflows

4. Data Model

The Agent Harness is documentation-focused, not data-focused. The "data" is the project files themselves.

Entities

Entity: Project Spec

  • Overview: what/why/success criteria
  • Technical foundation: stack, structure, commands
  • Requirements: functional/non-functional
  • Data model: project-specific entities
  • Architecture: constraints, decisions
  • Phasing: optional breakdown
  • References: docs, examples, anti-patterns

Entity: Implementation Plan

  • Tasks: discrete, testable, dependency-ordered
  • Status: checkbox per task
  • Notes: agent comments on stuck tasks
  • History: git commits track plan evolution

Entity: Agent Iteration

  • Context: fresh read of spec + plan + git log
  • Task: one unchecked item from plan
  • Changes: code modifications + tests
  • Verification: build + test results
  • Commit: descriptive message + plan update

Relationships

  • Project Spec → Implementation Plan (agent creates from spec)
  • Implementation Plan → Agent Iterations (one task per iteration)
  • Agent Iterations → Git Commits (each iteration commits changes)

5. API / Interface Design

The harness provides command-line interfaces:

ralph-loop.sh Commands

./ralph-loop.sh              # Build mode (default)
./ralph-loop.sh plan         # Planning mode
./ralph-loop.sh --max 20     # Limit iterations
./ralph-loop.sh --agent claude  # Specify agent

Template Files

  • PROJECT-SPEC.md: Fill with project details
  • AGENT.md: Copy from AGENT-INSTRUCTIONS.md
  • IMPLEMENTATION_PLAN.md: Generated by agent

Output Signals

Agents output special tags that the loop monitors:

  • <promise>PLANNED</promise>: Plan created
  • <promise>DONE</promise>: All tasks complete
  • <promise>STUCK</promise>: Needs human help
  • <promise>ERROR</promise>: Unrecoverable error
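One way a loop can watch for these tags is to grep the agent's captured output; the following is a sketch, assuming the output has been saved to a log file (get_signal is a hypothetical helper, not part of ralph-loop.sh):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Extract the last <promise> signal from an agent's captured output.
# Prints the signal name (PLANNED/DONE/STUCK/ERROR) or "NONE".
get_signal() {
  local log_file="$1"
  local signal
  signal=$(grep -o '<promise>[A-Z]*</promise>' "$log_file" | tail -1 \
    | sed 's/<promise>\(.*\)<\/promise>/\1/') || true
  echo "${signal:-NONE}"
}

# Demo with a fake log file.
log=$(mktemp)
echo 'Task complete. <promise>DONE</promise>' > "$log"
get_signal "$log"   # prints: DONE
rm -f "$log"
```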

6. Architecture Decisions

Constraints

  • MUST: Use fresh agent contexts each iteration
  • MUST: One task per agent iteration
  • MUST: Mandatory build/test verification
  • MUST NOT: Allow context compaction or memory accumulation
  • PREFER: Git as the coordination mechanism
  • PREFER: Simple bash orchestration over complex frameworks

Dependencies

  • Git (version control)
  • AI agent CLI (claude, codex, etc.)
  • Shell environment (bash)
  • Project-specific build tools (npm, etc.)

Known Challenges

  • Context window limitations of AI agents
  • Maintaining coherence across iterations
  • Handling agent failures or stuck states
  • Balancing specificity vs flexibility in templates

7. Phasing (Optional)

The harness itself is complete in one phase, but projects using it should phase their work.

Phase 1: Foundation

  • Copy templates into project
  • Fill PROJECT-SPEC.md
  • Run planning mode to create IMPLEMENTATION_PLAN.md

Phase 2: Execution

  • Run build iterations until completion
  • Monitor for stuck/error signals
  • Intervene as needed

Phase 3: Refinement

  • Review final codebase
  • Update templates based on lessons learned
  • Document improvements for future use

8. Reference Materials

External docs

  • Geoffrey Huntley's Ralph Wiggum approach
  • Nate Jones's task decomposition method
  • Ezward's sequential PRD style
  • OpenClaw sessions_spawn documentation

Existing code to learn from

  • ralph-loop.sh: Clean bash scripting with error handling
  • Templates: Structured markdown with clear sections
  • Examples: Real-world project specifications

Anti-patterns

  • Don't try to pass context between iterations
  • Don't let agents work on multiple tasks simultaneously
  • Don't skip build/test verification
  • Don't use complex orchestration when bash loop suffices
  • Don't make templates too rigid — they should be adapted per project

All Template Files and Their Roles

AGENT-INSTRUCTIONS.md

Role: System prompt template for the AI agent. Defines the senior engineer role, core workflow loop, strict rules, and output signals. Agents read this each iteration to understand their behavior.

Key Sections:

  • Role definition and capabilities
  • Core loop: orient → plan/pick → implement → verify → commit → exit
  • Rules: one task per iteration, mandatory testing, no over-engineering
  • Output signals: <promise> tags for loop control
  • Context management: fresh starts with git as memory

PROJECT-SPEC.md

Role: Comprehensive project definition template. The single source of truth that agents read every iteration. Captures all requirements, constraints, and context needed for autonomous work.

Key Sections:

  • Project overview (what, why, success criteria)
  • Technical foundation (stack, structure, commands)
  • Detailed requirements (functional + non-functional)
  • Data models and API design
  • Architecture decisions and constraints
  • Phasing and reference materials

ralph-loop.sh

Role: Bash script implementing the Ralph Wiggum Loop mechanism. Orchestrates agent iterations, monitors completion signals, handles errors, and maintains logs.

Key Features:

  • Fresh agent spawning each iteration
  • Planning mode vs build mode
  • Signal monitoring (<promise> tags)
  • Configurable agents and iteration limits
  • Comprehensive logging

EXAMPLES.md

Role: Worked examples, comparisons of approaches, and best practices. Shows how to write good specs, compares different methodologies, and provides integration examples.

Key Content:

  • Comparison of Ezward/Ralph/Nate approaches
  • Complete FinPlan project spec example
  • Best practices for spec writing
  • OpenClaw integration examples

The Ralph Wiggum Loop Mechanism

The Ralph Wiggum Loop is named after the Simpsons character who forgets everything immediately, forcing a fresh start every time. This is the core innovation:

How It Works

  1. Fresh Context Each Time: Every iteration spawns a completely new agent process with no accumulated context from previous runs.

  2. Read-Only Memory: Agents rely on:

    • PROJECT-SPEC.md (static requirements)
    • IMPLEMENTATION_PLAN.md (current task status)
    • Git log (recent changes)
    • Codebase state
    • Test results
  3. One Task Per Iteration: Agents pick exactly one unchecked task, implement it completely, verify with build/tests, commit, and exit.

  4. Signal-Based Control: Agents output <promise> tags that the bash loop monitors to determine the next action.

  5. Git as Coordination: Each iteration's changes are committed, creating an audit trail and allowing the next agent to see what was done.
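The five points above can be sketched as a bash loop. In this sketch, run_agent is a stub standing in for the real agent CLI (claude, codex, etc.), and the function names are illustrative rather than taken from ralph-loop.sh:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stub standing in for the real agent CLI invocation (claude, codex, ...).
run_agent() { echo 'committed work. <promise>DONE</promise>'; }

ralph_loop() {
  local max="$1" log_dir
  log_dir=$(mktemp -d)   # the real script logs to .ralph-logs/
  for i in $(seq 1 "$max"); do
    local log="$log_dir/iteration-$i.log"
    # 1. Fresh context: spawn a brand-new agent process each iteration.
    run_agent > "$log" 2>&1 || true
    # 4. Signal-based control: the loop only inspects the output tags.
    if grep -q '<promise>DONE</promise>' "$log"; then
      echo "All tasks complete after $i iteration(s)."; return 0
    elif grep -qE '<promise>(STUCK|ERROR)</promise>' "$log"; then
      echo "Human intervention needed (see $log)." >&2; return 1
    fi
    # 5. Git as coordination: the agent committed before exiting, so the
    #    next fresh agent sees its work via git log. Loop again.
  done
  echo "Reached max iterations without DONE signal." >&2; return 1
}

ralph_loop 50
```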

Benefits

  • Prevents context window overflow
  • Eliminates stale reasoning problems
  • Enables indefinite project scaling
  • Provides clear intervention points
  • Maintains code quality through iteration

Flow Diagram

Start Loop
├── Read PROJECT-SPEC.md
├── Run Agent with Fresh Context
├── Agent: Orient (read plan, git log)
├── Agent: Pick ONE Task
├── Agent: Implement + Verify
├── Agent: Commit + Mark Done
├── Check Output Signals
├── If DONE: Exit Success
├── If STUCK/ERROR: Exit with Warning
└── Else: Loop Again

How to Use for Autonomous Coding Workflows

Quick Start

  1. Copy templates into your project root
  2. Fill out PROJECT-SPEC.md with complete project details
  3. Run ./ralph-loop.sh plan to generate IMPLEMENTATION_PLAN.md
  4. Run ./ralph-loop.sh to start autonomous building
  5. Monitor progress; intervene if agent gets stuck

Detailed Workflow

  1. Preparation:

    • Choose project directory
    • Copy all 4 template files
    • Customize PROJECT-SPEC.md with your requirements
    • Ensure build/test commands work
  2. Planning Phase:

    • Run ./ralph-loop.sh plan
    • Agent reads spec and creates task decomposition
    • Review IMPLEMENTATION_PLAN.md for completeness
  3. Build Iterations:

    • Run ./ralph-loop.sh --max 50 (or your preferred limit)
    • Each iteration: fresh agent → one task → verify → commit
    • Loop continues until DONE or max iterations
  4. Monitoring:

    • Check .ralph-logs/ for iteration details
    • Look for STUCK/ERROR signals requiring intervention
    • Review git log for progress
  5. Intervention:

    • If stuck: update IMPLEMENTATION_PLAN.md with notes
    • If error: fix the issue and restart loop
    • If plan needs changes: edit and restart
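The monitoring steps above reduce to a few shell one-liners; a sketch, using a temporary directory to stand in for .ralph-logs/:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo setup: fake log directory standing in for .ralph-logs/.
logs=$(mktemp -d)
printf 'building...\n' > "$logs/iteration-1.log"
printf 'blocked. <promise>STUCK</promise>\n' > "$logs/iteration-2.log"

# List the logs whose iteration needs human attention.
grep -lE '<promise>(STUCK|ERROR)</promise>' "$logs"/*.log

# In a real project you would also review progress with:
#   git log --oneline -20
rm -rf "$logs"
```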

Configuration Options

  • --max N: Limit iterations (default 50)
  • --agent claude|codex: Choose AI agent
  • plan mode: Just create implementation plan
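These options imply a small argument parser. The following is a sketch of how it might look, not the script's actual parsing code; the claude default for --agent is an assumption:

```shell
#!/usr/bin/env bash
set -euo pipefail

MODE="build"          # default mode
MAX_ITERATIONS=50     # default limit (matches the documented --max default)
AGENT="claude"        # assumed default agent CLI

while [ $# -gt 0 ]; do
  case "$1" in
    plan)    MODE="plan"; shift ;;
    --max)   MAX_ITERATIONS="$2"; shift 2 ;;
    --agent) AGENT="$2"; shift 2 ;;
    *) echo "Unknown option: $1" >&2; exit 1 ;;
  esac
done

echo "mode=$MODE max=$MAX_ITERATIONS agent=$AGENT"
```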

Examples and Use Cases

Personal Finance App (FinPlan)

Complete example in EXAMPLES.md showing:

  • Privacy-first local finance dashboard
  • Transaction import, categorization, projections
  • Monte Carlo retirement simulations
  • Tech stack: TypeScript, Express, SQLite, vanilla JS
  • 15+ features decomposed into phases

Key Patterns from Examples

  • Be Specific: Acceptance criteria like "Parse QFX files and extract: date, amount, payee, memo, type"
  • Define Tech Stack: Don't let agents choose — specify "TypeScript, Express.js, SQLite"
  • Include Data Models: Explicit entity definitions with constraints
  • Phase Large Projects: Independent deployable phases
  • Anti-Patterns: "Don't use localStorage — SQLite is source of truth"

Use Cases

  • Complex Web Apps: Multi-feature applications with databases
  • Libraries/Frameworks: API design and implementation
  • Data Processing: ETL pipelines, analysis tools
  • CLI Tools: Command-line utilities with multiple commands
  • Prototypes to Production: Start with working prototype, iterate to full product

Integration with OpenClaw sessions_spawn

OpenClaw provides sessions_spawn for agent orchestration, offering an alternative to the bash loop.

Basic Usage

# Planning phase
sessions_spawn --task "Read PROJECT-SPEC.md. Decompose into tasks. Write IMPLEMENTATION_PLAN.md." --model opus

# Build iterations
sessions_spawn --task "Read AGENT.md. Follow core loop: pick one task, implement, test, commit." --model sonnet

Advanced Integration

  • Parallel Tasks: Spawn multiple agents for independent tasks
  • Different Models: Use opus for planning, sonnet for coding
  • Cron Scheduling: Automate iterations with cron jobs
  • Channel Output: Direct results to specific channels

Benefits Over Bash Loop

  • Model selection per task type
  • Parallel execution for independent work
  • Integration with OpenClaw's session management
  • Richer output formatting and notifications

When to Use Each

  • Ralph Loop: Simple sequential projects, bash environments
  • OpenClaw: Complex projects, parallel work, advanced features

Best Practices for Agent-Driven Development

Writing Project Specs

  1. Be Exhaustively Specific: Include exact acceptance criteria, not vague requirements
  2. Define Everything: Tech stack, directory structure, build commands, coding standards
  3. Provide Examples: Sample data, API responses, UI mockups
  4. Phase Appropriately: Break large projects into independent phases
  5. Document Constraints: What MUST/MUST NOT do, plus preferences
  6. Include Anti-Patterns: Lessons from previous attempts

Agent Instructions

  1. Role Definition: Clear capabilities and limitations
  2. Strict Rules: One task per iteration, mandatory testing, no refactoring unrelated code
  3. Clear Signals: Use <promise> tags for loop control
  4. Context Boundaries: Fresh start each time, rely on files/git

Loop Management

  1. Monitor Logs: Check .ralph-logs/ for issues
  2. Set Reasonable Limits: --max 20-50 iterations depending on project size
  3. Plan Reviews: Always review IMPLEMENTATION_PLAN.md after planning phase
  4. Intervention Ready: Be prepared to help when agents get stuck

Quality Assurance

  1. Test Everything: Unit, integration, end-to-end tests
  2. Build Verification: Every iteration must pass build
  3. Code Standards: Lint, format, document consistently
  4. Manual Reviews: Spot-check critical functionality

Scaling Up

  1. Phase Work: Complete foundations before features
  2. Parallel Execution: Use OpenClaw for independent tasks
  3. Iterative Refinement: Start with working prototype, enhance gradually
  4. Documentation Updates: Improve templates based on lessons learned

Common Pitfalls

  • Vague Specs: Leads to agent confusion and poor decomposition
  • Missing Build/Test: Code quality suffers without verification
  • Context Sharing: Don't try to pass state between iterations
  • Over-Parallelization: Dependencies must be respected
  • Ignoring Signals: STUCK/ERROR states need attention

This system transforms AI coding assistants from helpful sidekicks into autonomous development partners capable of delivering complete, tested software projects.