# Project Specification: Agent Harness System
## 1. Project Overview
### What are we building?
The Agent Harness System is a collection of templates, scripts, and best practices for running autonomous AI-powered coding agents on complex software projects. It provides a structured framework to decompose large projects into manageable tasks, execute them iteratively with fresh agent contexts, and maintain high-quality code through mandatory testing and verification.
### Why does it matter?
Traditional AI coding assistants struggle with large, multi-step projects due to context window limitations and the need for iterative refinement. The Agent Harness addresses this by providing a "Ralph Wiggum Loop" mechanism that spawns fresh agents for each task iteration, preventing context drift while maintaining project coherence through structured documentation and git-based memory.
### Success criteria
- [ ] Agents can autonomously decompose complex project specs into testable tasks
- [ ] Fresh agent iterations prevent context overflow and stale reasoning
- [ ] Mandatory build/test cycles ensure code quality
- [ ] Git history serves as reliable inter-iteration memory
- [ ] System works with multiple AI agents (Claude, Codex, etc.)
- [ ] Clear signals for completion, stuck states, and errors
- [ ] Comprehensive documentation enables easy adoption
---
## 2. Technical Foundation
### Tech stack
- **Language:** Bash (for the loop script), Markdown (for templates)
- **Tools:** Git, shell commands, AI agent CLIs (claude, codex)
- **Build system:** N/A (templates for various project types)
- **Test framework:** Project-specific (agents run their own tests)
- **Package manager:** N/A
### Project structure
```
docs/agent-harness/
├── README.md # Quick overview and file purposes
├── AGENT-INSTRUCTIONS.md # Template for agent system prompts
├── PROJECT-SPEC.md # Template for project specifications
├── ralph-loop.sh # The loop execution script
└── EXAMPLES.md # Worked examples and best practices
```
### Build & test commands
The harness itself doesn't have build/test commands, but agents using it must define them in their PROJECT-SPEC.md.
### Coding standards
- Markdown files use consistent formatting with headers, lists, code blocks
- Bash scripts use `set -euo pipefail` for error handling
- Templates include clear placeholders and examples
- Documentation focuses on actionable, specific guidance
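As a minimal illustration of these standards, a harness script might begin like this (the `log` helper is hypothetical, not part of the shipped templates):

```shell
#!/usr/bin/env bash
# Fail fast: abort on errors (-e), unset variables (-u), and pipe failures.
set -euo pipefail

# Hypothetical helper: timestamped output keeps iteration logs greppable.
log() {
  printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*"
}

log "harness starting"
```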
---
## 3. Requirements
### Functional Requirements
#### FR-001: Project Specification Template
**Description:** A comprehensive template that captures all necessary project details for autonomous agent work.
**Acceptance criteria:**
- [ ] Covers project overview, technical foundation, requirements, data models
- [ ] Includes phasing for large projects
- [ ] Provides reference materials and anti-patterns
- [ ] Enables agents to work without human intervention
#### FR-002: Agent Instructions Template
**Description:** System prompt template that defines agent behavior, the core loop, and rules.
**Acceptance criteria:**
- [ ] Defines senior engineer role with full codebase access
- [ ] Specifies exact sequence: orient → plan → pick task → implement → verify → commit → exit
- [ ] Includes output signals for loop control (`<promise>` tags)
- [ ] Enforces one-task-per-iteration rule
#### FR-003: Ralph Wiggum Loop Script
**Description:** Bash script that orchestrates agent iterations with fresh contexts.
**Acceptance criteria:**
- [ ] Spawns fresh agent processes each iteration
- [ ] Supports planning mode and build mode
- [ ] Monitors output signals for completion/stuck/error states
- [ ] Logs all iterations for debugging
- [ ] Configurable max iterations and agent type
#### FR-004: Implementation Plan Management
**Description:** Dynamic task decomposition and tracking system.
**Acceptance criteria:**
- [ ] Agents create IMPLEMENTATION_PLAN.md from project spec
- [ ] Tasks ordered by dependency with checkboxes
- [ ] Plan updated after each completed task
- [ ] Git commits preserve plan history
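One way an agent could update the plan at commit time is sketched below. The `mark_task_done` helper and the exact checkbox format are assumptions, and GNU `sed -i` is assumed (BSD sed needs `-i ''`):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: flip "- [ ] <task>" to "- [x] <task>" in the plan.
# The task string is used verbatim in the sed pattern, so it must not
# contain regex metacharacters (acceptable for an illustration).
mark_task_done() {
  local plan="$1" task="$2"
  sed -i "s/^- \[ \] ${task}\$/- [x] ${task}/" "$plan"
}

# Usage (after implementation and passing tests):
#   mark_task_done IMPLEMENTATION_PLAN.md "Add QFX parser"
#   git add -A && git commit -m "feat: add QFX parser; plan updated"
```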
#### FR-005: Quality Assurance Integration
**Description:** Mandatory build and test verification in each iteration.
**Acceptance criteria:**
- [ ] Agents run project-specific build commands
- [ ] All tests must pass before committing
- [ ] Build failures prevent progression
- [ ] Linting enforced if configured
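A sketch of that gate, assuming the build and test commands come from the project's spec (`BUILD_CMD`/`TEST_CMD` are illustrative names, not harness variables):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative defaults; a real project would set e.g.
# BUILD_CMD="npm run build" and TEST_CMD="npm test".
BUILD_CMD="${BUILD_CMD:-true}"
TEST_CMD="${TEST_CMD:-true}"

# Return non-zero unless both build and tests succeed.
# The variables are expanded unquoted on purpose so commands with
# arguments word-split as expected.
verify() {
  $BUILD_CMD || { echo "build failed; not committing" >&2; return 1; }
  $TEST_CMD  || { echo "tests failed; not committing" >&2; return 1; }
}

if verify; then
  echo "verification passed"
  # git add -A && git commit -m "<task description>"
fi
```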
### Non-Functional Requirements
#### NFR-001: Simplicity
- [ ] No complex dependencies or frameworks
- [ ] Works with standard shell and git
- [ ] Easy to copy templates into any project
- [ ] Minimal setup required
#### NFR-002: Reliability
- [ ] Fresh contexts prevent reasoning drift
- [ ] Git history provides audit trail
- [ ] Clear error signals for human intervention
- [ ] Handles agent failures gracefully
#### NFR-003: Flexibility
- [ ] Supports multiple AI agents (Claude, Codex, etc.)
- [ ] Works with various project types and tech stacks
- [ ] Configurable iteration limits and modes
- [ ] Extensible for custom workflows
---
## 4. Data Model
The Agent Harness is documentation-focused, not data-focused. The "data" is the project files themselves.
### Entities
**Entity: Project Spec**
- Overview: what/why/success criteria
- Technical foundation: stack, structure, commands
- Requirements: functional/non-functional
- Data model: project-specific entities
- Architecture: constraints, decisions
- Phasing: optional breakdown
- References: docs, examples, anti-patterns
**Entity: Implementation Plan**
- Tasks: discrete, testable, dependency-ordered
- Status: checkbox per task
- Notes: agent comments on stuck tasks
- History: git commits track plan evolution
**Entity: Agent Iteration**
- Context: fresh read of spec + plan + git log
- Task: one unchecked item from plan
- Changes: code modifications + tests
- Verification: build + test results
- Commit: descriptive message + plan update
### Relationships
- Project Spec → Implementation Plan (agent creates from spec)
- Implementation Plan → Agent Iterations (one task per iteration)
- Agent Iterations → Git Commits (each iteration commits changes)
---
## 5. API / Interface Design
The harness provides command-line interfaces:
### ralph-loop.sh Commands
```bash
./ralph-loop.sh # Build mode (default)
./ralph-loop.sh plan # Planning mode
./ralph-loop.sh --max 20 # Limit iterations
./ralph-loop.sh --agent claude # Specify agent
```
### Template Files
- PROJECT-SPEC.md: Fill with project details
- AGENT.md: Copy from AGENT-INSTRUCTIONS.md
- IMPLEMENTATION_PLAN.md: Generated by agent
### Output Signals
Agents output special tags that the loop monitors:
- `<promise>PLANNED</promise>`: Plan created
- `<promise>DONE</promise>`: All tasks complete
- `<promise>STUCK</promise>`: Needs human help
- `<promise>ERROR</promise>`: Unrecoverable error
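A loop watching for these tags might classify an iteration like the following sketch (the log-file handling is illustrative):

```shell
# Map an iteration's log to one of the signal states above.
check_signal() {
  local log_file="$1"
  if grep -q '<promise>DONE</promise>' "$log_file"; then echo DONE
  elif grep -q '<promise>PLANNED</promise>' "$log_file"; then echo PLANNED
  elif grep -q '<promise>STUCK</promise>' "$log_file"; then echo STUCK
  elif grep -q '<promise>ERROR</promise>' "$log_file"; then echo ERROR
  else echo NONE
  fi
}

# Usage: state="$(check_signal .ralph-logs/iteration-3.log)"
```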
---
## 6. Architecture Decisions
### Constraints
- MUST: Use fresh agent contexts each iteration
- MUST: One task per agent iteration
- MUST: Mandatory build/test verification
- MUST NOT: Allow context compaction or memory accumulation
- PREFER: Git as the coordination mechanism
- PREFER: Simple bash orchestration over complex frameworks
### Dependencies
- Git (version control)
- AI agent CLI (claude, codex, etc.)
- Shell environment (bash)
- Project-specific build tools (npm, etc.)
### Known Challenges
- Context window limitations of AI agents
- Maintaining coherence across iterations
- Handling agent failures or stuck states
- Balancing specificity vs flexibility in templates
---
## 7. Phasing (Optional)
The harness itself is complete in one phase, but projects using it should phase their work.
### Phase 1: Foundation
- [ ] Copy templates into project
- [ ] Fill PROJECT-SPEC.md
- [ ] Run planning mode to create IMPLEMENTATION_PLAN.md
### Phase 2: Execution
- [ ] Run build iterations until completion
- [ ] Monitor for stuck/error signals
- [ ] Intervene as needed
### Phase 3: Refinement
- [ ] Review final codebase
- [ ] Update templates based on lessons learned
- [ ] Document improvements for future use
---
## 8. Reference Materials
### External docs
- Geoffrey Huntley's Ralph Wiggum approach
- Nate Jones's task decomposition method
- Ezward's sequential PRD style
- OpenClaw sessions_spawn documentation
### Existing code to learn from
- ralph-loop.sh: Clean bash scripting with error handling
- Templates: Structured markdown with clear sections
- Examples: Real-world project specifications
### Anti-patterns
- Don't try to pass context between iterations
- Don't let agents work on multiple tasks simultaneously
- Don't skip build/test verification
- Don't use complex orchestration when bash loop suffices
- Don't make templates too rigid — they should be adapted per project
---
## All Template Files and Their Roles
### AGENT-INSTRUCTIONS.md
**Role:** System prompt template for the AI agent. Defines the senior engineer role, core workflow loop, strict rules, and output signals. Agents read this each iteration to understand their behavior.
**Key Sections:**
- Role definition and capabilities
- Core loop: orient → plan/pick → implement → verify → commit → exit
- Rules: one task per iteration, mandatory testing, no over-engineering
- Output signals: `<promise>` tags for loop control
- Context management: fresh starts with git as memory
### PROJECT-SPEC.md
**Role:** Comprehensive project definition template. The single source of truth that agents read every iteration. Captures all requirements, constraints, and context needed for autonomous work.
**Key Sections:**
- Project overview (what, why, success criteria)
- Technical foundation (stack, structure, commands)
- Detailed requirements (functional + non-functional)
- Data models and API design
- Architecture decisions and constraints
- Phasing and reference materials
### ralph-loop.sh
**Role:** Bash script implementing the Ralph Wiggum Loop mechanism. Orchestrates agent iterations, monitors completion signals, handles errors, and maintains logs.
**Key Features:**
- Fresh agent spawning each iteration
- Planning mode vs build mode
- Signal monitoring (`<promise>` tags)
- Configurable agents and iteration limits
- Comprehensive logging
### EXAMPLES.md
**Role:** Worked examples, comparisons of approaches, and best practices. Shows how to write good specs, compares different methodologies, and provides integration examples.
**Key Content:**
- Comparison of Ezward/Ralph/Nate approaches
- Complete FinPlan project spec example
- Best practices for spec writing
- OpenClaw integration examples
## The Ralph Wiggum Loop Mechanism
The Ralph Wiggum Loop is named for the Simpsons character who forgets everything immediately: each iteration starts fresh with no memory of the last. This forgetting-by-design is the core innovation:
### How It Works
1. **Fresh Context Each Time:** Every iteration spawns a completely new agent process with no accumulated context from previous runs.
2. **Read-Only Memory:** Agents rely on:
- PROJECT-SPEC.md (static requirements)
- IMPLEMENTATION_PLAN.md (current task status)
- Git log (recent changes)
- Codebase state
- Test results
3. **One Task Per Iteration:** Agents pick exactly one unchecked task, implement it completely, verify with build/tests, commit, and exit.
4. **Signal-Based Control:** Agents output `<promise>` tags that the bash loop monitors to determine next action.
5. **Git as Coordination:** Each iteration's changes are committed, creating an audit trail and allowing the next agent to see what was done.
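From the agent's side, the orientation step amounts to re-reading those sources at the start of each iteration; a rough sketch (file names per the spec, commands illustrative):

```shell
# Re-read the agent's only "memory" at the start of an iteration.
orient() {
  for f in PROJECT-SPEC.md IMPLEMENTATION_PLAN.md; do
    if [ -f "$f" ]; then cat "$f"; fi
  done
  # Recent history from previous iterations (ignored outside a git repo).
  git log --oneline -20 2>/dev/null || true
}
```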
### Benefits
- Prevents context window overflow
- Eliminates stale reasoning problems
- Enables indefinite project scaling
- Provides clear intervention points
- Maintains code quality through iteration
### Flow Diagram
```
Start Loop
├── Read PROJECT-SPEC.md
├── Run Agent with Fresh Context
├── Agent: Orient (read plan, git log)
├── Agent: Pick ONE Task
├── Agent: Implement + Verify
├── Agent: Commit + Mark Done
├── Check Output Signals
├── If DONE: Exit Success
├── If STUCK/ERROR: Exit with Warning
└── Else: Loop Again
```
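Assuming an agent CLI that reads its instructions on stdin and a `.ralph-logs/` directory, the diagram above reduces to roughly this loop. This is a sketch, not the shipped `ralph-loop.sh`:

```shell
#!/usr/bin/env bash
set -euo pipefail

AGENT_CMD="${AGENT_CMD:-claude}"   # assumed agent CLI; substitute your own
LOG_DIR="${LOG_DIR:-.ralph-logs}"

# One fresh agent process per iteration; files and git history are the
# only memory that survives between runs.
run_loop() {
  local max="${1:-50}" i log_file
  mkdir -p "$LOG_DIR"
  for i in $(seq 1 "$max"); do
    log_file="$LOG_DIR/iteration-$i.log"
    "$AGENT_CMD" < AGENT.md > "$log_file" 2>&1 || true
    if grep -q '<promise>DONE</promise>' "$log_file"; then
      echo "all tasks complete after $i iteration(s)"; return 0
    elif grep -qE '<promise>(STUCK|ERROR)</promise>' "$log_file"; then
      echo "agent needs help; see $log_file" >&2; return 1
    fi
  done
  echo "hit max iterations ($max) without DONE" >&2
  return 1
}

# Usage: AGENT_CMD=claude run_loop 50
```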
## How to Use for Autonomous Coding Workflows
### Quick Start
1. Copy templates into your project root
2. Fill out PROJECT-SPEC.md with complete project details
3. Run `./ralph-loop.sh plan` to generate IMPLEMENTATION_PLAN.md
4. Run `./ralph-loop.sh` to start autonomous building
5. Monitor progress; intervene if agent gets stuck
### Detailed Workflow
1. **Preparation:**
- Choose project directory
- Copy all 4 template files
- Customize PROJECT-SPEC.md with your requirements
- Ensure build/test commands work
2. **Planning Phase:**
- Run `./ralph-loop.sh plan`
- Agent reads spec and creates task decomposition
- Review IMPLEMENTATION_PLAN.md for completeness
3. **Build Iterations:**
- Run `./ralph-loop.sh --max 50` (or your preferred limit)
- Each iteration: fresh agent → one task → verify → commit
- Loop continues until DONE or max iterations
4. **Monitoring:**
- Check `.ralph-logs/` for iteration details
- Look for STUCK/ERROR signals requiring intervention
- Review git log for progress
5. **Intervention:**
- If stuck: update IMPLEMENTATION_PLAN.md with notes
- If error: fix the issue and restart loop
- If plan needs changes: edit and restart
### Configuration Options
- `--max N`: Limit iterations (default 50)
- `--agent claude|codex`: Choose AI agent
- `plan` mode: Just create implementation plan
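For reference, parsing those options could look like this sketch (the real script's argument handling may differ):

```shell
# Parse ralph-loop.sh style arguments into MODE/MAX/AGENT globals.
parse_args() {
  MODE=build; MAX=50; AGENT=claude   # defaults per the spec
  while [ "$#" -gt 0 ]; do
    case "$1" in
      plan)    MODE=plan ;;
      --max)   MAX="$2"; shift ;;
      --agent) AGENT="$2"; shift ;;
      *)       echo "unknown option: $1" >&2; return 1 ;;
    esac
    shift
  done
}

# Usage: parse_args "$@"
```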
## Examples and Use Cases
### Personal Finance App (FinPlan)
Complete example in EXAMPLES.md showing:
- Privacy-first local finance dashboard
- Transaction import, categorization, projections
- Monte Carlo retirement simulations
- Tech stack: TypeScript, Express, SQLite, vanilla JS
- 15+ features decomposed into phases
### Key Patterns from Examples
- **Be Specific:** Acceptance criteria like "Parse QFX files and extract: date, amount, payee, memo, type"
- **Define Tech Stack:** Don't let agents choose — specify "TypeScript, Express.js, SQLite"
- **Include Data Models:** Explicit entity definitions with constraints
- **Phase Large Projects:** Independent deployable phases
- **Anti-Patterns:** "Don't use localStorage — SQLite is source of truth"
### Use Cases
- **Complex Web Apps:** Multi-feature applications with databases
- **Libraries/Frameworks:** API design and implementation
- **Data Processing:** ETL pipelines, analysis tools
- **CLI Tools:** Command-line utilities with multiple commands
- **Prototypes to Production:** Start with working prototype, iterate to full product
## Integration with OpenClaw sessions_spawn
OpenClaw provides `sessions_spawn` for agent orchestration, offering an alternative to the bash loop.
### Basic Usage
```bash
# Planning phase
sessions_spawn --task "Read PROJECT-SPEC.md. Decompose into tasks. Write IMPLEMENTATION_PLAN.md." --model opus
# Build iterations
sessions_spawn --task "Read AGENT.md. Follow core loop: pick one task, implement, test, commit." --model sonnet
```
### Advanced Integration
- **Parallel Tasks:** Spawn multiple agents for independent tasks
- **Different Models:** Use opus for planning, sonnet for coding
- **Cron Scheduling:** Automate iterations with cron jobs
- **Channel Output:** Direct results to specific channels
### Benefits Over Bash Loop
- Model selection per task type
- Parallel execution for independent work
- Integration with OpenClaw's session management
- Richer output formatting and notifications
### When to Use Each
- **Ralph Loop:** Simple sequential projects, bash environments
- **OpenClaw:** Complex projects, parallel work, advanced features
## Best Practices for Agent-Driven Development
### Writing Project Specs
1. **Be Exhaustively Specific:** Include exact acceptance criteria, not vague requirements
2. **Define Everything:** Tech stack, directory structure, build commands, coding standards
3. **Provide Examples:** Sample data, API responses, UI mockups
4. **Phase Appropriately:** Break large projects into independent phases
5. **Document Constraints:** What the system MUST and MUST NOT do, plus preferences
6. **Include Anti-Patterns:** Lessons from previous attempts
### Agent Instructions
1. **Role Definition:** Clear capabilities and limitations
2. **Strict Rules:** One task per iteration, mandatory testing, no refactoring unrelated code
3. **Clear Signals:** Use `<promise>` tags for loop control
4. **Context Boundaries:** Fresh start each time, rely on files/git
### Loop Management
1. **Monitor Logs:** Check `.ralph-logs/` for issues
2. **Set Reasonable Limits:** --max 20-50 iterations depending on project size
3. **Plan Reviews:** Always review IMPLEMENTATION_PLAN.md after planning phase
4. **Intervention Ready:** Be prepared to help when agents get stuck
### Quality Assurance
1. **Test Everything:** Unit, integration, end-to-end tests
2. **Build Verification:** Every iteration must pass build
3. **Code Standards:** Lint, format, document consistently
4. **Manual Reviews:** Spot-check critical functionality
### Scaling Up
1. **Phase Work:** Complete foundations before features
2. **Parallel Execution:** Use OpenClaw for independent tasks
3. **Iterative Refinement:** Start with working prototype, enhance gradually
4. **Documentation Updates:** Improve templates based on lessons learned
### Common Pitfalls
- **Vague Specs:** Leads to agent confusion and poor decomposition
- **Missing Build/Test:** Code quality suffers without verification
- **Context Sharing:** Don't try to pass state between iterations
- **Over-Parallelization:** Dependencies must be respected
- **Ignoring Signals:** STUCK/ERROR states need attention
This system transforms AI coding assistants from helpful sidekicks into autonomous development partners capable of delivering complete, tested software projects.