# Project Specification: Agent Harness System

## 1. Project Overview

### What are we building?

The Agent Harness System is a collection of templates, scripts, and best practices for running autonomous AI-powered coding agents on complex software projects. It provides a structured framework to decompose large projects into manageable tasks, execute them iteratively with fresh agent contexts, and maintain high-quality code through mandatory testing and verification.

### Why does it matter?

Traditional AI coding assistants struggle with large, multi-step projects due to context window limitations and the need for iterative refinement. The Agent Harness addresses this by providing a "Ralph Wiggum Loop" mechanism that spawns fresh agents for each task iteration, preventing context drift while maintaining project coherence through structured documentation and git-based memory.

### Success criteria

- [ ] Agents can autonomously decompose complex project specs into testable tasks
- [ ] Fresh agent iterations prevent context overflow and stale reasoning
- [ ] Mandatory build/test cycles ensure code quality
- [ ] Git history serves as reliable inter-iteration memory
- [ ] System works with multiple AI agents (Claude, Codex, etc.)
- [ ] Clear signals for completion, stuck states, and errors
- [ ] Comprehensive documentation enables easy adoption

---

## 2. Technical Foundation

### Tech stack

- **Language:** Bash (for the loop script), Markdown (for templates)
- **Tools:** Git, shell commands, AI agent CLIs (`claude`, `codex`)
- **Build system:** N/A (templates for various project types)
- **Test framework:** Project-specific (agents run their own tests)
- **Package manager:** N/A

### Project structure

```
docs/agent-harness/
├── README.md               # Quick overview and file purposes
├── AGENT-INSTRUCTIONS.md   # Template for agent system prompts
├── PROJECT-SPEC.md         # Template for project specifications
├── ralph-loop.sh           # The loop execution script
└── EXAMPLES.md             # Worked examples and best practices
```

### Build & test commands

The harness itself has no build or test commands, but projects using it must define theirs in their PROJECT-SPEC.md.

### Coding standards

- Markdown files use consistent formatting with headers, lists, and code blocks
- Bash scripts use `set -euo pipefail` for error handling
- Templates include clear placeholders and examples
- Documentation focuses on actionable, specific guidance

---

## 3. Requirements

### Functional Requirements

#### FR-001: Project Specification Template

**Description:** A comprehensive template that captures all the project details an agent needs to work autonomously.

**Acceptance criteria:**

- [ ] Covers project overview, technical foundation, requirements, and data models
- [ ] Includes phasing for large projects
- [ ] Provides reference materials and anti-patterns
- [ ] Enables agents to work without human intervention

#### FR-002: Agent Instructions Template

**Description:** A system prompt template that defines agent behavior, the core loop, and rules.

**Acceptance criteria:**

- [ ] Defines a senior engineer role with full codebase access
- [ ] Specifies the exact sequence: orient → plan → pick task → implement → verify → commit → exit
- [ ] Includes output signals for loop control (`<promise>` tags)
- [ ] Enforces the one-task-per-iteration rule

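The one-task-per-iteration rule lives in the agent's instructions, but the loop could also check it after the fact. A minimal sketch, assuming IMPLEMENTATION_PLAN.md marks tasks with markdown checkboxes (`- [ ]` open, `- [x]` done) and that the loop saves a copy of the plan before each iteration; `count_done` and `check_one_task` are hypothetical helpers, not part of the harness as specified.

```bash
# Hypothetical helpers: verify that an iteration checked off exactly one task.
# Assumes plan tasks are markdown checkboxes: "- [ ] open" / "- [x] done".

# Print the number of completed tasks in a plan file.
count_done() {
  local plan="$1"
  # grep -c prints 0 but exits 1 when nothing matches; keep the count, ignore the status.
  grep -cE '^[[:space:]]*- \[[xX]]' "$plan" || true
}

# Compare the snapshot taken before the iteration with the plan afterwards.
# Returns 0 only if exactly one task moved from open to done.
check_one_task() {
  local before="$1" after="$2"
  local delta=$(( $(count_done "$after") - $(count_done "$before") ))
  if (( delta == 1 )); then
    return 0
  fi
  echo "expected exactly 1 newly completed task, found $delta" >&2
  return 1
}
```

For example, the loop could run `cp IMPLEMENTATION_PLAN.md /tmp/plan.before` before the agent starts and `check_one_task /tmp/plan.before IMPLEMENTATION_PLAN.md` after it exits, treating a failed check as an error instead of silently continuing.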
#### FR-003: Ralph Wiggum Loop Script

**Description:** Bash script that orchestrates agent iterations with fresh contexts.

**Acceptance criteria:**

- [ ] Spawns fresh agent processes each iteration
- [ ] Supports planning mode and build mode
- [ ] Monitors output signals for completion/stuck/error states
- [ ] Logs all iterations for debugging
- [ ] Configurable max iterations and agent type

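One way to satisfy "Logs all iterations for debugging" is to capture each agent run in its own file. A minimal sketch, assuming logs go under `.ralph-logs/` (the directory named in the workflow section) with an illustrative `iteration-NNN.log` naming scheme; `run_logged` and `LOG_DIR` are hypothetical, not the script's actual interface.

```bash
# Hypothetical per-iteration logger. Assumes logs go to .ralph-logs/ and uses an
# illustrative iteration-NNN.log naming scheme.
LOG_DIR="${LOG_DIR:-.ralph-logs}"

# Usage: run_logged <iteration> <command> [args...]
# Writes the command's stdout and stderr to the log, prints the log path,
# and returns the command's exit status.
run_logged() {
  local iteration="$1"; shift
  local log_file status=0
  mkdir -p "$LOG_DIR"
  log_file="$(printf '%s/iteration-%03d.log' "$LOG_DIR" "$iteration")"
  {
    echo "=== iteration $iteration started $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
    "$@" 2>&1 || status=$?
    echo "=== exit status: $status ==="
  } > "$log_file"
  echo "$log_file"
  return "$status"
}
```

Calling `run_logged 3 <agent command>` would write `.ralph-logs/iteration-003.log`; zero-padding keeps a plain `ls` in iteration order.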
#### FR-004: Implementation Plan Management

**Description:** Dynamic task decomposition and tracking system.

**Acceptance criteria:**

- [ ] Agents create IMPLEMENTATION_PLAN.md from project spec
- [ ] Tasks ordered by dependency with checkboxes
- [ ] Plan updated after each completed task
- [ ] Git commits preserve plan history

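Because tasks are dependency-ordered checkboxes, picking the next task reduces to finding the first unchecked box. A minimal sketch, assuming the `- [ ]` / `- [x]` checkbox syntax; `next_task` is a hypothetical helper (in the harness as described, the agent itself picks the task).

```bash
# Hypothetical helper: print the first open task in the plan.
# The plan is dependency-ordered, so the first unchecked box is the next task.
next_task() {
  local plan="${1:-IMPLEMENTATION_PLAN.md}"
  # -m1 stops at the first match; sed strips the "- [ ] " checkbox prefix.
  grep -m1 -E '^[[:space:]]*- \[ ] ' "$plan" | sed -E 's/^[[:space:]]*- \[ ] //'
}
```

An empty result means every box is checked, which is the point where an agent would be expected to emit `<promise>DONE</promise>`.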
#### FR-005: Quality Assurance Integration

**Description:** Mandatory build and test verification in each iteration.

**Acceptance criteria:**

- [ ] Agents run project-specific build commands
- [ ] All tests must pass before committing
- [ ] Build failures prevent progression
- [ ] Linting enforced if configured

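The verification gate can be expressed as "run every configured check in order and stop at the first failure". A minimal sketch; `run_checks` is hypothetical, and the actual build, test, and lint commands would come from each project's PROJECT-SPEC.md.

```bash
# Hypothetical verification gate: run each check in order, stop at the first failure.
# Each argument is one shell command string (build, test, lint, ...).
run_checks() {
  local cmd
  for cmd in "$@"; do
    echo ">> $cmd"
    if ! bash -c "$cmd"; then
      echo "check failed: $cmd" >&2
      return 1
    fi
  done
  echo "all checks passed"
}
```

For an npm project this might look like `run_checks "npm run build" "npm test" "npm run lint" && git add -A && git commit -m "Add QFX parser"`; those script names are examples, not something the harness defines. Chaining the commit with `&&` means a failing check leaves nothing committed.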
### Non-Functional Requirements

#### NFR-001: Simplicity

- [ ] No complex dependencies or frameworks
- [ ] Works with standard shell and git
- [ ] Easy to copy templates into any project
- [ ] Minimal setup required

#### NFR-002: Reliability

- [ ] Fresh contexts prevent reasoning drift
- [ ] Git history provides audit trail
- [ ] Clear error signals for human intervention
- [ ] Handles agent failures gracefully

#### NFR-003: Flexibility

- [ ] Supports multiple AI agents (Claude, Codex, etc.)
- [ ] Works with various project types and tech stacks
- [ ] Configurable iteration limits and modes
- [ ] Extensible for custom workflows

---

## 4. Data Model

The Agent Harness is documentation-focused, not data-focused. The "data" is the project files themselves.

### Entities

Entity: Project Spec

- Overview: what/why/success criteria
- Technical foundation: stack, structure, commands
- Requirements: functional/non-functional
- Data model: project-specific entities
- Architecture: constraints, decisions
- Phasing: optional breakdown
- References: docs, examples, anti-patterns

Entity: Implementation Plan

- Tasks: discrete, testable, dependency-ordered
- Status: checkbox per task
- Notes: agent comments on stuck tasks
- History: git commits track plan evolution

Entity: Agent Iteration

- Context: fresh read of spec + plan + git log
- Task: one unchecked item from plan
- Changes: code modifications + tests
- Verification: build + test results
- Commit: descriptive message + plan update

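If the loop passes context to the agent instead of letting the agent read the files itself, each iteration's prompt can be rebuilt from scratch on every run. A minimal sketch of that variant, assuming the spec and plan file names used in this document; `build_prompt` is a hypothetical helper, and the ten-commit window is an arbitrary choice.

```bash
# Hypothetical sketch: rebuild an iteration's context from disk and git.
# Nothing is carried over from the previous agent process.
build_prompt() {
  local spec="${1:-PROJECT-SPEC.md}" plan="${2:-IMPLEMENTATION_PLAN.md}"
  printf '## Project spec\n\n'
  cat "$spec"
  printf '\n## Implementation plan\n\n'
  cat "$plan"
  printf '\n## Recent commits\n\n'
  # Outside a repo (or before the first commit) git log fails; say so instead.
  git log --oneline -n 10 2>/dev/null || echo "(no commits yet)"
}
```

Because the prompt is regenerated from files each time, nothing from an earlier agent's context can leak into the next iteration.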
### Relationships

- Project Spec → Implementation Plan (agent creates from spec)
- Implementation Plan → Agent Iterations (one task per iteration)
- Agent Iterations → Git Commits (each iteration commits changes)

---

## 5. API / Interface Design

The harness provides command-line interfaces:

### ralph-loop.sh Commands

```bash
./ralph-loop.sh                  # Build mode (default)
./ralph-loop.sh plan             # Planning mode
./ralph-loop.sh --max 20         # Limit iterations
./ralph-loop.sh --agent claude   # Specify agent
```

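The invocations above imply a small argument grammar: an optional `plan` mode plus `--max` and `--agent` flags. A minimal parsing sketch, assuming the 50-iteration default listed under Configuration Options, `claude` as the default agent (the spec does not state one), and `claude`/`codex` as the only accepted agents; `parse_args`, `MODE`, `MAX_ITERATIONS`, and `AGENT` are hypothetical names.

```bash
# Hypothetical argument parser for the invocations above.
# Sets the globals MODE, MAX_ITERATIONS, and AGENT; returns 1 on bad input.
parse_args() {
  MODE="build"
  MAX_ITERATIONS=50
  AGENT="claude"   # assumed default; not stated in the spec
  while (( $# > 0 )); do
    case "$1" in
      plan)
        MODE="plan"; shift ;;
      --max)
        [[ "${2:-}" =~ ^[0-9]+$ ]] || { echo "error: --max needs a number" >&2; return 1; }
        MAX_ITERATIONS="$2"; shift 2 ;;
      --agent)
        case "${2:-}" in
          claude|codex) AGENT="$2"; shift 2 ;;
          *) echo "error: unknown agent '${2:-}'" >&2; return 1 ;;
        esac ;;
      *)
        echo "error: unknown argument '$1'" >&2; return 1 ;;
    esac
  done
}
```

Rejecting unknown arguments with a non-zero status lets the caller abort early under `set -e` instead of looping with a misread limit.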
### Template Files

- PROJECT-SPEC.md: Fill with project details
- AGENT.md: Copy from AGENT-INSTRUCTIONS.md
- IMPLEMENTATION_PLAN.md: Generated by agent

### Output Signals

Agents output special tags that the loop monitors:

- `<promise>PLANNED</promise>`: Plan created
- `<promise>DONE</promise>`: All tasks complete
- `<promise>STUCK</promise>`: Needs human help
- `<promise>ERROR</promise>`: Unrecoverable error

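Detecting a signal amounts to finding a `<promise>` tag in the agent's output. A minimal sketch, assuming the loop holds the full output in a variable and that the last tag wins if several appear (an agent may quote a tag while explaining itself); `detect_signal` is hypothetical.

```bash
# Hypothetical helper: extract the loop-control signal from an agent's output.
# Prints PLANNED, DONE, STUCK, or ERROR; prints nothing if no signal was emitted.
detect_signal() {
  local output="$1"
  # -o prints each matching tag on its own line; the last one is kept.
  grep -oE '<promise>(PLANNED|DONE|STUCK|ERROR)</promise>' <<< "$output" \
    | tail -n 1 \
    | sed -E 's#</?promise>##g'
}
```

An empty result is treated as "no signal", which in build mode means the loop simply runs another iteration.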

---

## 6. Architecture Decisions

### Constraints

- MUST: Use fresh agent contexts each iteration
- MUST: One task per agent iteration
- MUST: Run build/test verification every iteration
- MUST NOT: Allow context compaction or memory accumulation
- PREFER: Git as the coordination mechanism
- PREFER: Simple bash orchestration over complex frameworks

### Dependencies

- Git (version control)
- AI agent CLI (`claude`, `codex`, etc.)
- Shell environment (bash)
- Project-specific build tools (npm, etc.)

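A preflight check lets the loop fail fast when a dependency is missing instead of failing mid-iteration. A minimal sketch; `check_deps` is a hypothetical helper, and which agent CLI to check depends on the chosen `--agent`.

```bash
# Hypothetical preflight check: report every missing command, then fail once.
# Usage: check_deps git bash claude
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing dependency: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Since the harness also assumes a git repository, `git rev-parse --is-inside-work-tree` is a natural companion check.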
### Known Challenges

- Context window limitations of AI agents
- Maintaining coherence across iterations
- Handling agent failures or stuck states
- Balancing specificity vs flexibility in templates

---

## 7. Phasing (Optional)

The harness itself is complete in one phase, but projects using it should phase their work.

### Phase 1: Foundation

- [ ] Copy templates into project
- [ ] Fill PROJECT-SPEC.md
- [ ] Run planning mode to create IMPLEMENTATION_PLAN.md

### Phase 2: Execution

- [ ] Run build iterations until completion
- [ ] Monitor for stuck/error signals
- [ ] Intervene as needed

### Phase 3: Refinement

- [ ] Review final codebase
- [ ] Update templates based on lessons learned
- [ ] Document improvements for future use

---

## 8. Reference Materials

### External docs

- Geoffrey Huntley's Ralph Wiggum approach
- Nate Jones task decomposition method
- Ezward's sequential PRD style
- OpenClaw `sessions_spawn` documentation

### Existing code to learn from

- ralph-loop.sh: Clean bash scripting with error handling
- Templates: Structured markdown with clear sections
- Examples: Real-world project specifications

### Anti-patterns

- Don't try to pass context between iterations
- Don't let agents work on multiple tasks simultaneously
- Don't skip build/test verification
- Don't use complex orchestration when a bash loop suffices
- Don't make templates too rigid; they should be adapted per project

---

## All Template Files and Their Roles

### AGENT-INSTRUCTIONS.md

**Role:** System prompt template for the AI agent. Defines the senior engineer role, core workflow loop, strict rules, and output signals. Agents read this each iteration to understand how to behave.

**Key Sections:**

- Role definition and capabilities
- Core loop: orient → plan/pick → implement → verify → commit → exit
- Rules: one task per iteration, mandatory testing, no over-engineering
- Output signals: `<promise>` tags for loop control
- Context management: fresh starts with git as memory

### PROJECT-SPEC.md

**Role:** Comprehensive project definition template. The single source of truth that agents read every iteration. Captures all requirements, constraints, and context needed for autonomous work.

**Key Sections:**

- Project overview (what, why, success criteria)
- Technical foundation (stack, structure, commands)
- Detailed requirements (functional + non-functional)
- Data models and API design
- Architecture decisions and constraints
- Phasing and reference materials

### ralph-loop.sh

**Role:** Bash script implementing the Ralph Wiggum Loop. Orchestrates agent iterations, monitors completion signals, handles errors, and maintains logs.

**Key Features:**

- Fresh agent spawning each iteration
- Planning mode vs build mode
- Signal monitoring (`<promise>` tags)
- Configurable agents and iteration limits
- Comprehensive logging

### EXAMPLES.md

**Role:** Worked examples, comparisons of approaches, and best practices. Shows how to write good specs, compares different methodologies, and provides integration examples.

**Key Content:**

- Comparison of Ezward/Ralph/Nate approaches
- Complete FinPlan project spec example
- Best practices for spec writing
- OpenClaw integration examples

## The Ralph Wiggum Loop Mechanism

The Ralph Wiggum Loop takes its name from the Simpsons character. Each iteration starts with no memory of the previous ones, which forces a fresh start every time. This is the core innovation:

### How It Works

1. **Fresh Context Each Time:** Every iteration spawns a completely new agent process with no accumulated context from previous runs.

2. **External Memory:** Instead of carried-over context, agents rely on:
   - PROJECT-SPEC.md (static requirements)
   - IMPLEMENTATION_PLAN.md (current task status)
   - Git log (recent changes)
   - Codebase state
   - Test results

3. **One Task Per Iteration:** Agents pick exactly one unchecked task, implement it completely, verify it with build/tests, commit, and exit.

4. **Signal-Based Control:** Agents output `<promise>` tags that the bash loop monitors to determine the next action.

5. **Git as Coordination:** Each iteration's changes are committed, creating an audit trail and letting the next agent see what was done.

### Benefits

- Prevents context window overflow
- Eliminates stale reasoning problems
- Lets projects grow well beyond what fits in a single context window
- Provides clear intervention points
- Maintains code quality through iteration

### Flow Diagram

```
Start Loop
├── Read PROJECT-SPEC.md
├── Run Agent with Fresh Context
├── Agent: Orient (read plan, git log)
├── Agent: Pick ONE Task
├── Agent: Implement + Verify
├── Agent: Commit + Mark Done
├── Check Output Signals
├── If DONE: Exit Success
├── If STUCK/ERROR: Exit with Warning
└── Else: Loop Again
```

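The flow diagram can be condensed into a short Bash function. This is a minimal sketch, not ralph-loop.sh itself: it assumes a hypothetical `AGENT_CMD` that reads the prompt (here AGENT.md) on stdin and prints the agent's output, and it leaves out logging and argument parsing. The exit codes (0 for DONE/PLANNED, 2 for STUCK/ERROR, 1 for hitting the limit) are illustrative choices.

```bash
set -euo pipefail

# Hypothetical skeleton of the loop in the flow diagram above.
# Assumption: AGENT_CMD reads the prompt on stdin and prints the agent's output;
# how a real agent CLI is invoked non-interactively is not specified here.
PROMPT_FILE="${PROMPT_FILE:-AGENT.md}"

ralph_loop() {
  local max="$1" i output signal
  local cmd="${AGENT_CMD:?set AGENT_CMD to the agent command}"
  for (( i = 1; i <= max; i++ )); do
    # Each pass starts a brand-new process: nothing survives except files and git.
    # $cmd is deliberately unquoted so "tool --flag" splits into words.
    output="$($cmd < "$PROMPT_FILE" 2>&1)" || true
    # Last <promise> tag wins; no tag means "keep going".
    signal="$(grep -oE '<promise>(PLANNED|DONE|STUCK|ERROR)</promise>' <<< "$output" \
      | tail -n 1 | sed -E 's#</?promise>##g')" || true
    case "$signal" in
      DONE|PLANNED) echo "iteration $i: $signal"; return 0 ;;
      STUCK|ERROR)  echo "iteration $i: $signal, needs a human" >&2; return 2 ;;
      *)            echo "iteration $i: no signal, looping again" ;;
    esac
  done
  echo "stopped after $max iterations without DONE" >&2
  return 1
}
```

A call might look like `AGENT_CMD=./my-agent-wrapper.sh ralph_loop 50`, where the wrapper (hypothetical) is whatever runs your agent CLI non-interactively. The `|| true` after the agent call keeps a crashing agent from killing the loop under `set -e`; the missing signal then triggers another iteration.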
## How to Use for Autonomous Coding Workflows

### Quick Start

1. Copy templates into your project root
2. Fill out PROJECT-SPEC.md with complete project details
3. Run `./ralph-loop.sh plan` to generate IMPLEMENTATION_PLAN.md
4. Run `./ralph-loop.sh` to start autonomous building
5. Monitor progress; intervene if the agent gets stuck

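Step 1 of the Quick Start can be scripted. A minimal sketch, assuming the harness sits in `docs/agent-harness/` as in the project structure above and that AGENT.md starts as a copy of AGENT-INSTRUCTIONS.md (per Template Files); `setup_harness` is a hypothetical helper.

```bash
# Hypothetical helper for Quick Start step 1.
# Usage: setup_harness [harness-dir] [project-root]
setup_harness() {
  local src="${1:-docs/agent-harness}" dest="${2:-.}" f
  local templates=(AGENT-INSTRUCTIONS.md PROJECT-SPEC.md ralph-loop.sh EXAMPLES.md)

  # Fail before copying anything if a template is missing.
  for f in "${templates[@]}"; do
    [[ -f "$src/$f" ]] || { echo "missing template: $src/$f" >&2; return 1; }
  done

  # Copy templates, but never overwrite files the project already has.
  for f in "${templates[@]}"; do
    if [[ -e "$dest/$f" ]]; then
      echo "keeping existing $dest/$f" >&2
    else
      cp "$src/$f" "$dest/$f"
    fi
  done

  # The agent prompt starts as a copy of the instructions template.
  [[ -e "$dest/AGENT.md" ]] || cp "$src/AGENT-INSTRUCTIONS.md" "$dest/AGENT.md"
  chmod +x "$dest/ralph-loop.sh"
}
```

Filling in PROJECT-SPEC.md (step 2) remains a human job. Because existing files are never overwritten, rerunning the helper should be safe.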
### Detailed Workflow

1. **Preparation:**
   - Choose project directory
   - Copy all 4 template files
   - Customize PROJECT-SPEC.md with your requirements
   - Ensure build/test commands work

2. **Planning Phase:**
   - Run `./ralph-loop.sh plan`
   - Agent reads spec and creates task decomposition
   - Review IMPLEMENTATION_PLAN.md for completeness

3. **Build Iterations:**
   - Run `./ralph-loop.sh --max 50` (or your preferred limit)
   - Each iteration: fresh agent → one task → verify → commit
   - Loop continues until DONE or max iterations

4. **Monitoring:**
   - Check `.ralph-logs/` for iteration details
   - Look for STUCK/ERROR signals requiring intervention
   - Review git log for progress

5. **Intervention:**
   - If stuck: update IMPLEMENTATION_PLAN.md with notes
   - If error: fix the issue and restart loop
   - If plan needs changes: edit and restart

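Spotting the iterations that need intervention can be reduced to a search through the logs. A minimal sketch, assuming one log file per iteration in `.ralph-logs/` that contains the agent's raw output; `find_attention_logs` is hypothetical.

```bash
# Hypothetical monitoring helper: list iteration logs containing STUCK or ERROR.
# Usage: find_attention_logs [log-dir]
find_attention_logs() {
  local log_dir="${1:-.ralph-logs}"
  [[ -d "$log_dir" ]] || return 0
  # -l prints only the names of matching files; an empty directory is not an error.
  grep -lE '<promise>(STUCK|ERROR)</promise>' "$log_dir"/* 2>/dev/null || true
}
```

Each file it prints points at an iteration worth inspecting before adding notes to IMPLEMENTATION_PLAN.md and restarting the loop.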
### Configuration Options

- `--max N`: Limit iterations (default 50)
- `--agent claude|codex`: Choose AI agent
- `plan` mode: Only create the implementation plan

## Examples and Use Cases

### Personal Finance App (FinPlan)

Complete example in EXAMPLES.md showing:

- Privacy-first local finance dashboard
- Transaction import, categorization, projections
- Monte Carlo retirement simulations
- Tech stack: TypeScript, Express, SQLite, vanilla JS
- 15+ features decomposed into phases

### Key Patterns from Examples

- **Be Specific:** Acceptance criteria like "Parse QFX files and extract: date, amount, payee, memo, type"
- **Define Tech Stack:** Don't let agents choose; specify "TypeScript, Express.js, SQLite"
- **Include Data Models:** Explicit entity definitions with constraints
- **Phase Large Projects:** Independently deployable phases
- **Anti-Patterns:** "Don't use localStorage; SQLite is source of truth"

### Use Cases

- **Complex Web Apps:** Multi-feature applications with databases
- **Libraries/Frameworks:** API design and implementation
- **Data Processing:** ETL pipelines, analysis tools
- **CLI Tools:** Command-line utilities with multiple commands
- **Prototypes to Production:** Start with a working prototype, iterate to a full product

## Integration with OpenClaw sessions_spawn

OpenClaw provides `sessions_spawn` for agent orchestration, offering an alternative to the bash loop.

### Basic Usage

```bash
# Planning phase
sessions_spawn --task "Read PROJECT-SPEC.md. Decompose into tasks. Write IMPLEMENTATION_PLAN.md." --model opus

# Build iterations
sessions_spawn --task "Read AGENT.md. Follow core loop: pick one task, implement, test, commit." --model sonnet
```

### Advanced Integration

- **Parallel Tasks:** Spawn multiple agents for independent tasks
- **Different Models:** Use opus for planning, sonnet for coding
- **Cron Scheduling:** Automate iterations with cron jobs
- **Channel Output:** Direct results to specific channels

### Benefits Over Bash Loop

- Model selection per task type
- Parallel execution for independent work
- Integration with OpenClaw's session management
- Richer output formatting and notifications

### When to Use Each

- **Ralph Loop:** Simple sequential projects, bash environments
- **OpenClaw:** Complex projects, parallel work, advanced features

## Best Practices for Agent-Driven Development

### Writing Project Specs

1. **Be Exhaustively Specific:** Include exact acceptance criteria, not vague requirements
2. **Define Everything:** Tech stack, directory structure, build commands, coding standards
3. **Provide Examples:** Sample data, API responses, UI mockups
4. **Phase Appropriately:** Break large projects into independent phases
5. **Document Constraints:** What the agent MUST and MUST NOT do, plus preferences
6. **Include Anti-Patterns:** Lessons from previous attempts

### Agent Instructions

1. **Role Definition:** Clear capabilities and limitations
2. **Strict Rules:** One task per iteration, mandatory testing, no refactoring of unrelated code
3. **Clear Signals:** Use `<promise>` tags for loop control
4. **Context Boundaries:** Fresh start each time; rely on files and git

### Loop Management

1. **Monitor Logs:** Check `.ralph-logs/` for issues
2. **Set Reasonable Limits:** `--max` of 20 to 50 iterations, depending on project size
3. **Plan Reviews:** Always review IMPLEMENTATION_PLAN.md after the planning phase
4. **Intervention Ready:** Be prepared to help when agents get stuck

### Quality Assurance

1. **Test Everything:** Unit, integration, and end-to-end tests
2. **Build Verification:** Every iteration must pass the build
3. **Code Standards:** Lint, format, and document consistently
4. **Manual Reviews:** Spot-check critical functionality

### Scaling Up

1. **Phase Work:** Complete foundations before features
2. **Parallel Execution:** Use OpenClaw for independent tasks
3. **Iterative Refinement:** Start with a working prototype and enhance it gradually
4. **Documentation Updates:** Improve templates based on lessons learned

### Common Pitfalls

- **Vague Specs:** Lead to agent confusion and poor decomposition
- **Missing Build/Test:** Code quality suffers without verification
- **Context Sharing:** Don't try to pass state between iterations
- **Over-Parallelization:** Dependencies must be respected
- **Ignoring Signals:** STUCK/ERROR states need attention

This system transforms AI coding assistants from helpful sidekicks into autonomous development partners capable of delivering complete, tested software projects.