agent-harness/OPENCLAW-INTEGRATION.md

322 lines
9.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# OpenClaw Integration — Running the Harness in OpenClaw
> The ralph-loop.sh runs locally via CLI tools (claude, codex).
> OpenClaw provides a richer runtime: sub-agents, cron scheduling, cross-session
> messaging, and persistent workspace. This guide shows how to use them.
---
## Architecture: CLI Loop vs OpenClaw
### CLI Approach (ralph-loop.sh)
```
bash loop → spawn CLI agent → agent reads spec → works → exits → loop restarts
```
- Runs on your terminal
- You watch the output in real-time
- Agent uses CLI tool (claude -p, codex)
- Loop manages restarts
### OpenClaw Approach
```
main session → sessions_spawn → sub-agent reads spec → works → exits → result delivered
```
- Runs in the background
- Results delivered to your chat (Telegram, Discord, etc.)
- Sub-agent gets full workspace access
- You orchestrate via main session or cron jobs
### When to Use Which
| Scenario | Use |
|----------|-----|
| Sitting at your terminal, watching progress | ralph-loop.sh |
| Overnight autonomous work | OpenClaw sessions_spawn |
| Scheduled recurring tasks | OpenClaw cron |
| Quick one-off agent tasks | OpenClaw sessions_spawn |
| Need to monitor from your phone | OpenClaw (results come to Telegram) |
---
## Method 1: Manual Orchestration (Main Session)
You drive each iteration from your chat with your agent (Cleo, etc.):
```
You: "Run the next iteration of the agent harness on /path/to/project"
Cleo: *spawns sub-agent* → *sub-agent does one task* → *reports back*
You: "Looks good, run the next one"
Cleo: *spawns another sub-agent* → *reports back*
```
**Pros:** Full control, you review between iterations
**Cons:** Requires your attention, not autonomous
### How Your Agent Orchestrates
Your main agent (e.g., Cleo) uses `sessions_spawn` internally:
```
sessions_spawn(
task: "Read AGENT.md in /home/user/project/. Follow the core loop:
orient, pick one task from IMPLEMENTATION_PLAN.md, implement,
verify (build + test), commit, exit. Report what you completed.",
model: "sonnet",
timeoutSeconds: 600
)
```
The sub-agent:
1. Reads AGENT.md and PROJECT-SPEC.md
2. Reads IMPLEMENTATION_PLAN.md
3. Picks the first unchecked task
4. Implements and tests it
5. Commits
6. Updates the plan
7. Reports back to your main session
---
## Method 2: Cron-Based Automation
For true overnight autonomy, use OpenClaw cron jobs:
### The Iteration Job
```json
{
"name": "agent-harness-iteration",
"schedule": { "kind": "every", "everyMs": 900000 },
"sessionTarget": "isolated",
"payload": {
"kind": "agentTurn",
"message": "Read AGENT.md in /home/paulh/.openclaw/workspace/projects/my-project/. Follow the core loop: orient → pick ONE task → implement → verify → commit → update plan → exit. If all tasks are done, say DONE. If stuck, say STUCK with details.",
"model": "sonnet",
"timeoutSeconds": 600
},
"delivery": { "mode": "announce" }
}
```
This spawns a fresh agent every 15 minutes. Each one picks the next task, does it, and reports to your chat.
### The Completion Watcher
Add logic to stop the loop when done:
```json
{
"name": "agent-harness-completion-check",
"schedule": { "kind": "every", "everyMs": 3600000 },
"sessionTarget": "isolated",
"payload": {
"kind": "agentTurn",
"message": "Read IMPLEMENTATION_PLAN.md in /home/paulh/.openclaw/workspace/projects/my-project/. Count completed vs total tasks. If all tasks are checked, disable the cron job named 'agent-harness-iteration' and announce 'All tasks complete!' If less than 80% done and no progress in last 3 git commits, announce a warning.",
"model": "sonnet",
"timeoutSeconds": 120
},
"delivery": { "mode": "announce" }
}
```
### Starting and Stopping
```
You: "Start the agent harness loop for my-project"
Cleo: *creates cron job* → iterations begin every 15 min
You: "Pause the harness"
Cleo: *disables cron job*
You: "Resume"
Cleo: *re-enables cron job*
```
---
## Method 3: Shell Script Orchestration
Create a script in your workspace that uses OpenClaw's CLI:
```bash
#!/usr/bin/env bash
# openclaw-harness.sh — Run agent harness via OpenClaw sub-agents
#
# Usage:
# ./openclaw-harness.sh /path/to/project [max-iterations]
set -euo pipefail
PROJECT_DIR="${1:?Usage: $0 <project-dir> [max-iterations]}"
MAX="${2:-20}"
PLAN="$PROJECT_DIR/IMPLEMENTATION_PLAN.md"
echo "🔄 Starting OpenClaw agent harness"
echo " Project: $PROJECT_DIR"
echo " Max iterations: $MAX"
for i in $(seq 1 "$MAX"); do
echo ""
echo "━━━ Iteration $i/$MAX ━━━"
# Check if plan exists and all tasks are done
if [[ -f "$PLAN" ]] && ! grep -q '^\- \[ \]' "$PLAN" 2>/dev/null; then
echo "✅ All tasks complete!"
exit 0
fi
# Spawn sub-agent via OpenClaw
# Note: This is conceptual — actual invocation depends on OpenClaw CLI
openclaw sessions spawn \
--task "Read AGENT.md in $PROJECT_DIR. Follow the core loop: orient, pick ONE task, implement, verify, commit, exit." \
--model sonnet \
--timeout 600
echo " Iteration $i complete. Waiting 30s..."
sleep 30
done
echo "⚠️ Reached max iterations ($MAX)"
```
**Note:** The exact `openclaw` CLI syntax may vary. Check `openclaw help` for current commands. The `sessions_spawn` tool within an agent session is the most reliable method.
---
## Monitoring Sub-Agent Work
### Real-Time: Session List
```
You: "What sub-agents are running?"
Cleo: *calls sessions_list* → shows active sessions
```
### After Completion: Session History
```
You: "What did the last sub-agent do?"
Cleo: *calls sessions_history(sessionKey)* → shows full transcript
```
The transcript includes:
- Every tool call the agent made (file reads, writes, exec commands)
- The agent's reasoning
- Build/test output
- Git commits made
- Final summary
### Git Log: The Source of Truth
```bash
cd /path/to/project
git log --oneline -10
```
Each iteration should produce exactly one commit. If you see:
- **0 commits** after an iteration → agent was stuck
- **1 commit** → working correctly
- **2+ commits** → task was large or agent fixed its own mistakes
### Plan File: Progress at a Glance
```bash
# Count completed vs total
grep -c '^\- \[x\]' IMPLEMENTATION_PLAN.md # Done
grep -c '^\- \[ \]' IMPLEMENTATION_PLAN.md # Remaining
```
---
## Model Selection Strategy
Different tasks need different models:
| Task Type | Recommended Model | Why |
|-----------|------------------|-----|
| Planning (decomposition) | opus | Needs strong reasoning for task ordering |
| Scaffolding (config, structure) | sonnet | Simple, deterministic work |
| Feature implementation | sonnet | Good balance of speed and quality |
| Complex algorithms | opus | Needs deeper reasoning |
| Bug fixing | sonnet or opus | Depends on bug complexity |
| Documentation | sonnet | Writing, not complex reasoning |
### Cost Optimization
```
Planning: 1 iteration × opus = $$ (one-time cost)
Building: 15 iterations × sonnet = $ per iteration
Complex: 3 iterations × opus = $$ per iteration
Total: 19 iterations = $$$ (manageable)
```
vs.
```
Everything: 19 iterations × opus = $$$$$$ (expensive!)
```
Use the `model` parameter in sessions_spawn to control this per-iteration.
---
## OpenClaw-Specific Agent Instructions
When running in OpenClaw, add these to your AGENT.md:
```markdown
## OpenClaw Environment Notes
### File Access
- Working directory: /home/paulh/.openclaw/workspace/projects/[project]
- Use absolute paths in all exec commands
- The `~` shorthand does NOT work in exec — use full paths
### Build Commands
- Use `cd /absolute/path && command` pattern (cwd parameter is unreliable)
- Example: `cd /home/paulh/.openclaw/workspace/projects/my-project && npm test`
### Git
- Commit after each completed task
- Use descriptive commit messages: `feat: implement QFX parser with field extraction`
- Don't push — let the human decide when to push
### Communication
- Your output is delivered to the human's chat (Telegram/Discord)
- Keep summaries brief — one paragraph max
- Include: what task you completed, what test results look like, what's next
### Signals
- If all tasks done: include "ALL_TASKS_COMPLETE" in your response
- If stuck: include "STUCK:" followed by a description of the blocker
- If error: include "ERROR:" followed by the error details
```
---
## Troubleshooting OpenClaw Integration
### Sub-agent can't find files
- Verify the project path is absolute
- Check that the workspace is accessible (not in a different user's home)
### Sub-agent uses wrong model
- Specify `model` explicitly in sessions_spawn
- Check session_status to verify which model was used
- Known issue: isolated sessions may not respect model parameter (use main session sub-agents instead)
### Sub-agent times out
- Default timeout may be too short for complex tasks
- Set `timeoutSeconds: 600` (10 minutes) for implementation tasks
- Set `timeoutSeconds: 900` (15 minutes) for complex tasks
- Set `timeoutSeconds: 120` (2 minutes) for simple checks
### Cron job fires but nothing happens
- Check cron job status: `cron list`
- Review recent runs: `cron runs --jobId <id>`
- Verify the project path in the cron job message
### Results not delivered to chat
- Check `delivery.mode` is set to `"announce"` in cron job config
- For sessions_spawn from main session, results auto-deliver
---
_OpenClaw turns the agent harness from a terminal-bound tool into an always-available autonomous workforce._