Recipe Manager Agentic Runbook
Last updated: 2026-03-24
Purpose
Operational guide for running the Recipe Manager agent harness reliably.
Core Execution Model
- One task per iteration
- One commit per iteration
- TODO.md is the authoritative queue
- Work only in:
/home/paulh/.openclaw/workspace/projects/recipe-manager
Required Guards (Must Pass Before Coding)
Pre-flight checks
Before any iteration starts, verify these files exist:
- AGENT_INSTRUCTIONS.md
- TODO.md
If missing, fail with:
STUCK: bad working dir or missing harness files at /home/paulh/.openclaw/workspace/projects/recipe-manager
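The pre-flight check above could be sketched in shell roughly as follows. This is an illustrative sketch, not the harness's actual guard: the function name `preflight` and the `PROJECT_ROOT` variable are assumptions introduced here, while the file names and the STUCK message come from this runbook.

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight guard (hypothetical helper, not part of the harness).
# PROJECT_ROOT defaults to the project path used throughout this runbook.
PROJECT_ROOT="${PROJECT_ROOT:-/home/paulh/.openclaw/workspace/projects/recipe-manager}"

# preflight ROOT
# Returns 0 when both harness files exist under ROOT.
# Otherwise prints the STUCK message and returns 1.
preflight() {
  local root="$1"
  local f
  for f in AGENT_INSTRUCTIONS.md TODO.md; do
    if [ ! -f "$root/$f" ]; then
      echo "STUCK: bad working dir or missing harness files at $root"
      return 1
    fi
  done
  return 0
}

# Typical use at the start of an iteration:
#   preflight "$PROJECT_ROOT" || exit 1
```

Passing the root as an argument rather than relying on the current directory means the guard still works when the spawner starts outside the project root.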
Monitoring Signals (How we know it's working)
A run is healthy only when all 3 are true:
- Active session updated recently (recipe-v1-iter*)
- New git commits are landing
- TODO checkboxes advance
Known Failure Modes and Fixes
1) Wrong working directory
Symptom
Agent reports that AGENT_INSTRUCTIONS.md or TODO.md is missing in /workspace.
Root cause
Spawner started outside project root.
Fix
- Force absolute project path in every task prompt
- Add mandatory pre-flight guard
- Relaunch fresh iteration
2) False “iteration already running”
Symptom
Auto-iterator repeatedly prints SKIP even when no coding progress occurs.
Root cause
The auto-iterator treated stale historical sessions as active.
Fix
- Treat a session as active only if updated recently (freshness window)
- Use current phase labels only (recipe-v1-iter*)
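A minimal sketch of the freshness rule, assuming the caller can obtain each session's label and last-update time as a Unix epoch. The runbook does not say how session metadata is read, so that part is left out. The function name `is_session_active` and the window length are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the "active only if fresh" rule (hypothetical helper).
# is_session_active LABEL LAST_UPDATE_EPOCH NOW_EPOCH WINDOW_SECONDS
# Returns 0 only when LABEL matches the current phase pattern
# (recipe-v1-iter*) and the session was updated within WINDOW_SECONDS.
is_session_active() {
  local label="$1" updated="$2" now="$3" window="$4"
  case "$label" in
    recipe-v1-iter*) ;;   # current phase label: keep checking
    *) return 1 ;;        # other phases (e.g. recipe-mvp-*) never count
  esac
  [ $(( now - updated )) -le "$window" ]
}

# Example (the 900-second window is an assumed value, not from the harness):
#   is_session_active "recipe-v1-iter12" "$last_update" "$(date +%s)" 900 \
#     && echo "SKIP: iteration already running"
```

Checking the label before the timestamp means a recently updated session from an earlier phase still cannot block a new iteration.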
3) Label mismatch across phases
Symptom
Monitor reports wrong status or misses active runs.
Root cause
MVP labels (recipe-mvp-*) used during v1 phase.
Fix
- Update monitor + iterator to phase-specific labels
- Standardize naming per phase:
  - MVP: recipe-mvp-iter*
  - v1: recipe-v1-iter*
4) Model/provider auth mismatch
Symptom
Cron jobs fail with:
- No API key found for provider openai
- or Copilot cooldown rate-limit errors
Root cause
Using openai/... models without an OpenAI API key.
Fix
- Use OAuth provider model prefix: openai-codex/...
- For this project, prefer: openai-codex/gpt-5.3-codex
5) Environment capability mismatch (Docker)
Symptom
Task fails with docker: command not found.
Root cause
Agent runtime host lacks Docker.
Fix
- Mark as manual host validation task
- Continue with unblocked tasks
6) Runtime module mismatch (ESM/CommonJS)
Symptom
Backend runtime error: require is not defined.
Root cause
Using require() in ESM code path.
Fix
- Replace require('fs') calls with ESM imports (writeFileSync)
- Build + rerun server
Operational Controls
Pause automation
Disable both jobs:
- Recipe Manager Auto-Iterator
- Recipe Manager Progress Monitor
Resume automation
Enable both jobs, then manually kick one fresh iteration.
Manual override iteration (safe restart)
Spawn one explicit iteration with:
- absolute project path
- pre-flight guard
- one-task/one-commit rule
Workflow Periodic Execution (cron + systemd)
All commands assume project root:
/home/paulh/.openclaw/workspace/projects/recipe-manager
Manual commands
# Resume from checkpoint (default mode)
npm run workflow:run
# Force restart from stage 1
npm run workflow:run -- --mode restart
# Scheduled run entrypoint (resume + morning report)
npm run workflow:schedule
# Health signal for automation (0=healthy, 1=failed/blocked/unknown)
npm run workflow:health-check
Cron example
Run scheduler every 15 minutes, health check every 5 minutes:
*/15 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:schedule >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-schedule.log 2>&1
*/5 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:health-check >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-health.log 2>&1
systemd example
Create one-shot services and timers:
/etc/systemd/system/recipe-workflow-schedule.service
[Unit]
Description=Recipe Manager scheduled workflow run
After=network.target
[Service]
Type=oneshot
WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager
ExecStart=/usr/bin/npm run workflow:schedule
/etc/systemd/system/recipe-workflow-schedule.timer
[Unit]
Description=Run Recipe Manager scheduled workflow every 15 minutes
[Timer]
OnCalendar=*:0/15
Persistent=true
[Install]
WantedBy=timers.target
/etc/systemd/system/recipe-workflow-health.service
[Unit]
Description=Recipe Manager workflow health check
After=network.target
[Service]
Type=oneshot
WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager
ExecStart=/usr/bin/npm run workflow:health-check
/etc/systemd/system/recipe-workflow-health.timer
[Unit]
Description=Run Recipe Manager workflow health check every 5 minutes
[Timer]
OnCalendar=*:0/5
Persistent=true
[Install]
WantedBy=timers.target
Enable timers:
sudo systemctl daemon-reload
sudo systemctl enable --now recipe-workflow-schedule.timer recipe-workflow-health.timer
Troubleshooting failed/blocked status
When npm run workflow:health-check returns exit code 1 with {"status":"failed"} or {"status":"blocked"}:
- Check current workflow status payload:
cat status/workflow-status.json
- Check recent progress log entries:
tail -n 50 status/workflow-progress.jsonl
- Retry from checkpoint:
npm run workflow:run
- If still blocked/failed, force a clean restart:
npm run workflow:run -- --mode restart
- Re-run health check and confirm healthy output (idle, running, or completed):
npm run workflow:health-check
If the status file is missing or malformed, the health check prints status_read_failed and exits 1; regenerate state with npm run workflow:run -- --mode restart.
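For a quick manual read of the status file, the mapping described above could be approximated as follows. This is a sketch and not the project's health-check script: `classify_status` is a hypothetical helper, and it assumes the payload has a top-level "status" string field as in the examples above.

```shell
#!/usr/bin/env bash
# Sketch: classify a workflow status file (hypothetical helper).
# classify_status FILE
# Prints the status value and returns 0 for idle, running, or completed.
# Returns 1 for any other status. Prints status_read_failed and returns 1
# when the file is unreadable or no "status" field can be found.
classify_status() {
  local file="$1" status
  if [ ! -r "$file" ]; then
    echo "status_read_failed"
    return 1
  fi
  # Crude extraction of the first "status":"..." value; not a full JSON parser.
  status="$(sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1)"
  case "$status" in
    idle|running|completed) echo "$status"; return 0 ;;
    "") echo "status_read_failed"; return 1 ;;
    *) echo "$status"; return 1 ;;
  esac
}

# Example:
#   classify_status status/workflow-status.json || echo "needs attention"
```

The sed extraction is deliberately simple. If the real payload nests the status field or spans multiple lines, a proper JSON tool would be a safer choice.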
Completion Definition
A phase is complete when:
- No unchecked tasks remain in that phase section of TODO.md
- Latest iteration exits without STUCK/ERROR
- Commit + TODO update are present
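The first criterion can be checked mechanically. The sketch below counts unchecked boxes using the same pattern as the Quick Status Commands. The helper names are hypothetical, and it counts across the whole file rather than a single phase section, because the layout of phase sections in TODO.md is not described here.

```shell
#!/usr/bin/env bash
# Sketch: count unchecked TODO items (hypothetical helpers).
# count_unchecked FILE
# Prints the number of lines that start with "- [ ]".
# Returns 2 (and prints nothing on stdout) when FILE does not exist.
count_unchecked() {
  local todo="$1"
  if [ ! -f "$todo" ]; then
    echo "missing TODO file: $todo" >&2
    return 2
  fi
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
  grep -c '^- \[ \]' "$todo" || true
}

# tasks_complete FILE
# Returns 0 when no unchecked items remain in FILE.
tasks_complete() {
  [ "$(count_unchecked "$1")" = "0" ]
}

# Example:
#   tasks_complete TODO.md && echo "no unchecked tasks remain"
```

The other two criteria (a clean exit and a commit with a TODO update) still need to be confirmed from the iteration output and git log.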
Recommended Cadence
- Auto-iterator: every 15 minutes
- Progress monitor: every 5 minutes (high visibility mode)
If noisy, set monitor to every 10–15 minutes.
Handoff Checklist (Before ending a session)
- Confirm latest commit hash
- Confirm active phase + next unchecked task
- Confirm auto-iterator enabled/disabled status
- Confirm monitor enabled/disabled status
- Confirm no stale active-session false positives
Quick Status Commands
Latest commit
git log -1 --oneline
Next tasks
grep -n "^- \[ \]" TODO.md | head
Recent progress
git log --oneline -5
This runbook should be updated whenever a new failure mode appears.
See also: INCIDENT_LOG.md for timestamped operational incidents and fixes.