# Recipe Manager Agentic Runbook

Last updated: 2026-03-24

## Purpose

Operational guide for running the Recipe Manager agent harness reliably.

---

## Core Execution Model

- One task per iteration
- One commit per iteration
- TODO.md is the authoritative queue
- Work only in: `/home/paulh/.openclaw/workspace/projects/recipe-manager`

---

## Required Guards (Must Pass Before Coding)

### Pre-flight checks

Before any iteration starts, verify these files exist:

- `AGENT_INSTRUCTIONS.md`
- `TODO.md`

If missing, fail with:

`STUCK: bad working dir or missing harness files at /home/paulh/.openclaw/workspace/projects/recipe-manager`

---

## Monitoring Signals (How we know it's working)

A run is healthy only when all 3 are true:

1. Active session updated recently (`recipe-v1-iter*`)
2. New git commits are landing
3. TODO checkboxes advance

---

## Known Failure Modes and Fixes

## 1) Wrong working directory

### Symptom

Agent says AGENT_INSTRUCTIONS.md / TODO.md missing in `/workspace`.

### Root cause

Spawner started outside project root.

### Fix

- Force absolute project path in every task prompt
- Add mandatory pre-flight guard
- Relaunch fresh iteration

---

## 2) False “iteration already running”

### Symptom

Auto-iterator repeatedly prints SKIP even when no coding progress occurs.

### Root cause

It treated stale historical sessions as active.

### Fix

- Treat a session as active only if updated recently (freshness window)
- Use current phase labels only (`recipe-v1-iter*`)

---

## 3) Label mismatch across phases

### Symptom

Monitor reports wrong status or misses active runs.

### Root cause

MVP labels (`recipe-mvp-*`) used during v1 phase.

### Fix

- Update monitor + iterator to phase-specific labels
- Standardize naming per phase:
  - MVP: `recipe-mvp-iter*`
  - v1: `recipe-v1-iter*`

---

## 4) Model/provider auth mismatch

### Symptom

Cron jobs fail with:

- `No API key found for provider openai`
- or Copilot cooldown rate-limit errors

### Root cause

Using `openai/...` models without an OpenAI API key.

### Fix

- Use OAuth provider model prefix: `openai-codex/...`
- For this project, prefer: `openai-codex/gpt-5.3-codex`

---

## 5) Environment capability mismatch (Docker)

### Symptom

Task fails with `docker: command not found`.

### Root cause

Agent runtime host lacks Docker.

### Fix

- Mark as manual host validation task
- Continue with unblocked tasks

---

## 6) Runtime module mismatch (ESM/CommonJS)

### Symptom

Backend runtime error: `require is not defined`.

### Root cause

Using `require()` in an ESM code path.

### Fix

- Replace `require('fs')` calls with ESM imports (for example, `import { writeFileSync } from 'fs'`)
- Build + rerun server

---

## Operational Controls

## Pause automation

Disable both jobs:

- Recipe Manager Auto-Iterator
- Recipe Manager Progress Monitor

## Resume automation

Enable both jobs, then manually kick one fresh iteration.

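Before kicking a fresh iteration by hand, the pre-flight guard from "Required Guards" can be run as a small shell function. The following is a minimal sketch, not the harness's actual guard: the function name `preflight_check` and the `PROJECT_ROOT` variable are hypothetical, while the file names, the project path, and the `STUCK:` message are taken from this runbook.

```bash
#!/usr/bin/env bash
# Pre-flight guard sketch: verify the harness files exist before an iteration starts.
# PROJECT_ROOT and preflight_check are hypothetical names used for illustration only.
PROJECT_ROOT="${PROJECT_ROOT:-/home/paulh/.openclaw/workspace/projects/recipe-manager}"

preflight_check() {
  # Optional first argument overrides the project root (useful for testing).
  local root="${1:-$PROJECT_ROOT}"
  local f
  for f in AGENT_INSTRUCTIONS.md TODO.md; do
    if [ ! -f "$root/$f" ]; then
      # Same failure message the runbook requires, written to stderr.
      echo "STUCK: bad working dir or missing harness files at $root" >&2
      return 1
    fi
  done
  return 0
}
```

A wrapper script could call `preflight_check || exit 1` before spawning the iteration, so a wrong working directory fails fast instead of producing a confused agent run.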
## Manual override iteration (safe restart)

Spawn one explicit iteration with:

- absolute project path
- pre-flight guard
- one-task/one-commit rule

---

## Workflow Periodic Execution (cron + systemd)

All commands assume project root:

`/home/paulh/.openclaw/workspace/projects/recipe-manager`

### Manual commands

```bash
# Resume from checkpoint (default mode)
npm run workflow:run

# Force restart from stage 1
npm run workflow:run -- --mode restart

# Scheduled run entrypoint (resume + morning report)
npm run workflow:schedule

# Health signal for automation (0=healthy, 1=failed/blocked/unknown)
npm run workflow:health-check
```

### Cron example

Run scheduler every 15 minutes, health check every 5 minutes:

```cron
*/15 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:schedule >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-schedule.log 2>&1
*/5 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:health-check >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-health.log 2>&1
```

### systemd example

Create one-shot services and timers:

`/etc/systemd/system/recipe-workflow-schedule.service`

```ini
[Unit]
Description=Recipe Manager scheduled workflow run
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager
ExecStart=/usr/bin/npm run workflow:schedule
```

`/etc/systemd/system/recipe-workflow-schedule.timer`

```ini
[Unit]
Description=Run Recipe Manager scheduled workflow every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target
```

`/etc/systemd/system/recipe-workflow-health.service`

```ini
[Unit]
Description=Recipe Manager workflow health check
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager
ExecStart=/usr/bin/npm run workflow:health-check
```

`/etc/systemd/system/recipe-workflow-health.timer`

```ini
[Unit]
Description=Run Recipe Manager workflow health check every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
```

Enable timers:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now recipe-workflow-schedule.timer recipe-workflow-health.timer
```

### Troubleshooting failed/blocked status

When `npm run workflow:health-check` returns exit code `1` with `{"status":"failed"}` or `{"status":"blocked"}`:

1. Check current workflow status payload:

   ```bash
   cat status/workflow-status.json
   ```

2. Check recent progress log entries:

   ```bash
   tail -n 50 status/workflow-progress.jsonl
   ```

3. Retry from checkpoint:

   ```bash
   npm run workflow:run
   ```

4. If still blocked/failed, force a clean restart:

   ```bash
   npm run workflow:run -- --mode restart
   ```

5. Re-run health check and confirm healthy output (`idle`, `running`, or `completed`):

   ```bash
   npm run workflow:health-check
   ```

If status file is missing or malformed, the health check prints `status_read_failed` and exits `1`; regenerate state with `npm run workflow:run -- --mode restart`.

---

## Completion Definition

A phase is complete when:

1. No unchecked tasks remain in that phase section of TODO.md
2. Latest iteration exits without STUCK/ERROR
3. Commit + TODO update are present

---

## Recommended Cadence

- Auto-iterator: every 15 minutes
- Progress monitor: every 5 minutes (high visibility mode)

If noisy, set monitor to every 10–15 minutes.

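As an illustration of the quieter cadence, the health-check entry from the cron example could be moved to a 10-minute interval. This sketch assumes the monitor is driven by that cron entry; if it is scheduled elsewhere (for example through the systemd timer, where the equivalent would be `OnCalendar=*:0/10`), adjust it there instead.

```cron
# Assumed variant of the health-check entry from the cron example: every 10 minutes instead of 5
*/10 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:health-check >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-health.log 2>&1
```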
---

## Handoff Checklist (Before ending a session)

- [ ] Confirm latest commit hash
- [ ] Confirm active phase + next unchecked task
- [ ] Confirm auto-iterator enabled/disabled status
- [ ] Confirm monitor enabled/disabled status
- [ ] Confirm no stale active-session false positives

---

## Quick Status Commands

### Latest commit

`git log -1 --oneline`

### Next tasks

`grep -n "^- \[ \]" TODO.md | head`

### Recent progress

`git log --oneline -5`

---

This runbook should be updated whenever a new failure mode appears.

See also: `INCIDENT_LOG.md` for timestamped operational incidents and fixes.