# Recipe Manager Agentic Runbook

Last updated: 2026-03-24

## Purpose

Operational guide for running the Recipe Manager agent harness reliably.

---

## Core Execution Model

- One task per iteration
- One commit per iteration
- `TODO.md` is the authoritative queue
- Work only in `/home/paulh/.openclaw/workspace/projects/recipe-manager`

---

## Required Guards (Must Pass Before Coding)

### Pre-flight checks

Before any iteration starts, verify these files exist:

- `AGENT_INSTRUCTIONS.md`
- `TODO.md`

If either is missing, fail with:

`STUCK: bad working dir or missing harness files at /home/paulh/.openclaw/workspace/projects/recipe-manager`

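The guard above can be sketched as a small shell function. This is a minimal sketch, not the harness's actual implementation: the function name `preflight` and the `PROJECT_ROOT` variable are assumptions made for illustration. An iteration script could call `preflight || exit 1` before doing any work.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight guard: verifies the harness files exist before an iteration.
# PROJECT_ROOT and preflight() are illustrative names, not part of the harness.
PROJECT_ROOT="${PROJECT_ROOT:-/home/paulh/.openclaw/workspace/projects/recipe-manager}"

preflight() {
  local root="${1:-$PROJECT_ROOT}"
  local f
  for f in AGENT_INSTRUCTIONS.md TODO.md; do
    if [ ! -f "$root/$f" ]; then
      # Same failure string as the runbook, with the checked path filled in.
      echo "STUCK: bad working dir or missing harness files at $root"
      return 1
    fi
  done
  return 0
}
```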
---

## Monitoring Signals (How we know it's working)

A run is healthy only when all three are true:

1. An active session (`recipe-v1-iter*`) was updated recently
2. New git commits are landing
3. TODO checkboxes advance

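Signals 2 and 3 can be spot-checked from the repository itself. The helpers below are a sketch under assumptions: the function names are hypothetical, and completed tasks are assumed to be written as `- [x]` at the start of a line in `TODO.md`. Signal 1 depends on the session tooling and is not covered here. To use `todo_advanced`, record the checked count at one monitor run and compare against it at the next.

```bash
# Hypothetical helpers for spot-checking signals 2 and 3.

# Seconds since the most recent commit in the current repository.
last_commit_age() {
  local now last
  now="$(date +%s)"
  last="$(git log -1 --format=%ct)" || return 1
  echo $(( now - last ))
}

# Count completed ("- [x]") tasks in a TODO file; prints 0 when none match.
count_checked() {
  grep -c '^- \[[xX]]' "$1" || true
}

# Succeeds if the checked count has grown past a previously recorded value.
#   $1 = previous checked count, $2 = TODO file
todo_advanced() {
  local previous="$1" file="$2" current
  current="$(count_checked "$file")"
  [ "$current" -gt "$previous" ]
}
```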
---

## Known Failure Modes and Fixes

## 1) Wrong working directory

### Symptom

The agent reports that `AGENT_INSTRUCTIONS.md` or `TODO.md` is missing in `/workspace`.

### Root cause

The spawner started the agent outside the project root.

### Fix

- Force the absolute project path in every task prompt
- Add the mandatory pre-flight guard
- Relaunch a fresh iteration

---

## 2) False “iteration already running”

### Symptom

The auto-iterator repeatedly prints SKIP even when no coding progress occurs.

### Root cause

The auto-iterator treated stale historical sessions as active.

### Fix

- Treat a session as active only if it was updated recently (within a freshness window)
- Match current-phase labels only (`recipe-v1-iter*`)

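The freshness test can be sketched as a timestamp comparison. This is an illustration only: `is_fresh` is a hypothetical name, the runbook does not state the window length, and how a session's last-updated time is obtained depends on the session tooling. A call such as `is_fresh "$updated" 900` would use a 15-minute window, which is an assumed value chosen to match the iterator cadence.

```bash
# Hypothetical freshness test. All arguments are in seconds:
#   $1 = session last-updated time (Unix timestamp)
#   $2 = freshness window length
#   $3 = current time (optional, defaults to now)
is_fresh() {
  local updated="$1" window="$2" now="${3:-$(date +%s)}"
  # Fresh when the session was updated no longer ago than the window.
  [ $(( now - updated )) -le "$window" ]
}
```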
---

## 3) Label mismatch across phases

### Symptom

The monitor reports the wrong status or misses active runs.

### Root cause

MVP labels (`recipe-mvp-*`) were still in use during the v1 phase.

### Fix

- Update the monitor and iterator to use phase-specific labels
- Standardize naming per phase:
  - MVP: `recipe-mvp-iter*`
  - v1: `recipe-v1-iter*`

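One way to keep the monitor and iterator in agreement is to derive the label prefix from the phase in a single place. The sketch below assumes the two phases are identified by the strings `mvp` and `v1`; both function names are hypothetical. For example, `label_in_phase recipe-v1-iter12 v1` succeeds, while `label_in_phase recipe-mvp-iter3 v1` fails.

```bash
# Map a phase name to its session label prefix, per the naming convention above.
phase_prefix() {
  case "$1" in
    mvp) echo "recipe-mvp-iter" ;;
    v1)  echo "recipe-v1-iter" ;;
    *)   return 1 ;;   # unknown phase
  esac
}

# Succeeds if a session label belongs to the given phase.
#   $1 = session label, $2 = phase name
label_in_phase() {
  local label="$1" prefix
  prefix="$(phase_prefix "$2")" || return 1
  case "$label" in
    "$prefix"*) return 0 ;;
    *)          return 1 ;;
  esac
}
```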
---

## 4) Model/provider auth mismatch

### Symptom

Cron jobs fail with:

- `No API key found for provider openai`
- or Copilot cooldown rate-limit errors

### Root cause

`openai/...` models were used without an OpenAI API key.

### Fix

- Use the OAuth provider model prefix: `openai-codex/...`
- For this project, prefer `openai-codex/gpt-5.3-codex`

---

## 5) Environment capability mismatch (Docker)

### Symptom

Task fails with `docker: command not found`.

### Root cause

The agent runtime host lacks Docker.

### Fix

- Mark the task as a manual host validation task
- Continue with unblocked tasks

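A task that needs Docker can test for it up front and report itself as blocked rather than fail partway through. This is a sketch only: the function names and the status strings are assumptions, and how a task is actually marked for manual validation depends on the harness.

```bash
# Succeeds if the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical guard: report a Docker-dependent task as blocked instead of failing it.
docker_task_status() {
  if have_cmd docker; then
    echo "runnable"
  else
    echo "blocked: manual host validation required (docker not found)"
  fi
}
```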
---

## 6) Runtime module mismatch (ESM/CommonJS)

### Symptom

Backend runtime error: `require is not defined`.

### Root cause

`require()` was used in an ESM code path.

### Fix

- Replace `require('fs')` calls with ESM imports (for example, `import { writeFileSync } from 'fs'`)
- Rebuild and rerun the server

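Before rebuilding, it can help to list any remaining `require(` calls. The helper below is a rough sketch: the function name and the file extensions it searches are assumptions, and it matches the literal text `require(`, so hits inside comments or strings will also appear. A call such as `find_require_calls .` from the project root prints each hit as `file:line:text`.

```bash
# Hypothetical helper: list literal "require(" occurrences in JS/TS sources
# under a directory, skipping anything inside node_modules.
#   $1 = directory to search
find_require_calls() {
  find "$1" -type f \( -name '*.js' -o -name '*.mjs' -o -name '*.ts' \) \
    ! -path '*/node_modules/*' -exec grep -HnF 'require(' {} +
}
```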
---

## Operational Controls

## Pause automation

Disable both jobs:

- Recipe Manager Auto-Iterator
- Recipe Manager Progress Monitor

## Resume automation

Enable both jobs, then manually kick off one fresh iteration.

## Manual override iteration (safe restart)

Spawn one explicit iteration with:

- absolute project path
- pre-flight guard
- one-task/one-commit rule

---

## Workflow Periodic Execution (cron + systemd)

All commands assume the project root:

`/home/paulh/.openclaw/workspace/projects/recipe-manager`

### Manual commands

```bash
# Resume from checkpoint (default mode)
npm run workflow:run

# Force restart from stage 1
npm run workflow:run -- --mode restart

# Scheduled run entrypoint (resume + morning report)
npm run workflow:schedule

# Health signal for automation (0=healthy, 1=failed/blocked/unknown)
npm run workflow:health-check
```

### Cron example

Run the scheduler every 15 minutes and the health check every 5 minutes:

```cron
*/15 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:schedule >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-schedule.log 2>&1
*/5 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:health-check >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-health.log 2>&1
```

### systemd example

Create one-shot services and timers:

`/etc/systemd/system/recipe-workflow-schedule.service`

```ini
[Unit]
Description=Recipe Manager scheduled workflow run
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager
ExecStart=/usr/bin/npm run workflow:schedule
```

`/etc/systemd/system/recipe-workflow-schedule.timer`

```ini
[Unit]
Description=Run Recipe Manager scheduled workflow every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target
```

`/etc/systemd/system/recipe-workflow-health.service`

```ini
[Unit]
Description=Recipe Manager workflow health check
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager
ExecStart=/usr/bin/npm run workflow:health-check
```

`/etc/systemd/system/recipe-workflow-health.timer`

```ini
[Unit]
Description=Run Recipe Manager workflow health check every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
```

Enable timers:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now recipe-workflow-schedule.timer recipe-workflow-health.timer
```

### Troubleshooting failed/blocked status

When `npm run workflow:health-check` returns exit code `1` with `{"status":"failed"}` or `{"status":"blocked"}`:

1. Check the current workflow status payload:

   ```bash
   cat status/workflow-status.json
   ```

2. Check recent progress log entries:

   ```bash
   tail -n 50 status/workflow-progress.jsonl
   ```

3. Retry from checkpoint:

   ```bash
   npm run workflow:run
   ```

4. If the status is still blocked or failed, force a clean restart:

   ```bash
   npm run workflow:run -- --mode restart
   ```

5. Re-run the health check and confirm healthy output (`idle`, `running`, or `completed`):

   ```bash
   npm run workflow:health-check
   ```

If the status file is missing or malformed, the health check prints `status_read_failed` and exits `1`; regenerate state with `npm run workflow:run -- --mode restart`.

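Steps 3 to 5 can be chained into one escalation helper. The sketch below is not part of the harness: `recover_workflow` and the three `*_CMD` variables are hypothetical, and the variables exist only so the flow can be tried with stand-in commands. It relies solely on the documented exit code of the health check (0 for healthy), and it assumes that an automatic restart is acceptable, so steps 1 and 2 should still be reviewed by hand first. Run it from the project root.

```bash
# Hypothetical recovery helper following steps 3-5 above.
# The defaults are the runbook's npm scripts; override them to dry-run the flow.
HEALTH_CMD="${HEALTH_CMD:-npm run workflow:health-check}"
RESUME_CMD="${RESUME_CMD:-npm run workflow:run}"
RESTART_CMD="${RESTART_CMD:-npm run workflow:run -- --mode restart}"

recover_workflow() {
  # Already healthy: nothing to do.
  if $HEALTH_CMD; then echo "healthy"; return 0; fi

  # Step 3: retry from checkpoint, then re-check.
  $RESUME_CMD || true
  if $HEALTH_CMD; then echo "recovered: resume"; return 0; fi

  # Step 4: force a clean restart, then re-check (step 5).
  $RESTART_CMD || true
  if $HEALTH_CMD; then echo "recovered: restart"; return 0; fi

  echo "still unhealthy: inspect status/workflow-status.json"
  return 1
}
```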
---

## Completion Definition

A phase is complete when:

1. No unchecked tasks remain in that phase's section of `TODO.md`
2. The latest iteration exits without STUCK/ERROR
3. The commit and the TODO update are both present

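Condition 1 can be checked mechanically if each phase has its own heading in `TODO.md`. The layout assumed here (a heading line such as `## v1` followed by `- [ ]` items) is a guess at the file's structure, and `unchecked_in_section` is a hypothetical name. Any heading line ends the section, so tasks under a nested sub-heading are not counted. A result of `0` from `unchecked_in_section TODO.md '## v1'` would satisfy condition 1.

```bash
# Count unchecked tasks under one heading of a TODO file.
#   $1 = file, $2 = exact heading line, e.g. "## v1" (assumed format)
unchecked_in_section() {
  awk -v heading="$2" '
    # Every heading line either opens the wanted section or closes it.
    /^#/ { in_section = ($0 == heading); next }
    # Count lines that begin with an unchecked checkbox.
    in_section && index($0, "- [ ]") == 1 { n++ }
    END { print n + 0 }
  ' "$1"
}
```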
---

## Recommended Cadence

- Auto-iterator: every 15 minutes
- Progress monitor: every 5 minutes (high visibility mode)

If the monitor is too noisy, run it every 10–15 minutes instead.

---

## Handoff Checklist (Before ending a session)

- [ ] Confirm latest commit hash
- [ ] Confirm active phase + next unchecked task
- [ ] Confirm auto-iterator enabled/disabled status
- [ ] Confirm monitor enabled/disabled status
- [ ] Confirm no stale active-session false positives

---

## Quick Status Commands

### Latest commit

`git log -1 --oneline`

### Next tasks

`grep -n "^- \[ \]" TODO.md | head`

### Recent progress

`git log --oneline -5`

---

This runbook should be updated whenever a new failure mode appears.

See also: `INCIDENT_LOG.md` for timestamped operational incidents and fixes.