Recipe Manager Agentic Runbook

Last updated: 2026-03-24

Purpose

Operational guide for running the Recipe Manager agent harness reliably.

Core Execution Model

One task per iteration
One commit per iteration
TODO.md is the authoritative queue
Work only in: /home/paulh/.openclaw/workspace/projects/recipe-manager

Required Guards (Must Pass Before Coding)

Pre-flight checks

Before any iteration starts, verify these files exist:

AGENT_INSTRUCTIONS.md
TODO.md

If missing, fail with: STUCK: bad working dir or missing harness files at /home/paulh/.openclaw/workspace/projects/recipe-manager
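The guard above could be sketched as a small shell function. This is a minimal illustration, not the harness's actual implementation; the function name `preflight` and the `PROJECT_ROOT` variable are hypothetical.

```shell
# Pre-flight guard sketch: verify the harness files exist before an iteration starts.
# PROJECT_ROOT and preflight are illustrative names, not part of the harness.
PROJECT_ROOT="/home/paulh/.openclaw/workspace/projects/recipe-manager"

preflight() {
  # $1 = directory to check (defaults to PROJECT_ROOT)
  local dir="${1:-$PROJECT_ROOT}"
  local f
  for f in AGENT_INSTRUCTIONS.md TODO.md; do
    if [ ! -f "$dir/$f" ]; then
      echo "STUCK: bad working dir or missing harness files at $dir"
      return 1
    fi
  done
  return 0
}

# Hypothetical usage at the top of an iteration script:
#   preflight || exit 1
#   cd "$PROJECT_ROOT"
```

Running the check before `cd` means a wrong spawner directory is reported immediately instead of surfacing later as missing-file errors.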

Monitoring Signals (How we know it's working)

A run is healthy only when all three of the following are true:

Active session updated recently (recipe-v1-iter*)
New git commits are landing
TODO checkboxes advance
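Two of these signals can be checked from the shell. The sketch below covers only the commit and checkbox signals, because how sessions are listed depends on tooling this runbook does not describe. The function names, the `- [x]` task syntax, and the example time window are assumptions.

```shell
# Count commits on HEAD newer than a given window.
# $1 = repo dir, $2 = git date expression such as "30 minutes ago" (example value)
recent_commits() {
  git -C "$1" rev-list --count --since="$2" HEAD
}

# Count checked TODO boxes, assuming tasks are written as "- [x] ...".
# Note: grep -c prints 0 but exits non-zero when nothing matches.
checked_tasks() {
  grep -c '^- \[[xX]\]' "$1"
}

# Succeeds only if the checked-task count grew between two samples.
# $1 = earlier count, $2 = later count
tasks_advanced() {
  [ "$2" -gt "$1" ]
}

# Hypothetical usage from the project root:
#   before="$(checked_tasks TODO.md)"
#   ... wait one monitoring interval ...
#   after="$(checked_tasks TODO.md)"
#   tasks_advanced "$before" "$after" && recent_commits . "30 minutes ago"
```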

Known Failure Modes and Fixes

1) Wrong working directory

Symptom

The agent reports that AGENT_INSTRUCTIONS.md or TODO.md is missing in /workspace.

Root cause

The spawner started the agent outside the project root.

Fix

Force absolute project path in every task prompt
Add mandatory pre-flight guard
Relaunch fresh iteration

2) False “iteration already running”

Symptom

Auto-iterator repeatedly prints SKIP even when no coding progress occurs.

Root cause

The auto-iterator treated stale historical sessions as active.

Fix

Treat a session as active only if updated recently (freshness window)
Use current phase labels only (recipe-v1-iter*)
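A freshness window can be expressed as a simple timestamp comparison. The sketch below is illustrative only: the function name, the epoch-seconds input, and the 900-second window in the usage comment are assumptions, since this runbook does not state the window or how session timestamps are obtained.

```shell
# Succeeds if a session was updated within the freshness window.
# $1 = last-update time (epoch seconds)
# $2 = window length in seconds
# $3 = current time in epoch seconds (optional, defaults to now)
is_session_fresh() {
  local updated="$1" window="$2" now="${3:-$(date +%s)}"
  [ $(( ${now} - ${updated} )) -le "$window" ]
}

# Hypothetical usage with a 15-minute window:
#   if is_session_fresh "$last_update_epoch" 900; then echo "SKIP"; fi
```

Passing the current time as an optional argument keeps the check deterministic when it is exercised with fixed timestamps.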

3) Label mismatch across phases

Symptom

Monitor reports wrong status or misses active runs.

Root cause

MVP labels (recipe-mvp-*) used during v1 phase.

Fix

Update monitor + iterator to phase-specific labels
Standardize naming per phase:
- MVP: recipe-mvp-iter*
- v1: recipe-v1-iter*
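One way to keep the monitor and iterator aligned is to derive the label prefix from the phase in a single place. This is a sketch under assumptions: the phase identifiers `mvp` and `v1` and both function names are hypothetical.

```shell
# Map a phase identifier to its session label prefix.
label_prefix_for_phase() {
  case "$1" in
    mvp) echo "recipe-mvp-iter" ;;
    v1)  echo "recipe-v1-iter" ;;
    *)   echo "unknown phase: $1" >&2; return 1 ;;
  esac
}

# Succeeds if a session label belongs to the given phase.
# $1 = phase identifier, $2 = session label
label_matches_phase() {
  local prefix
  prefix="$(label_prefix_for_phase "$1")" || return 1
  case "$2" in
    "$prefix"*) return 0 ;;
    *) return 1 ;;
  esac
}
```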

4) Model/provider auth mismatch

Symptom

Cron jobs fail with:

No API key found for provider openai
or Copilot cooldown rate-limit errors

Root cause

The jobs were configured with openai/... models, but no OpenAI API key is available.

Fix

Use OAuth provider model prefix: openai-codex/...
For this project, prefer: openai-codex/gpt-5.3-codex

5) Environment capability mismatch (Docker)

Symptom

Task fails with docker: command not found.

Root cause

Agent runtime host lacks Docker.

Fix

Mark as manual host validation task
Continue with unblocked tasks
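A task that depends on Docker could probe for the binary before starting, so the gap is detected up front rather than mid-task. The function name and the `MANUAL:` message below are hypothetical and not output produced by the harness.

```shell
# Succeeds if the named command is available on PATH; otherwise prints a note.
# $1 = command name, for example docker
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "MANUAL: $1 not available on this host; mark the task for manual host validation"
  return 1
}

# Hypothetical usage:
#   require_tool docker || echo "moving on to the next unblocked task"
```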

6) Runtime module mismatch (ESM/CommonJS)

Symptom

Backend runtime error: require is not defined.

Root cause

Using require() in ESM code path.

Fix

Replace require('fs') calls with ESM imports, for example import { writeFileSync } from 'fs'
Build + rerun server
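Before rebuilding, it can help to list any remaining require() calls. The sketch below is a rough text search, not a module-aware check: it will also flag legitimate CommonJS files and commented-out code. The function name and the scanned file extensions are assumptions.

```shell
# List lines containing "require(" in source files, skipping node_modules.
# $1 = directory to scan
# /dev/null is passed to grep so file names are printed even for a single file.
find_require_calls() {
  find "$1" -path '*/node_modules' -prune -o \
    -type f \( -name '*.js' -o -name '*.mjs' -o -name '*.ts' \) \
    -exec grep -n 'require(' /dev/null {} +
}

# Hypothetical usage from the project root:
#   find_require_calls .
```

The exit status depends on whether grep matched anything, so judge the result by the printed lines rather than the return code.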

Operational Controls

Pause automation

Disable both jobs:

Recipe Manager Auto-Iterator
Recipe Manager Progress Monitor

Resume automation

Enable both jobs, then manually kick one fresh iteration.

Manual override iteration (safe restart)

Spawn one explicit iteration with:

absolute project path
pre-flight guard
one-task/one-commit rule

Workflow Periodic Execution (cron + systemd)

All commands assume project root: /home/paulh/.openclaw/workspace/projects/recipe-manager

Manual commands

# Resume from checkpoint (default mode)
npm run workflow:run

# Force restart from stage 1
npm run workflow:run -- --mode restart

# Scheduled run entrypoint (resume + morning report)
npm run workflow:schedule

# Health signal for automation (0=healthy, 1=failed/blocked/unknown)
npm run workflow:health-check

Cron example

Run scheduler every 15 minutes, health check every 5 minutes:

*/15 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:schedule >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-schedule.log 2>&1
*/5 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:health-check >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-health.log 2>&1

systemd example

Create one-shot services and timers:

/etc/systemd/system/recipe-workflow-schedule.service

[Unit]
Description=Recipe Manager scheduled workflow run
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager
ExecStart=/usr/bin/npm run workflow:schedule

/etc/systemd/system/recipe-workflow-schedule.timer

[Unit]
Description=Run Recipe Manager scheduled workflow every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target

/etc/systemd/system/recipe-workflow-health.service

[Unit]
Description=Recipe Manager workflow health check
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager
ExecStart=/usr/bin/npm run workflow:health-check

/etc/systemd/system/recipe-workflow-health.timer

[Unit]
Description=Run Recipe Manager workflow health check every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target

Enable timers:

sudo systemctl daemon-reload
sudo systemctl enable --now recipe-workflow-schedule.timer recipe-workflow-health.timer

Troubleshooting failed/blocked status

When npm run workflow:health-check returns exit code 1 with {"status":"failed"} or {"status":"blocked"}:

Check current workflow status payload:
```
cat status/workflow-status.json
```

Check recent progress log entries:

tail -n 50 status/workflow-progress.jsonl

Retry from checkpoint:
```
npm run workflow:run
```
If still blocked/failed, force a clean restart:
```
npm run workflow:run -- --mode restart
```
Re-run health check and confirm healthy output (idle, running, or completed):
```
npm run workflow:health-check
```

If the status file is missing or malformed, the health check prints status_read_failed and exits 1. In that case, regenerate state with npm run workflow:run -- --mode restart.
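The healthy and unhealthy states described above can be approximated with a short shell function when triaging by hand. This is a sketch that mirrors the documented behaviour, not the project's actual health check. It assumes the status file contains a top-level "status" string field, and the function name is hypothetical.

```shell
# Print the workflow status and succeed only for healthy values.
# $1 = path to the status JSON file
status_is_healthy() {
  local file="$1" status
  if [ ! -r "$file" ]; then
    echo "status_read_failed"
    return 1
  fi
  # Extract the value of the "status" field with a plain text match (no JSON parser).
  status="$(sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1)"
  case "$status" in
    idle|running|completed) echo "$status"; return 0 ;;
    "") echo "status_read_failed"; return 1 ;;
    *) echo "$status"; return 1 ;;
  esac
}

# Hypothetical usage from the project root:
#   status_is_healthy status/workflow-status.json || npm run workflow:run
```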

Completion Definition

A phase is complete when:

No unchecked tasks remain in that phase section of TODO.md
Latest iteration exits without STUCK/ERROR
Commit + TODO update are present
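The first condition can be checked by counting unchecked boxes under a phase heading. The sketch below assumes each phase section starts with a markdown heading line and that tasks use the `- [ ]` syntax. The heading text in the usage comment is hypothetical, and any nested heading ends the section, so adjust it to the real TODO.md layout.

```shell
# Print the number of unchecked tasks under one heading of a TODO file.
# $1 = TODO file, $2 = exact heading line that opens the phase section
unchecked_in_section() {
  awk -v heading="$2" '
    /^#/ { in_section = ($0 == heading) }      # every heading opens or closes the section
    in_section && /^- \[ \]/ { count++ }       # unchecked task inside the section
    END { print count + 0 }
  ' "$1"
}

# Hypothetical usage:
#   [ "$(unchecked_in_section TODO.md "## v1")" = "0" ] && echo "phase tasks complete"
```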

Recommended Cadence

Auto-iterator: every 15 minutes
Progress monitor: every 5 minutes (high visibility mode)

If noisy, set monitor to every 10–15 minutes.

Handoff Checklist (Before ending a session)

Confirm latest commit hash
Confirm active phase + next unchecked task
Confirm auto-iterator enabled/disabled status
Confirm monitor enabled/disabled status
Confirm no stale active-session false positives

Quick Status Commands

Latest commit

git log -1 --oneline

Next tasks

grep -n "^- \[ \]" TODO.md | head

Recent progress

git log --oneline -5

This runbook should be updated whenever a new failure mode appears.

See also: INCIDENT_LOG.md for timestamped operational incidents and fixes.
