Recipe Manager Agentic Runbook

Last updated: 2026-03-24

Purpose

Operational guide for running the Recipe Manager agent harness reliably.


Core Execution Model

  • One task per iteration
  • One commit per iteration
  • TODO.md is the authoritative queue
  • Work only in: /home/paulh/.openclaw/workspace/projects/recipe-manager

Required Guards (Must Pass Before Coding)

Pre-flight checks

Before any iteration starts, verify these files exist:

  • AGENT_INSTRUCTIONS.md
  • TODO.md

If missing, fail with: STUCK: bad working dir or missing harness files at /home/paulh/.openclaw/workspace/projects/recipe-manager
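
The guard above could be sketched as a small shell function. This is only an illustration: the function name `preflight` and the `PROJECT_DIR` variable are assumptions, not part of the harness.

```shell
# Sketch of the pre-flight guard. `preflight` and PROJECT_DIR are
# illustrative names, not something the harness defines.
PROJECT_DIR="/home/paulh/.openclaw/workspace/projects/recipe-manager"

preflight() {
  # $1 = directory to check (defaults to the project root above)
  dir="${1:-$PROJECT_DIR}"
  for f in AGENT_INSTRUCTIONS.md TODO.md; do
    if [ ! -f "$dir/$f" ]; then
      echo "STUCK: bad working dir or missing harness files at $dir"
      return 1
    fi
  done
  return 0
}

# Typical use at the top of an iteration script:
#   preflight || exit 1
```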


Monitoring Signals (How we know it's working)

A run is healthy only when all 3 are true:

  1. Active session updated recently (recipe-v1-iter*)
  2. New git commits are landing
  3. TODO checkboxes advance
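
Signals 2 and 3 can be spot-checked from the project root with something like the sketch below. The function names and the 30-minute window are assumptions, and it presumes completed tasks are written as `- [x]`. Signal 1 is left out because how sessions are listed depends on the harness.

```shell
# Hypothetical helpers for spot-checking signals 2 and 3.

# commits_since WINDOW: number of commits on HEAD newer than WINDOW,
# where WINDOW is a git date expression such as "30 minutes ago".
commits_since() {
  git rev-list --count --since="$1" HEAD
}

# count_checked FILE: completed checkboxes, assuming the "- [x]" form.
count_checked() {
  grep -c '^- \[[xX]\]' "$1" || true
}

# count_unchecked FILE: open checkboxes ("- [ ]").
count_unchecked() {
  grep -c '^- \[ \]' "$1" || true
}

# Example (run inside the project root):
#   commits_since "30 minutes ago"
#   count_checked TODO.md
```

Comparing `count_checked TODO.md` between two monitor runs shows whether checkboxes are advancing.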

Known Failure Modes and Fixes

1) Wrong working directory

Symptom

Agent reports that AGENT_INSTRUCTIONS.md or TODO.md is missing in /workspace.

Root cause

Spawner started outside project root.

Fix

  • Force absolute project path in every task prompt
  • Add mandatory pre-flight guard
  • Relaunch fresh iteration

2) False “iteration already running”

Symptom

Auto-iterator repeatedly prints SKIP even when no coding progress occurs.

Root cause

The auto-iterator treated stale historical sessions as active.

Fix

  • Treat a session as active only if updated recently (freshness window)
  • Use current phase labels only (recipe-v1-iter*)
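
A freshness check might look like the sketch below. It assumes the session's last-update time is available as a Unix epoch; how to obtain it depends on the harness. The name `is_fresh` is illustrative, and the window length is left to the caller because the runbook does not fix one.

```shell
# is_fresh LAST_UPDATE_EPOCH WINDOW_SECONDS [NOW_EPOCH]
# Succeeds only if the session was updated within the window.
# NOW_EPOCH defaults to the current time; passing it makes testing easy.
is_fresh() {
  last="$1"
  window="$2"
  now="${3:-$(date +%s)}"
  [ $((now - last)) -le "$window" ]
}

# Example: treat a session as active only if updated in the last 15 minutes
# (900 seconds is an arbitrary illustrative value):
#   if is_fresh "$last_update" 900; then echo "SKIP"; fi
```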

3) Label mismatch across phases

Symptom

Monitor reports wrong status or misses active runs.

Root cause

MVP labels (recipe-mvp-*) used during v1 phase.

Fix

  • Update monitor + iterator to phase-specific labels
  • Standardize naming per phase:
    • MVP: recipe-mvp-iter*
    • v1: recipe-v1-iter*
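
One way to keep the monitor and iterator aligned is a single shared label matcher, sketched below. The function name `label_in_phase` and the short phase keys `mvp` and `v1` are assumptions; only the label patterns come from this runbook.

```shell
# label_in_phase PHASE LABEL
# Succeeds when LABEL follows the naming convention for PHASE.
label_in_phase() {
  case "$1:$2" in
    mvp:recipe-mvp-iter*) return 0 ;;
    v1:recipe-v1-iter*)   return 0 ;;
    *)                    return 1 ;;
  esac
}

# Example: ignore sessions from other phases while running v1
#   label_in_phase v1 "$session_label" || continue
```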

4) Model/provider auth mismatch

Symptom

Cron jobs fail with:

  • No API key found for provider openai
  • or Copilot cooldown rate-limit errors

Root cause

Using openai/... models without an OpenAI API key.

Fix

  • Use OAuth provider model prefix: openai-codex/...
  • For this project, prefer: openai-codex/gpt-5.3-codex

5) Environment capability mismatch (Docker)

Symptom

Task fails with docker: command not found.

Root cause

Agent runtime host lacks Docker.

Fix

  • Mark as manual host validation task
  • Continue with unblocked tasks
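
A capability check before Docker-dependent tasks could be sketched as follows. The helper names are hypothetical, and the message text is only a suggestion.

```shell
# have_tool NAME: succeeds if NAME is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# docker_or_defer: prints a note and fails when Docker is unavailable,
# so the caller can mark the task for manual host validation and move on.
docker_or_defer() {
  if ! have_tool docker; then
    echo "docker not available: defer to manual host validation"
    return 1
  fi
  return 0
}
```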

6) Runtime module mismatch (ESM/CommonJS)

Symptom

Backend runtime error: require is not defined.

Root cause

Using require() in ESM code path.

Fix

  • Replace require('fs') calls with ESM imports (for example, import { writeFileSync } from 'fs')
  • Build + rerun server
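
To find any remaining `require(` calls before rebuilding, a search along these lines may help. The function name, the file extensions, and the directory argument are assumptions about the backend layout.

```shell
# find_require_calls DIR
# Lists file:line:text for every "require(" in JS/TS sources under DIR,
# skipping node_modules. /dev/null forces grep to print file names even
# when it is handed a single file.
find_require_calls() {
  find "$1" -name node_modules -prune -o \
    -type f \( -name '*.js' -o -name '*.mjs' -o -name '*.ts' \) \
    -exec grep -n 'require(' /dev/null {} +
}

# Example (the path is hypothetical):
#   find_require_calls ./backend
```

Each hit is a candidate for replacement with an ESM import.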

Operational Controls

Pause automation

Disable both jobs:

  • Recipe Manager Auto-Iterator
  • Recipe Manager Progress Monitor

Resume automation

Enable both jobs, then manually kick one fresh iteration.

Manual override iteration (safe restart)

Spawn one explicit iteration with:

  • absolute project path
  • pre-flight guard
  • one-task/one-commit rule

Completion Definition

A phase is complete when:

  1. No unchecked tasks remain in that phase section of TODO.md
  2. Latest iteration exits without STUCK/ERROR
  3. Commit + TODO update are present
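
Condition 1 can be checked per phase with a sketch like the one below. It assumes each phase in TODO.md sits under its own Markdown heading and that no sub-headings appear inside a phase section. The heading text in the example is hypothetical.

```shell
# unchecked_in_section FILE HEADING
# Prints the number of "- [ ]" lines between the line equal to HEADING
# and the next heading line (any line starting with "#").
unchecked_in_section() {
  awk -v heading="$2" '
    /^#/ { in_section = ($0 == heading); next }
    in_section && /^- \[ \]/ { count++ }
    END { print count + 0 }
  ' "$1"
}

# Example (the heading text is an assumption about TODO.md):
#   [ "$(unchecked_in_section TODO.md '## v1')" -eq 0 ] && echo "no open tasks"
```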

Job Schedule

  • Auto-iterator: every 15 minutes
  • Progress monitor: every 5 minutes (high visibility mode)

If the monitor is too noisy, set it to every 10 to 15 minutes.


Handoff Checklist (Before ending a session)

  • Confirm latest commit hash
  • Confirm active phase + next unchecked task
  • Confirm auto-iterator enabled/disabled status
  • Confirm monitor enabled/disabled status
  • Confirm no stale active-session false positives

Quick Status Commands

Latest commit

git log -1 --oneline

Next tasks

grep -n "^- \[ \]" TODO.md | head

Recent progress

git log --oneline -5


This runbook should be updated whenever a new failure mode appears.

See also: INCIDENT_LOG.md for timestamped operational incidents and fixes.