# Incident Log — Recipe Manager Harness

Purpose: track operational failures, impact, root cause, and permanent fixes.

---

## Template

## [YYYY-MM-DD HH:MM TZ] Incident Title
- **Severity:** Low / Medium / High
- **Status:** Open / Mitigated / Resolved
- **Detected by:** Monitor / Human / Agent
- **Impact:**
  - What stopped or degraded
  - Duration
- **Symptoms:**
  - Exact error text
  - Observable behavior
- **Root cause:**
  - Why it happened
- **Immediate mitigation:**
  - What was done to restore service
- **Permanent fix:**
  - Config/code/process changes
- **Verification:**
  - How we confirmed it works
- **Prevention follow-up:**
  - Guardrails/tests added
- **Links:**
  - Commit(s):
  - Related files:
  - Session/cron IDs:

---

## Recorded Incidents

## [2026-03-24 08:00 EDT] Auto-iterator/monitor stalls due to model auth mismatch
- **Severity:** High
- **Status:** Resolved
- **Detected by:** Human
- **Impact:**
  - Iterations stopped for ~10 hours
  - No new recipe-manager commits during outage
- **Symptoms:**
  - Cron failures: `No API key found for provider "openai"`
  - Repeated job errors with no productive iteration
- **Root cause:**
  - Cron jobs used an `openai/...` model path (API-key provider) while the environment was authenticated via `openai-codex` OAuth, so no API key existed for the `openai` provider
- **Immediate mitigation:**
  - Disabled broken jobs
  - Manually spawned recovery iterations
- **Permanent fix:**
  - Cron jobs updated to `openai-codex/gpt-5.3-codex`
- **Verification:**
  - Iterations resumed and commits landed again
- **Prevention follow-up:**
  - Runbook updated with provider-prefix rule
- **Links:**
  - Related files: RUNBOOK.md
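
The provider-prefix rule from this incident could be enforced before a job is scheduled. The following is a minimal sketch, not the harness's actual code: `provider_of`, `find_auth_mismatches`, the job names, and the `openai/example-model` path are all hypothetical and used only for illustration. It assumes model paths have the form `provider/model` and that the set of authenticated providers is known up front.

```python
def provider_of(model_path: str) -> str:
    """Return the provider prefix of a 'provider/model' path."""
    provider, sep, model = model_path.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_path!r}")
    return provider


def find_auth_mismatches(job_models: dict, authenticated: set) -> dict:
    """Map job name -> model path for jobs whose provider has no credentials."""
    return {
        job: model
        for job, model in job_models.items()
        if provider_of(model) not in authenticated
    }


if __name__ == "__main__":
    # Hypothetical job table; the second model path is a placeholder.
    jobs = {
        "auto-iterator": "openai-codex/gpt-5.3-codex",
        "monitor": "openai/example-model",
    }
    # Assumption: only the OAuth-backed provider is authenticated.
    for job, model in find_auth_mismatches(jobs, {"openai-codex"}).items():
        print(f"{job}: no credentials for provider of {model}")
```

Running a check like this when jobs are created or edited would surface the mismatch immediately instead of after repeated cron failures.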

## [2026-03-24 21:40 EDT] Iteration skips due to stale session detection + wrong working dir
- **Severity:** High
- **Status:** Resolved
- **Detected by:** Human + monitor alerts
- **Impact:**
  - Auto-iterator repeatedly skipped or produced STUCK responses
- **Symptoms:**
  - `SKIP: iteration already running` with no new commit
  - `STUCK: ... AGENT_INSTRUCTIONS.md and TODO.md missing from /workspace`
- **Root cause:**
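  - Stale, already-completed sessions were counted as active, which triggered the `SKIP` response
  - Iteration prompts sometimes lacked an explicit project-root guard, so the agent looked for project files in `/workspace`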
- **Immediate mitigation:**
  - Spawned manual iteration with absolute path + pre-flight checks
- **Permanent fix:**
  - Added mandatory pre-flight guard in AGENT_INSTRUCTIONS.md
  - Updated auto-iterator to require absolute path and freshness-based active-run detection
- **Verification:**
  - New iterations completed successfully with commits:
    - `87e9181` (import test)
    - `276e03c` (import UI page/form)
    - `d4aed47` (parsed preview)
- **Prevention follow-up:**
  - Monitor updated to track `recipe-v1-iter*` labels for v1 phase
- **Links:**
  - Commit(s): `37b17f7`, `d4aed47`, `276e03c`, `87e9181`
  - Related files: AGENT_INSTRUCTIONS.md, TODO.md, RUNBOOK.md
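
The two fixes in this entry, the pre-flight guard and freshness-based active-run detection, can be sketched roughly as below. This is an illustration under stated assumptions, not the code in AGENT_INSTRUCTIONS.md or the auto-iterator: the session record shape (`completed`, `last_activity` as epoch seconds), the 30-minute freshness window, and the function names are all hypothetical.

```python
import time
from pathlib import Path

# Files the guard expects in the project root (named in the incident symptoms).
REQUIRED_FILES = ("AGENT_INSTRUCTIONS.md", "TODO.md")
# Assumed freshness window; the real threshold is not stated in this log.
MAX_AGE_SECONDS = 30 * 60


def preflight(project_root: str) -> list:
    """Return a list of problems; an empty list means the guard passed."""
    root = Path(project_root)
    if not root.is_absolute():
        return [f"project root must be an absolute path: {project_root}"]
    if not root.is_dir():
        return [f"project root is not a directory: {project_root}"]
    return [
        f"missing {name} in {root}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]


def is_active(session: dict, now: float, max_age: float = MAX_AGE_SECONDS) -> bool:
    """Count a session as active only if it is unfinished and recently updated."""
    if session.get("completed"):
        return False
    return (now - session["last_activity"]) <= max_age


if __name__ == "__main__":
    now = time.time()
    stale = {"completed": True, "last_activity": now - 5}
    print("stale session active?", is_active(stale, now))
    print("preflight problems:", preflight("relative/path"))
```

An iteration would start only when `preflight` returns no problems and no session passes `is_active`, so a finished or long-idle session can no longer cause a skip.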

## [2026-03-24 17:55 EDT] Docker validation blocked on runtime host
- **Severity:** Medium
- **Status:** Mitigated (manual follow-up required)
- **Detected by:** Agent
- **Impact:**
  - Could not complete the local Docker deployment test from the agent environment
- **Symptoms:**
  - `docker: command not found`
- **Root cause:**
  - Runtime host lacks Docker CLI/daemon
- **Immediate mitigation:**
  - Marked task as manual host validation
- **Permanent fix:**
  - Keep Docker validation as an explicit manual step in TODO.md, to be run on a host with Docker installed
- **Verification:**
  - Manual non-Docker dev run validated separately
- **Prevention follow-up:**
  - Documented as environment capability mismatch in RUNBOOK.md
- **Links:**
  - Commit(s): `1a4b984`
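
An agent could detect this capability mismatch up front rather than failing mid-task. The sketch below is one possible check, not something the harness is known to contain: `docker_capability` and its return labels are hypothetical. It relies on `shutil.which` to find the CLI and assumes `docker info` exits non-zero when the daemon is unreachable.

```python
import shutil
import subprocess


def docker_capability(which=shutil.which) -> str:
    """Return 'missing', 'cli-only', or 'ready' for the current host."""
    # No CLI on PATH: matches the `docker: command not found` symptom.
    if which("docker") is None:
        return "missing"
    try:
        # Assumption: `docker info` fails when no daemon is reachable.
        result = subprocess.run(
            ["docker", "info"], capture_output=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return "cli-only"
    return "ready" if result.returncode == 0 else "cli-only"


if __name__ == "__main__":
    state = docker_capability()
    if state != "ready":
        print(f"Docker {state}: defer deployment test to manual host validation")
    else:
        print("Docker ready: deployment test can run here")
```

The `which` parameter exists only so the lookup can be replaced in tests; callers would normally use the default.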