# Incident Log: Recipe Manager Harness

Purpose: track operational failures, impact, root cause, and permanent fixes.

---

## Template

## [YYYY-MM-DD HH:MM TZ] Incident Title

- **Severity:** Low / Medium / High
- **Status:** Open / Mitigated / Resolved
- **Detected by:** Monitor / Human / Agent
- **Impact:**
  - What stopped or degraded
  - Duration
- **Symptoms:**
  - Exact error text
  - Observable behavior
- **Root cause:**
  - Why it happened
- **Immediate mitigation:**
  - What was done to restore service
- **Permanent fix:**
  - Config/code/process changes
- **Verification:**
  - How we confirmed it works
- **Prevention follow-up:**
  - Guardrails/tests added
- **Links:**
  - Commit(s):
  - Related files:
  - Session/cron IDs:

---

## Recorded Incidents

## [2026-03-24 08:00 EDT] Auto-iterator/monitor stalls due to model auth mismatch

- **Severity:** High
- **Status:** Resolved
- **Detected by:** Human
- **Impact:**
  - Iterations stopped for ~10 hours
  - No new recipe-manager commits during outage
- **Symptoms:**
  - Cron failures: `No API key found for provider "openai"`
  - Repeated job errors with no productive iteration
- **Root cause:**
  - Cron jobs used `openai/...` model path (API-key provider) while environment was authenticated via `openai-codex` OAuth
- **Immediate mitigation:**
  - Disabled broken jobs
  - Manually spawned recovery iterations
- **Permanent fix:**
  - Cron jobs updated to `openai-codex/gpt-5.3-codex`
- **Verification:**
  - Iterations resumed and commits landed again
- **Prevention follow-up:**
  - Runbook updated with provider-prefix rule
- **Links:**
  - Related files: RUNBOOK.md

## [2026-03-24 21:40 EDT] Iteration skips due to stale session detection + wrong working dir

- **Severity:** High
- **Status:** Resolved
- **Detected by:** Human + monitor alerts
- **Impact:**
  - Auto-iterator repeatedly skipped or produced STUCK responses
- **Symptoms:**
  - `SKIP: iteration already running` with no new commit
  - `STUCK: ... AGENT_INSTRUCTIONS.md and TODO.md missing from /workspace`
- **Root cause:**
  - Stale completed sessions counted as active
  - Iteration prompts sometimes lacked explicit project-root guard
- **Immediate mitigation:**
  - Spawned manual iteration with absolute path + pre-flight checks
- **Permanent fix:**
  - Added mandatory pre-flight guard in AGENT_INSTRUCTIONS.md
  - Updated auto-iterator to require absolute path and freshness-based active-run detection
- **Verification:**
  - New iterations completed successfully with commits:
    - `87e9181` (import test)
    - `276e03c` (import UI page/form)
    - `d4aed47` (parsed preview)
- **Prevention follow-up:**
  - Monitor updated to track `recipe-v1-iter*` labels for v1 phase
- **Links:**
  - Commit(s): `37b17f7`, `d4aed47`, `276e03c`, `87e9181`
  - Related files: AGENT_INSTRUCTIONS.md, TODO.md, RUNBOOK.md

## [2026-03-24 17:55 EDT] Docker validation blocked in runtime host

- **Severity:** Medium
- **Status:** Mitigated (manual follow-up required)
- **Detected by:** Agent
- **Impact:**
  - Could not complete local docker deployment test from agent environment
- **Symptoms:**
  - `docker: command not found`
- **Root cause:**
  - Runtime host lacks Docker CLI/daemon
- **Immediate mitigation:**
  - Marked task as manual host validation
- **Permanent fix:**
  - Keep as explicit manual step in TODO for host with Docker installed
- **Verification:**
  - Manual non-docker dev run validated separately
- **Prevention follow-up:**
  - Documented as environment capability mismatch in RUNBOOK.md
- **Links:**
  - Commit: `1a4b984`
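
The guardrails named in the incidents above (provider-prefix rule, absolute project root with required files present, freshness-based active-run detection, Docker capability check) could be sketched roughly as follows. This is an illustrative sketch only, not the harness's actual code: every function name, the 30-minute freshness window, and the idea of passing a last-activity timestamp are assumptions made for the example. Only the file names, the `openai-codex` provider prefix, and the `docker` command come from the log.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path

# File names taken from the 21:40 incident; everything else below is hypothetical.
REQUIRED_FILES = ("AGENT_INSTRUCTIONS.md", "TODO.md")
MAX_SESSION_AGE_S = 30 * 60  # assumed freshness window, not a documented value


def provider_of(model: str) -> str:
    """Return the provider prefix of a 'provider/model' string."""
    return model.split("/", 1)[0]


def missing_project_files(project_root: Path) -> list[str]:
    """List required files that are absent from project_root."""
    return [name for name in REQUIRED_FILES if not (project_root / name).is_file()]


def is_run_active(last_activity_ts: float, now: float | None = None) -> bool:
    """Treat a session as active only if it showed activity recently."""
    now = time.time() if now is None else now
    return (now - last_activity_ts) <= MAX_SESSION_AGE_S


def has_docker() -> bool:
    """True if a docker executable is on PATH (does not check the daemon)."""
    return shutil.which("docker") is not None


def preflight(project_root: str, model: str,
              expected_provider: str = "openai-codex") -> list[str]:
    """Collect blocking problems; an empty list means the iteration may start."""
    problems: list[str] = []
    root = Path(project_root)
    if not root.is_absolute():
        problems.append(f"project root is not an absolute path: {project_root}")
    missing = missing_project_files(root)
    if missing:
        problems.append(f"missing from {root}: {', '.join(missing)}")
    if provider_of(model) != expected_provider:
        problems.append(
            f"model provider {provider_of(model)!r} does not match "
            f"authenticated provider {expected_provider!r}"
        )
    return problems


if __name__ == "__main__":
    for problem in preflight("/workspace", "openai-codex/gpt-5.3-codex"):
        print("STUCK:", problem)
    if not has_docker():
        print("NOTE: docker not found; leave docker validation as a manual step")
```

Returning a list of problems instead of raising on the first one lets a monitor report every failed check in a single alert. The Docker check is kept outside `preflight` because a missing Docker CLI was handled as a manual follow-up, not as a blocker.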