Incident Log — Recipe Manager Harness
Purpose: track operational failures, impact, root cause, and permanent fixes.
Template
[YYYY-MM-DD HH:MM TZ] Incident Title
- Severity: Low / Medium / High
- Status: Open / Mitigated / Resolved
- Detected by: Monitor / Human / Agent
- Impact:
- What stopped or degraded
- Duration
- Symptoms:
- Exact error text
- Observable behavior
- Root cause:
- Why it happened
- Immediate mitigation:
- What was done to restore service
- Permanent fix:
- Config/code/process changes
- Verification:
- How we confirmed it works
- Prevention follow-up:
- Guardrails/tests added
- Links:
- Commit(s):
- Related files:
- Session/cron IDs:
Recorded Incidents
[2026-03-24 08:00 EDT] Auto-iterator/monitor stalls due to model auth mismatch
- Severity: High
- Status: Resolved
- Detected by: Human
- Impact:
- Iterations stopped for ~10 hours
- No new recipe-manager commits during outage
- Symptoms:
- Cron failures: No API key found for provider "openai"
- Repeated job errors with no productive iteration
- Root cause:
- Cron jobs used an openai/... model path (API-key provider) while the environment was authenticated via openai-codex OAuth
- Immediate mitigation:
- Disabled broken jobs
- Manually spawned recovery iterations
- Permanent fix:
- Cron jobs updated to openai-codex/gpt-5.3-codex
- Verification:
- Iterations resumed and commits landed again
- Prevention follow-up:
- Runbook updated with provider-prefix rule
- Links:
- Related files: RUNBOOK.md
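The provider-prefix rule from this incident can be sketched as a check that runs before a cron job is enabled. This is a minimal sketch, not the harness's actual code: the function names provider_of and check_model_provider, and the idea of passing in the set of authenticated providers, are assumptions made for illustration.

```python
from __future__ import annotations


def provider_of(model_path: str) -> str:
    """Return the provider prefix of a 'provider/model' path."""
    provider, sep, model = model_path.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_path!r}")
    return provider


def check_model_provider(model_path: str, authenticated: set[str]) -> None:
    """Raise if the job's provider prefix has no matching credentials."""
    provider = provider_of(model_path)
    if provider not in authenticated:
        raise RuntimeError(
            f"model {model_path!r} needs provider {provider!r}, "
            f"but only {sorted(authenticated)} are authenticated"
        )


# Hypothetical environment: authenticated only via openai-codex OAuth.
authenticated = {"openai-codex"}
check_model_provider("openai-codex/gpt-5.3-codex", authenticated)  # passes silently
```

With the same set of authenticated providers, any model path carrying the openai/ prefix would raise RuntimeError at configuration time instead of failing later inside the cron run.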
[2026-03-24 21:40 EDT] Iteration skips due to stale session detection + wrong working dir
- Severity: High
- Status: Resolved
- Detected by: Human + monitor alerts
- Impact:
- Auto-iterator repeatedly skipped or produced STUCK responses
- Symptoms:
- SKIP: iteration already running with no new commit
- STUCK: ... AGENT_INSTRUCTIONS.md and TODO.md missing from /workspace
- Root cause:
- Stale completed sessions counted as active
- Iteration prompts sometimes lacked explicit project-root guard
- Immediate mitigation:
- Spawned manual iteration with absolute path + pre-flight checks
- Permanent fix:
- Added mandatory pre-flight guard in AGENT_INSTRUCTIONS.md
- Updated auto-iterator to require absolute path and freshness-based active-run detection
- Verification:
- New iterations completed successfully with commits: 87e9181 (import test), 276e03c (import UI page/form), d4aed47 (parsed preview)
- Prevention follow-up:
- Monitor updated to track recipe-v1-iter* labels for v1 phase
- Links:
- Commit(s): 37b17f7, d4aed47, 276e03c, 87e9181
- Related files: AGENT_INSTRUCTIONS.md, TODO.md, RUNBOOK.md
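The two fixes in this incident, the pre-flight guard and freshness-based active-run detection, can be sketched as follows. This is an illustration under stated assumptions rather than the auto-iterator's real implementation: the Session fields, the 900-second idle threshold, and the session labels are hypothetical; only the required file names come from the log.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from pathlib import Path

# Files the incident log says must be present in the project root.
REQUIRED_FILES = ("AGENT_INSTRUCTIONS.md", "TODO.md")


def preflight(project_root: str) -> list[str]:
    """Return a list of problems; an empty list means the guard passed."""
    root = Path(project_root)
    if not root.is_absolute():
        return [f"project root must be an absolute path: {project_root!r}"]
    if not root.is_dir():
        return [f"project root does not exist: {root}"]
    return [
        f"missing {name} in {root}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]


@dataclass
class Session:
    label: str
    completed: bool
    last_activity: float  # Unix timestamp of the last observed activity


def is_active(session: Session, now: float, max_idle_s: float = 900.0) -> bool:
    """Count a session as active only if it is unfinished and recently active."""
    if session.completed:
        return False  # completed sessions never block a new iteration
    return (now - session.last_activity) <= max_idle_s


# Hypothetical sessions: one completed, one unfinished but idle for an hour.
now = time.time()
sessions = [
    Session("recipe-v1-iter-a", completed=True, last_activity=now - 30),
    Session("recipe-v1-iter-b", completed=False, last_activity=now - 3600),
]
should_skip = any(is_active(s, now) for s in sessions)
print(should_skip)  # False: neither session blocks a new iteration
```

Returning a list of problems from preflight, rather than raising on the first one, lets the iterator report every missing file in a single STUCK message.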
[2026-03-24 17:55 EDT] Docker validation blocked in runtime host
- Severity: Medium
- Status: Mitigated (manual follow-up required)
- Detected by: Agent
- Impact:
- Could not complete local docker deployment test from agent environment
- Symptoms:
- docker: command not found
- Root cause:
- Runtime host lacks Docker CLI/daemon
- Immediate mitigation:
- Marked task as manual host validation
- Permanent fix:
- Keep as explicit manual step in TODO for host with Docker installed
- Verification:
- Manual non-docker dev run validated separately
- Prevention follow-up:
- Documented as environment capability mismatch in RUNBOOK.md
- Links:
- Commit: 1a4b984
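The environment capability mismatch above could be detected up front with a small probe, so the agent marks the task as a manual host step instead of failing on docker: command not found. This is a hedged sketch: the function name docker_available and the 10-second timeout are assumptions; shutil.which and the docker info command are standard.

```python
from __future__ import annotations

import shutil
import subprocess


def docker_available(timeout_s: float = 10.0) -> tuple[bool, str]:
    """Report whether the Docker CLI exists and its daemon responds."""
    cli = shutil.which("docker")
    if cli is None:
        return False, "docker CLI not found on PATH"
    try:
        # 'docker info' exits non-zero when the daemon cannot be reached.
        result = subprocess.run(
            [cli, "info"], capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"docker CLI present but not usable: {exc}"
    if result.returncode != 0:
        return False, "docker CLI present but daemon not reachable"
    return True, "docker CLI and daemon available"


ok, reason = docker_available()
if not ok:
    print(f"SKIP docker validation (manual host step): {reason}")
```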