From 83e2b95501a58f1754b7554e8c2cc61a149d4359 Mon Sep 17 00:00:00 2001 From: Paul Huliganga Date: Thu, 26 Mar 2026 15:56:52 -0400 Subject: [PATCH] docs(runbook): add periodic workflow automation and failure troubleshooting --- README.md | 20 +++++++++ RUNBOOK.md | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 140 insertions(+) diff --git a/README.md b/README.md index 5d9b097..4405c38 100644 --- a/README.md +++ b/README.md @@ -180,6 +180,26 @@ This project is built **agent-first** using OpenClaw autonomous agents: Human oversight at milestone boundaries. +### Workflow Automation (local) + +Use these scripts for periodic harness execution: + +```bash +# Resume from checkpoint (default mode) +npm run workflow:run + +# Force fresh run (ignore checkpoint progress) +npm run workflow:run -- --mode restart + +# Scheduler entrypoint: resume workflow + generate morning report +npm run workflow:schedule + +# Health check for automations/alerts (exit 0=healthy, 1=failed/blocked/unknown) +npm run workflow:health-check +``` + +For cron/systemd examples and failed/blocked troubleshooting, see [RUNBOOK.md](RUNBOOK.md#workflow-periodic-execution-cron--systemd). + --- ## Contributing diff --git a/RUNBOOK.md b/RUNBOOK.md index 145d41c..822d467 100644 --- a/RUNBOOK.md +++ b/RUNBOOK.md @@ -142,6 +142,126 @@ Spawn one explicit iteration with: --- +## Workflow Periodic Execution (cron + systemd) + +All commands assume project root: +`/home/paulh/.openclaw/workspace/projects/recipe-manager` + +### Manual commands + +```bash +# Resume from checkpoint (default mode) +npm run workflow:run + +# Force restart from stage 1 +npm run workflow:run -- --mode restart + +# Scheduled run entrypoint (resume + morning report) +npm run workflow:schedule + +# Health signal for automation (0=healthy, 1=failed/blocked/unknown) +npm run workflow:health-check +``` + +### Cron example + +Run scheduler every 15 minutes, health check every 5 minutes: + +```cron +*/15 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:schedule >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-schedule.log 2>&1 +*/5 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:health-check >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-health.log 2>&1 +``` + +### systemd example + +Create one-shot services and timers: + +`/etc/systemd/system/recipe-workflow-schedule.service` +```ini +[Unit] +Description=Recipe Manager scheduled workflow run +After=network.target + +[Service] +Type=oneshot +WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager +ExecStart=/usr/bin/npm run workflow:schedule +``` + +`/etc/systemd/system/recipe-workflow-schedule.timer` +```ini +[Unit] +Description=Run Recipe Manager scheduled workflow every 15 minutes + +[Timer] +OnCalendar=*:0/15 +Persistent=true + +[Install] +WantedBy=timers.target +``` + +`/etc/systemd/system/recipe-workflow-health.service` +```ini +[Unit] +Description=Recipe Manager workflow health check +After=network.target + +[Service] +Type=oneshot +WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager +ExecStart=/usr/bin/npm run workflow:health-check +``` + +`/etc/systemd/system/recipe-workflow-health.timer` +```ini +[Unit] +Description=Run Recipe Manager workflow health check every 5 minutes + +[Timer] +OnCalendar=*:0/5 +Persistent=true + +[Install] +WantedBy=timers.target +``` + +Enable timers: + +```bash +sudo systemctl daemon-reload +sudo systemctl enable --now recipe-workflow-schedule.timer recipe-workflow-health.timer +``` + +### Troubleshooting failed/blocked status + +When `npm run workflow:health-check` returns exit code `1` with `{"status":"failed"}` or `{"status":"blocked"}`: + +1. Check current workflow status payload: + ```bash + cat status/workflow-status.json + ``` +2. Check recent progress log entries: + ```bash + tail -n 50 status/workflow-progress.jsonl + ``` +3. Retry from checkpoint: + ```bash + npm run workflow:run + ``` +4. If still blocked/failed, force a clean restart: + ```bash + npm run workflow:run -- --mode restart + ``` +5. Re-run health check and confirm healthy output (`idle`, `running`, or `completed`): + ```bash + npm run workflow:health-check + ``` + +If status file is missing or malformed, the health check prints `status_read_failed` and exits `1`; regenerate state with `npm run workflow:run -- --mode restart`. + +--- + ## Completion Definition A phase is complete when: