docs(runbook): add periodic workflow automation and failure troubleshooting
parent 9744a7ac23
commit 83e2b95501
README.md (20 lines changed)

@@ -180,6 +180,26 @@ This project is built **agent-first** using OpenClaw autonomous agents:

Human oversight at milestone boundaries.

### Workflow Automation (local)

Use these scripts for periodic harness execution:

```bash
# Resume from checkpoint (default mode)
npm run workflow:run

# Force fresh run (ignore checkpoint progress)
npm run workflow:run -- --mode restart

# Scheduler entrypoint: resume workflow + generate morning report
npm run workflow:schedule

# Health check for automations/alerts (exit 0=healthy, 1=failed/blocked/unknown)
npm run workflow:health-check
```
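
Because the health check reports through its exit code, it can gate an alert or a log line. The following is a minimal sketch, not part of the repository: `report_health` is a hypothetical helper name, and the messages it prints are illustrative. It runs whatever command it is given and passes the exit code through.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository): run any command and
# translate its exit code into a one-line report for logs or alerts.
report_health() {
  if "$@"; then
    echo "workflow healthy"
  else
    # $? still holds the exit code of the command tested by `if`.
    local rc=$?
    echo "workflow unhealthy (exit $rc)"
    return "$rc"
  fi
}

# Example call (assumes the npm script from the block above):
#   report_health npm run --silent workflow:health-check
```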

For cron/systemd examples and failed/blocked troubleshooting, see [RUNBOOK.md](RUNBOOK.md#workflow-periodic-execution-cron--systemd).

---

## Contributing

RUNBOOK.md (120 lines changed)

@@ -142,6 +142,126 @@ Spawn one explicit iteration with:

---

## Workflow Periodic Execution (cron + systemd)

All commands assume project root:
`/home/paulh/.openclaw/workspace/projects/recipe-manager`

### Manual commands

```bash
# Resume from checkpoint (default mode)
npm run workflow:run

# Force restart from stage 1
npm run workflow:run -- --mode restart

# Scheduled run entrypoint (resume + morning report)
npm run workflow:schedule

# Health signal for automation (0=healthy, 1=failed/blocked/unknown)
npm run workflow:health-check
```

### Cron example

Run scheduler every 15 minutes, health check every 5 minutes:

```cron
*/15 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:schedule >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-schedule.log 2>&1
*/5 * * * * cd /home/paulh/.openclaw/workspace/projects/recipe-manager && /usr/bin/npm run workflow:health-check >> /home/paulh/.openclaw/workspace/projects/recipe-manager/status/workflow-health.log 2>&1
```
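
Both cron entries follow one pattern: change to the project root, run an npm script, and append stdout and stderr to a log under `status/`. The sketch below makes that pattern explicit so the long path is written only once. `cron_entry` is a hypothetical helper, not something the repository provides; its output is meant to be pasted into `crontab -e` after review.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print one crontab line.
# Arguments: interval in minutes, project root, workflow script suffix,
# and the log file name under <root>/status/.
cron_entry() {
  local minutes=$1 root=$2 script=$3 log=$4
  # The format string is single-quoted so the shell does not expand the asterisks.
  printf '*/%s * * * * cd %s && /usr/bin/npm run workflow:%s >> %s/status/%s 2>&1\n' \
    "$minutes" "$root" "$script" "$root" "$log"
}

# Example calls reproducing the two entries above:
#   root=/home/paulh/.openclaw/workspace/projects/recipe-manager
#   cron_entry 15 "$root" schedule workflow-schedule.log
#   cron_entry 5 "$root" health-check workflow-health.log
```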

### systemd example

Create one-shot services and timers:

`/etc/systemd/system/recipe-workflow-schedule.service`
```ini
[Unit]
Description=Recipe Manager scheduled workflow run
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager
ExecStart=/usr/bin/npm run workflow:schedule
```

`/etc/systemd/system/recipe-workflow-schedule.timer`
```ini
[Unit]
Description=Run Recipe Manager scheduled workflow every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target
```

`/etc/systemd/system/recipe-workflow-health.service`
```ini
[Unit]
Description=Recipe Manager workflow health check
After=network.target

[Service]
Type=oneshot
WorkingDirectory=/home/paulh/.openclaw/workspace/projects/recipe-manager
ExecStart=/usr/bin/npm run workflow:health-check
```

`/etc/systemd/system/recipe-workflow-health.timer`
```ini
[Unit]
Description=Run Recipe Manager workflow health check every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
```

Enable timers:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now recipe-workflow-schedule.timer recipe-workflow-health.timer
```
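
The two service/timer pairs above differ only in unit name, npm script, and interval, so they could be rendered from one template. This is a sketch under assumptions: `render_units` is a hypothetical helper that does not exist in the repository, and its `Description=` lines are generic rather than the exact wording shown above. It writes into a directory you choose so the files can be reviewed before being copied to `/etc/systemd/system`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): render one
# service/timer pair. Arguments: unit name, npm script, interval in
# minutes, project root, output directory.
render_units() {
  local name=$1 script=$2 minutes=$3 root=$4 outdir=$5

  cat > "$outdir/$name.service" <<EOF
[Unit]
Description=Recipe Manager $script
After=network.target

[Service]
Type=oneshot
WorkingDirectory=$root
ExecStart=/usr/bin/npm run $script
EOF

  cat > "$outdir/$name.timer" <<EOF
[Unit]
Description=Run $name every $minutes minutes

[Timer]
OnCalendar=*:0/$minutes
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Example call:
#   render_units recipe-workflow-health workflow:health-check 5 \
#     /home/paulh/.openclaw/workspace/projects/recipe-manager /tmp/units
```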

### Troubleshooting failed/blocked status

When `npm run workflow:health-check` returns exit code `1` with `{"status":"failed"}` or `{"status":"blocked"}`:

1. Check current workflow status payload:
```bash
cat status/workflow-status.json
```
2. Check recent progress log entries:
```bash
tail -n 50 status/workflow-progress.jsonl
```
3. Retry from checkpoint:
```bash
npm run workflow:run
```
4. If still blocked/failed, force a clean restart:
```bash
npm run workflow:run -- --mode restart
```
5. Re-run health check and confirm healthy output (`idle`, `running`, or `completed`):
```bash
npm run workflow:health-check
```
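
Steps 3 to 5 form an escalation: resume, check, restart, check again. The sketch below strings them together. It is an illustration rather than a script shipped with the project: `recover_workflow` and the three small wrapper functions are hypothetical names, and the wrappers exist only so the flow can be exercised with stand-in commands. Steps 1 and 2 remain manual, since reading the status payload and progress log is what tells you why the run failed.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers (not part of the repository) around the npm scripts.
health_check() { npm run --silent workflow:health-check; }
resume_run()   { npm run workflow:run; }
restart_run()  { npm run workflow:run -- --mode restart; }

# Escalate from resume to restart, re-checking health after each attempt.
recover_workflow() {
  if health_check; then
    echo "healthy: nothing to do"
    return 0
  fi

  # The run's own exit code is ignored; the health check decides the outcome.
  resume_run || true
  if health_check; then
    echo "recovered after resume"
    return 0
  fi

  restart_run || true
  if health_check; then
    echo "recovered after restart"
    return 0
  fi

  echo "still unhealthy: inspect status/workflow-status.json" >&2
  return 1
}
```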

If the status file is missing or malformed, the health check prints `status_read_failed` and exits `1`; regenerate state with `npm run workflow:run -- --mode restart`.
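
For manual triage, the same classification can be approximated directly from the status file. This is a rough sketch and not the project's health-check implementation: `classify_status_file` is a hypothetical name, the output format is an assumption, and the `sed` extraction is a crude stand-in for a real JSON parser. It treats `idle`, `running`, and `completed` as healthy, as described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the status found
# in a workflow status file and return 0 only for healthy states.
classify_status_file() {
  local file=$1 status

  if [ ! -r "$file" ]; then
    echo "status_read_failed"
    return 1
  fi

  # Crude extraction of a "status": "<value>" pair; takes the first matching line.
  status=$(sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1)

  case "$status" in
    idle|running|completed) echo "$status"; return 0 ;;
    "")                     echo "status_read_failed"; return 1 ;;
    *)                      echo "$status"; return 1 ;;
  esac
}

# Example call:
#   classify_status_file status/workflow-status.json
```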

---

## Completion Definition

A phase is complete when: