From 3c2b67877179136f342dd0a773086194901e24a3 Mon Sep 17 00:00:00 2001
From: Paul Huliganga
Date: Thu, 14 May 2026 15:32:04 -0400
Subject: [PATCH] chore: add CLAUDE.md project instructions + exclude .chat/ from git

Adds CLAUDE.md with session startup checklist, autonomy instructions, and
long-running task workflow patterns (condition-based and time-based wakeups).
Excludes .chat/ chat history from git.

Co-Authored-By: Claude Sonnet 4.6
---
 .gitignore |   3 ++
 CLAUDE.md  | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 CLAUDE.md

diff --git a/.gitignore b/.gitignore
index ce2946a..71f948b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -18,3 +18,6 @@ __pycache__/
 .vscode/
 .idea/
 agent/models/**/*.zip
+
+# Chat history
+.chat/
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..e95ee54
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,112 @@
+# Claude Code — Donkeycar RL Project Instructions
+
+## Session Startup — Do This First
+
+At the start of every session:
+
+1. Read `agent/SESSION_HANDOFF.md`
+2. Check if an experiment is running: `ss -tnp | grep 9091`
+3. If running, **immediately arm a background monitor** on the experiment log before anything else
+4. Report current status to the user
+
+Do not wait for the user to ask. Arming the monitor is the first action.
+
+---
+
+## Autonomy Instruction
+
+Continue the Donkeycar RL/sim work autonomously. Rebuild, sync, relaunch, run
+diagnostics, patch code, and restart experiments as needed. Keep going until you
+either have a verified fix and a running experiment, or a concrete blocker that
+truly requires the user. Only pause for: risk of data loss, destructive actions,
+missing credentials, or major strategy tradeoffs.
+
+If the user says only `continue`, use the instruction above.
+
+---
+
+## Long-Running Task Workflow
+
+A common failure mode: model starts a long-running task, returns to the prompt,
+and waits — defeating the purpose.
Claude Code has two mechanisms to avoid this.
+
+### 1. Condition-based wakeup (background Bash task)
+
+Use when you want to be woken when something happens (log line appears, file
+changes, process exits).
+
+```python
+# Pattern: until <condition>; do sleep N; done && <action>
+Bash(
+    command="until grep -q 'Checkpoint saved' /path/to/run.log; do sleep 15; done "
+            "&& tail -20 /path/to/run.log",
+    run_in_background=True
+)
+```
+
+When the shell command exits, the runtime delivers a `` into
+the conversation. This wakes Claude up automatically — no polling needed on the
+model side. The model then reads the output and continues work.
+
+**Key rules:**
+- Use `until <condition>; do sleep N; done` — never `sleep 120 && check` (blocked)
+- The notification arrives as a message in the conversation context
+- Multiple background tasks can run in parallel and each fires independently
+
+### 2. Time-based wakeup (ScheduleWakeup tool)
+
+Use when you want to be woken after a fixed delay (e.g. "check back in 20 min").
+
+```python
+ScheduleWakeup(
+    delaySeconds=1200,  # 20 minutes
+    reason="checking 100k checkpoint results",
+    prompt="<>"  # or the original /loop prompt
+)
+```
+
+The runtime fires the wakeup after the delay, re-entering the conversation so
+the model can continue. Use this for idle waits (waiting for a build, a slow
+process, etc.) when there's no clear log-file signal to watch.
+
+**Cache timing guidance:**
+- Under 270s: prompt cache stays warm (cheap)
+- Over 300s: cache miss on wakeup (more expensive but fine for long waits)
+- Default idle: 1200–1800s (20–30 min)
+
+### 3.
Combining both
+
+For long experiments, use condition-based tasks for each checkpoint, and
+ScheduleWakeup as a fallback heartbeat:
+
+```python
+# Fire when checkpoint N arrives
+Bash(command="until [ $(grep -c 'Checkpoint' log) -ge 5 ]; do sleep 15; done && tail log",
+     run_in_background=True)
+
+# Also wake in 30 min regardless, to check for errors
+ScheduleWakeup(delaySeconds=1800, reason="exp27 progress check", prompt="continue")
+```
+
+---
+
+## Experiment Monitoring Commands
+
+```bash
+# Watch live training log
+tail -f agent/models/exp27-random-roads/run_*.log
+
+# Check all checkpoints and evals
+grep "Checkpoint\|Eval\|NEW BEST" agent/models/exp27-random-roads/run_*.log
+
+# Verify sim is up
+python3 -c "import socket; s=socket.socket(); s.settimeout(3); s.connect(('127.0.0.1',9091)); print('OK'); s.close()"
+
+# Check what's connected to sim
+ss -tnp | grep 9091
+```
+
+## Session Handoff
+
+Full experiment history, current state, and important paths:
+`agent/SESSION_HANDOFF.md`