From 3c2b67877179136f342dd0a773086194901e24a3 Mon Sep 17 00:00:00 2001
From: Paul Huliganga <paje0101@gmail.com>
Date: Thu, 14 May 2026 15:32:04 -0400
Subject: [PATCH] chore: add CLAUDE.md project instructions + exclude .chat/
 from git

Adds CLAUDE.md with session startup checklist, autonomy instructions,
and long-running task workflow patterns (condition-based and time-based
wakeups). Excludes .chat/ chat history from git.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 .gitignore |   3 ++
 CLAUDE.md  | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 CLAUDE.md
diff --git a/.gitignore b/.gitignore
index ce2946a..71f948b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -18,3 +18,6 @@ __pycache__/
 .vscode/
 .idea/
 agent/models/**/*.zip
+
+# Chat history
+.chat/
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..e95ee54
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,112 @@
+# Claude Code — Donkeycar RL Project Instructions
+
+## Session Startup — Do This First
+
+At the start of every session:
+
+1. Read `agent/SESSION_HANDOFF.md`
+2. Check if an experiment is running: `ss -tnp | grep 9091`
+3. If running, **immediately arm a background monitor** on the experiment log before anything else
+4. Report current status to the user
+
+Do not wait for the user to ask. Arming the monitor is the first action.
+
+---
+
+## Autonomy Instruction
+
+Continue the Donkeycar RL/sim work autonomously. Rebuild, sync, relaunch, run
+diagnostics, patch code, and restart experiments as needed. Keep going until you
+either have a verified fix and a running experiment, or a concrete blocker that
+truly requires the user. Only pause for: risk of data loss, destructive actions,
+missing credentials, or major strategy tradeoffs.
+
+If the user says only `continue`, use the instruction above.
+
+---
+
+## Long-Running Task Workflow
+
+A common failure mode: the model starts a long-running task, returns to the prompt,
+and waits, which defeats the purpose of working autonomously. Claude Code has two mechanisms to avoid this.
+
+### 1. Condition-based wakeup (background Bash task)
+
+Use when you want to be woken when something happens (log line appears, file
+changes, process exits).
+
+```python
+# Pattern: until <condition>; do sleep N; done && <show results>
+Bash(
+    command="until grep -q 'Checkpoint saved' /path/to/run.log; do sleep 15; done "
+            "&& tail -20 /path/to/run.log",
+    run_in_background=True
+)
+```
+
+When the shell command exits, the runtime delivers a `<task-notification>` into
+the conversation. This wakes Claude up automatically — no polling needed on the
+model side. The model then reads the output and continues work.
+
+**Key rules:**
+- Use `until <check>; do sleep N; done` — never `sleep 120 && check` (blocked)
+- The notification arrives as a message in the conversation context
+- Multiple background tasks can run in parallel and each fires independently
+
+### 2. Time-based wakeup (ScheduleWakeup tool)
+
+Use when you want to be woken after a fixed delay (e.g. "check back in 20 min").
+
+```python
+ScheduleWakeup(
+    delaySeconds=1200,   # 20 minutes
+    reason="checking 100k checkpoint results",
+    prompt="<<autonomous-loop-dynamic>>"  # or the original /loop prompt
+)
+```
+
+The runtime fires the wakeup after the delay, re-entering the conversation so
+the model can continue. Use this for idle waits (waiting for a build, a slow
+process, etc.) when there's no clear log-file signal to watch.
+
+**Cache timing guidance:**
+- Under 270s: prompt cache stays warm (cheap)
+- Over 300s: cache miss on wakeup (more expensive but fine for long waits)
+- Default idle: 1200–1800s (20–30 min)
+
+### 3. Combining both
+
+For long experiments, use condition-based tasks for each checkpoint, and
+ScheduleWakeup as a fallback heartbeat:
+
+```python
+# Fire once the log contains at least 5 lines matching 'Checkpoint'
+Bash(command="until [ $(grep -c 'Checkpoint' log) -ge 5 ]; do sleep 15; done && tail log",
+     run_in_background=True)
+
+# Also wake in 30 min regardless, to check for errors
+ScheduleWakeup(delaySeconds=1800, reason="exp27 progress check", prompt="continue")
+```
+
+---
+
+## Experiment Monitoring Commands
+
+```bash
+# Watch live training log
+tail -f agent/models/exp27-random-roads/run_*.log
+
+# Check all checkpoints and evals
+grep -E "Checkpoint|Eval|NEW BEST" agent/models/exp27-random-roads/run_*.log
+
+# Verify sim is up
+python3 -c "import socket; socket.create_connection(('127.0.0.1', 9091), timeout=3).close(); print('OK')"
+
+# Check what's connected to sim
+ss -tnp | grep 9091
+```
+
+## Session Handoff
+
+Full experiment history, current state, and important paths:
+`agent/SESSION_HANDOFF.md`