donkeycar-rl-autoresearch/CLAUDE.md

Claude Code — Donkeycar RL Project Instructions

Session Startup — Do This First

At the start of every session:

  1. Read agent/SESSION_HANDOFF.md
  2. Check if an experiment is running: ss -tnp | grep 9091
  3. If running, immediately arm a background monitor on the experiment log before anything else
  4. Report current status to the user

Do not wait for the user to ask. Arming the monitor is the first action.
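
If the status check in step 2 needs to be machine-readable, the output of ss can be filtered in Python. The helper below is a hypothetical sketch (established_on_port is not part of this repo). It assumes the usual ss -tn column layout: state, Recv-Q, Send-Q, local address, peer address.

```python
def established_on_port(ss_output: str, port: int = 9091) -> list[tuple[str, str]]:
    """Return (local, peer) address pairs for ESTAB connections touching `port`.

    Assumes `ss -tn`-style columns: State Recv-Q Send-Q Local Peer [Process].
    """
    suffix = f":{port}"
    pairs = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row, non-established sockets, and short lines.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local, peer = fields[3], fields[4]
        if local.endswith(suffix) or peer.endswith(suffix):
            pairs.append((local, peer))
    return pairs
```

Feeding it the stdout of subprocess.run(["ss", "-tn"], capture_output=True, text=True) should work. An empty result suggests that nothing is currently attached to the sim port.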


Autonomy Instruction

Continue the Donkeycar RL/sim work autonomously. Rebuild, sync, relaunch, run diagnostics, patch code, and restart experiments as needed. Keep going until you either have a verified fix and a running experiment, or a concrete blocker that truly requires the user. Only pause for: risk of data loss, destructive actions, missing credentials, or major strategy tradeoffs.

If the user says only "continue", apply the instruction above.


Long-Running Task Workflow

A common failure mode: the model starts a long-running task, returns to the prompt, and waits, which defeats the purpose. Claude Code has two mechanisms to avoid this.

1. Condition-based wakeup (background Bash task)

Use when you want to be woken when something happens (log line appears, file changes, process exits).

# Pattern: until <condition>; do sleep N; done && <show results>
Bash(
    command="until grep -q 'Checkpoint saved' /path/to/run.log; do sleep 15; done "
            "&& tail -20 /path/to/run.log",
    run_in_background=True
)

When the shell command exits, the runtime delivers a <task-notification> into the conversation. This wakes Claude up automatically — no polling needed on the model side. The model then reads the output and continues work.

Key rules:

  • Use until <check>; do sleep N; done — never sleep 120 && check (blocked)
  • The notification arrives as a message in the conversation context
  • Multiple background tasks can run in parallel and each fires independently
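
Hand-writing the until loop invites quoting mistakes when the pattern or path contains spaces. The sketch below builds the same command string with shell-safe quoting. build_wait_command and its parameter names are hypothetical, introduced only for illustration.

```python
import shlex


def build_wait_command(pattern: str, log_path: str,
                       interval: int = 15, tail_lines: int = 20) -> str:
    """Compose `until grep -q PATTERN LOG; do sleep N; done && tail -K LOG`."""
    if interval <= 0 or tail_lines <= 0:
        raise ValueError("interval and tail_lines must be positive")
    # shlex.quote leaves safe strings untouched and single-quotes the rest.
    quoted_pattern = shlex.quote(pattern)
    quoted_log = shlex.quote(log_path)
    return (
        f"until grep -q {quoted_pattern} {quoted_log}; "
        f"do sleep {interval}; done "
        f"&& tail -{tail_lines} {quoted_log}"
    )
```

The returned string is intended to be passed as the command of a background Bash task, following the pattern shown earlier in this section.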

2. Time-based wakeup (ScheduleWakeup tool)

Use when you want to be woken after a fixed delay (e.g. "check back in 20 min").

ScheduleWakeup(
    delaySeconds=1200,   # 20 minutes
    reason="checking 100k checkpoint results",
    prompt="<<autonomous-loop-dynamic>>"  # or the original /loop prompt
)

The runtime fires the wakeup after the delay, re-entering the conversation so the model can continue. Use this for idle waits (waiting for a build, a slow process, etc.) when there's no clear log-file signal to watch.

Cache timing guidance:

  • Under 270s: prompt cache stays warm (cheap)
  • Over 300s: cache miss on wakeup (more expensive but fine for long waits)
  • Default idle: 1200-1800s (20-30 min)
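
The thresholds above can be encoded as a small lookup when choosing a delay. This is only a sketch: cache_expectation is a hypothetical helper, and the 270-300s band is reported as "unspecified" because the guidance does not say what happens there.

```python
def cache_expectation(delay_seconds: int) -> str:
    """Classify a wakeup delay using the 270s / 300s thresholds from the guidance."""
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    if delay_seconds < 270:
        return "warm"         # prompt cache expected to still be warm
    if delay_seconds > 300:
        return "miss"         # expect a cache miss on wakeup
    return "unspecified"      # 270-300s: not covered by the guidance
```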

3. Combining both

For long experiments, use condition-based tasks for each checkpoint, and ScheduleWakeup as a fallback heartbeat:

# Fire once the log contains at least 5 'Checkpoint' lines (adjust the count per checkpoint)
Bash(command="until [ $(grep -c 'Checkpoint' log) -ge 5 ]; do sleep 15; done && tail log",
     run_in_background=True)

# Also wake in 30 min regardless, to check for errors
ScheduleWakeup(delaySeconds=1800, reason="exp27 progress check", prompt="continue")

Experiment Monitoring Commands

# Watch live training log
tail -f agent/models/exp27-random-roads/run_*.log

# Check all checkpoints and evals
grep "Checkpoint\|Eval\|NEW BEST" agent/models/exp27-random-roads/run_*.log

# Verify sim is up
python3 -c "import socket; s=socket.socket(); s.settimeout(3); s.connect(('127.0.0.1',9091)); print('OK'); s.close()"

# Check what's connected to sim
ss -tnp | grep 9091
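
When the checkpoint and eval lines need to be processed rather than just read, the grep above has a direct Python equivalent. A minimal sketch, assuming the same three markers appear verbatim in the log; extract_events is a hypothetical name.

```python
import re

# Same alternation as the grep command: Checkpoint, Eval, or NEW BEST.
EVENT_RE = re.compile(r"Checkpoint|Eval|NEW BEST")


def extract_events(lines):
    """Return the lines that mention a checkpoint, an eval, or a new best."""
    return [line.rstrip("\n") for line in lines if EVENT_RE.search(line)]
```

Passing an open file object works because iterating a file yields its lines, for example: with open(log_path) as f: events = extract_events(f).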

Session Handoff

Full experiment history, current state, and important paths: agent/SESSION_HANDOFF.md