

RL Donkeycar Session Handoff

Last updated: 2026-05-06 14:14 America/Toronto (exp27 RESTARTED — self-intersection fix deployed)

Autonomy Instruction

Continue the Donkeycar RL/sim work autonomously. Rebuild, sync, relaunch, run diagnostics, patch code, and restart experiments as needed. Keep going until you either have a verified fix and a running experiment, or a concrete blocker that truly requires the user. Do not stop just to ask for permission on ordinary reversible steps. Only pause for real risk of data loss, destructive actions, missing credentials/access, or major strategy tradeoffs that require a user decision.

If the user says only "continue", interpret it using the instruction above.

Current Goal

Exp27 is RUNNING (PID 1094759, started 14:13 2026-05-06). Log: agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log

Road self-intersection fix is now deployed: self-intersecting road candidates are rejected and regenerated, so roads should no longer cross themselves.

Road Self-Intersection: FIXED

PathManager.MakeRandomPath() now uses proper 2D segment-segment intersection math to detect and reject self-intersecting road candidates (up to 20 retries per regen).
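The retry loop can be sketched in Python as follows. This is not the C# in PathManager.cs. The name `make_random_path`, its parameters, and the behaviour when every attempt fails (returning `None`) are assumptions for illustration. The candidate generator and the self-intersection test are passed in as callables.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (x, z) position of a road node


def make_random_path(
    generate_candidate: Callable[[], List[Point]],
    self_intersects: Callable[[List[Point]], bool],
    max_retries: int = 20,
) -> Tuple[Optional[List[Point]], int]:
    """Hypothetical retry loop: keep generating candidate roads until one
    does not cross itself, giving up after max_retries candidates.

    Returns (path, attempts_used). path is None if every candidate failed
    (assumed behaviour; the real fallback is not documented here).
    """
    for attempt in range(1, max_retries + 1):
        candidate = generate_candidate()
        if not self_intersects(candidate):
            return candidate, attempt
    return None, max_retries
```

Logging `attempts_used` on each regen would show how close road generation gets to exhausting the 20-retry budget.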

Implementation in PathManager.cs:

  • GenerateCandidatePath() — extracts the generation loop into a callable helper
  • SegmentsIntersect2D() — cross-product parametric test in XZ plane
  • PathSelfIntersects() — checks all non-adjacent segment pairs (O(n²), ~4800 checks)
  • MakeRandomPath() — retry loop: generates candidate, rejects if self-intersecting
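The geometry helpers above can be sketched in Python (the real implementation is C# in PathManager.cs). The function names here belong to this sketch only. Three details are assumptions rather than taken from the source:

- touching endpoints count as an intersection;
- collinear overlaps are detected;
- a `closed` flag handles looped roads.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, z): the road is tested in the XZ plane
EPS = 1e-12


def _cross(ax: float, az: float, bx: float, bz: float) -> float:
    """Scalar 2D cross product a x b."""
    return ax * bz - az * bx


def segments_intersect_2d(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 touches or crosses segment q1-q2."""
    rx, rz = p2[0] - p1[0], p2[1] - p1[1]
    sx, sz = q2[0] - q1[0], q2[1] - q1[1]
    qpx, qpz = q1[0] - p1[0], q1[1] - p1[1]
    denom = _cross(rx, rz, sx, sz)
    if abs(denom) < EPS:
        # Parallel: only collinear segments can overlap.
        if abs(_cross(qpx, qpz, rx, rz)) >= EPS:
            return False
        rr = rx * rx + rz * rz
        if rr < EPS:
            return False  # zero-length segment; ignored in this sketch
        # Project q1 and q2 onto p1-p2 and test for overlap with [0, 1].
        t0 = (qpx * rx + qpz * rz) / rr
        t1 = t0 + (sx * rx + sz * rz) / rr
        return max(t0, t1) >= 0.0 and min(t0, t1) <= 1.0
    # Solve p1 + t*r = q1 + u*s for the parameters t and u.
    t = _cross(qpx, qpz, sx, sz) / denom
    u = _cross(qpx, qpz, rx, rz) / denom
    return 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0


def path_self_intersects(nodes: List[Point], closed: bool = False) -> bool:
    """Check every pair of non-adjacent segments (O(n^2))."""
    segs = [(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
    if closed and len(nodes) > 2:
        segs.append((nodes[-1], nodes[0]))
    m = len(segs)
    for i in range(m):
        for j in range(i + 2, m):  # j = i + 1 shares a node with segment i
            if closed and i == 0 and j == m - 1:
                continue  # on a loop, first and last segments share nodes[0]
            if segments_intersect_2d(*segs[i], *segs[j]):
                return True
    return False
```

For a 100-node open path this visits 99 segments and 4,753 non-adjacent pairs, consistent with the ~4800 checks quoted above.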

Unity rebuild completed and player DLL deployed at 14:13 2026-05-06.

IMPORTANT: DLL copy source — use the Builds output DLL, NOT Library/ScriptAssemblies:

# CORRECT (player DLL, strips editor refs):
rsync -av ".../sdsim/Builds/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll" \
  ".../DonkeySimWin/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll"

# WRONG (editor DLL, causes sim crash at scene load):
# Library/ScriptAssemblies/Assembly-CSharp.dll  ← DO NOT USE
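This rule could also be enforced mechanically before copying. The helper `is_player_dll` below is hypothetical. It only inspects path components and never opens the DLL, so it is a cheap guard, not proof that the file is a player build.

```python
from pathlib import PurePosixPath


def is_player_dll(path: str) -> bool:
    """Heuristic: accept Builds/.../Managed/Assembly-CSharp.dll, reject the
    editor output under Library/ScriptAssemblies (crashes the sim at scene load)."""
    p = PurePosixPath(path)
    return (
        p.name == "Assembly-CSharp.dll"
        and "ScriptAssemblies" not in p.parts
        and "Builds" in p.parts
        and p.parent.name == "Managed"
    )
```

A deploy script would call it on the rsync source and abort if it returns False.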

FIRST THING TO DO IN A NEW SESSION

  1. Read this file
  2. Check current exp27 progress: grep "Checkpoint\|Eval\|NEW BEST" <log>
  3. Immediately arm a background monitor so the session stays alive:
LOG=/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
N=$(grep -c "Checkpoint saved" $LOG)
TARGET=$((N + 5))
until [ $(grep -c "Checkpoint saved" $LOG) -ge $TARGET ]; do sleep 15; done \
  && grep "Eval\|NEW BEST\|Checkpoint" $LOG | tail -20

Run that as a background task immediately — don't wait. Then report current status to the user.
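If the monitor needs to run inside a Python script, a sketch of the same logic follows. The function names are hypothetical. The only assumption about the log is the one the grep above already makes: lines contain the literal text "Checkpoint saved". Unlike the bash loop, it accepts an optional timeout, so a stale log from a killed run makes it return False instead of waiting forever.

```python
import time
from pathlib import Path
from typing import List, Optional

KEYWORDS = ("Eval", "NEW BEST", "Checkpoint")


def count_checkpoints(log: Path) -> int:
    """Number of lines containing 'Checkpoint saved' (like grep -c)."""
    with log.open(errors="replace") as f:
        return sum("Checkpoint saved" in line for line in f)


def wait_for_checkpoints(
    log: Path, extra: int = 5, poll_s: float = 15.0, timeout_s: Optional[float] = None
) -> bool:
    """Block until `extra` more checkpoints are logged; False on timeout."""
    target = count_checkpoints(log) + extra
    start = time.monotonic()
    while count_checkpoints(log) < target:
        if timeout_s is not None and time.monotonic() - start > timeout_s:
            return False
        time.sleep(poll_s)
    return True


def recent_status(log: Path, n: int = 20) -> List[str]:
    """Last n lines mentioning Eval, NEW BEST or Checkpoint (like grep | tail)."""
    with log.open(errors="replace") as f:
        hits = [line.rstrip("\n") for line in f if any(k in line for k in KEYWORDS)]
    return hits[-n:] if n > 0 else []
```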

Resume Monitoring (new session)

Background task notifications don't survive session switches. In a new session:

# Check results so far
grep "Checkpoint\|Eval\|NEW BEST" \
  /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log

# Arm a fresh background monitor (waits for 5 more checkpoints than are logged now)
LOG=/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
TARGET=$(( $(grep -c "Checkpoint saved" "$LOG") + 5 ))
until [ "$(grep -c "Checkpoint saved" "$LOG")" -ge "$TARGET" ]; do sleep 15; done && grep "Eval\|NEW BEST\|Checkpoint" "$LOG"

Exp27 Results So Far

Previous run (run_2026-05-06_133703, PID 1082126) — killed at 40k to deploy self-intersection fix:

Step     Seed   Reward  Steps  Status
Initial  81035                 first road
10k      68546  39.0r   145    (may have self-intersected)
20k      35735  71.6r   230
30k      98061  39.2r   139
40k      2167   33.9r   148    (run killed to deploy fix)

Current run (run_2026-05-06_141328, PID 1094759) — self-intersection fix active:

Step     Seed   Reward  Steps  Status
Initial  89942                 first road (no crossing)
10k      63790  250.5r  924    @924; 6× better than pre-fix
20k      54863  275.1r  925    @925; NEW BEST
30k      84765  377.3r  1325   @1325; NEW BEST
40k      62695  33.8r   134    @134; outlier (very hard road)
50k      51171  452.6r  1575   @1575; NEW BEST
60k      13427  289.0r  1013   @1013
70k      99752  432.3r  1648   @1648; NEW BEST steps
80k      40584  449.9r  1567   @1567
90k      23677  444.3r  1522   @1522
100k     11818  30.4r   160    @160; outlier (hard road)
110k     15439  462.7r  1580   @1580
120k     79776  251.7r  893    @893
130k     51     273.5r  1029   @1029
140k     15985  386.8r  1260   @1260
150k     78623  50.5r   193    @193; outlier
160k     68780  194.3r  753    @753; low
170k     27669  375.2r  1371   @1371
180k     32153  45.6r   188    @188; outlier
190k     23522  444.2r  1652   @1652; NEW BEST (+4 steps)
200k     35712  200.8r  657    @657
210k     84828  53.5r   219    @219; outlier
220k     66225  425.7r  1612   @1612
230k     41094  162.1r  581    @581; low
240k     51566  438.2r  1613   @1613
250k     18319  19.8r   116    @116; hard outlier
260k     99555  182.6r  603    @603; low
270k     59896  59.4r   228    @228; hard outlier
280k                           eval pending; TREND CONCERNING: regression since 190k

Exp27 trains from fresh weights (no warm start), so the earliest checkpoints were expected to be weak, with significant improvement by roughly 50-100k.
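The "regression since 190k" concern can be put in numbers by comparing eval rewards before and after 190k. This sketch copies the rewards from the current-run table. The 100-reward cutoff for a "short" episode is an assumption, chosen because every row labelled outlier falls below it.

```python
from statistics import mean, median

# (checkpoint in k steps, eval reward) copied from the current-run table.
EVALS = [
    (10, 250.5), (20, 275.1), (30, 377.3), (40, 33.8), (50, 452.6),
    (60, 289.0), (70, 432.3), (80, 449.9), (90, 444.3), (100, 30.4),
    (110, 462.7), (120, 251.7), (130, 273.5), (140, 386.8), (150, 50.5),
    (160, 194.3), (170, 375.2), (180, 45.6), (190, 444.2), (200, 200.8),
    (210, 53.5), (220, 425.7), (230, 162.1), (240, 438.2), (250, 19.8),
    (260, 182.6), (270, 59.4),
]
SHORT_EPISODE_REWARD = 100.0  # assumed cutoff for an "outlier" eval


def summarize(first_k: int, last_k: int) -> dict:
    """Reward stats for checkpoints in [first_k, last_k] (inclusive, in k steps)."""
    rewards = [r for k, r in EVALS if first_k <= k <= last_k]
    return {
        "n": len(rewards),
        "mean": mean(rewards),
        "median": median(rewards),
        "short": sum(r < SHORT_EPISODE_REWARD for r in rewards),
    }


before, after = summarize(10, 190), summarize(200, 270)
for label, s in (("10k-190k", before), ("200k-270k", after)):
    print(f"{label}: n={s['n']} mean={s['mean']:.1f} "
          f"median={s['median']:.1f} short={s['short']}")
```

The two windows compare as follows:

- 10k-190k: mean about 290.5, median 289.0, 4 of 19 evals below 100.
- 200k-270k: mean about 192.8, median about 172, 3 of 8 evals below 100.

Each eval is a single episode on a different random road, so 8 points is a small, noisy sample. Re-evaluating checkpoints on a fixed seed set, as in the cross-model eval, would separate a real regression from road difficulty.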

Critical Fixes Applied This Session

1. regen_road was silently doing nothing (ROOT CAUSE FOUND)

The RegenRoad() coroutine in TcpCarHandler.cs only ran if TrainingManager != null. The generated_road scene has NO TrainingManager, because it is meant for PID-based imitation-learning data collection, not RL. So regen_road always hit the null check and did nothing. As a result, exp24, exp25, exp26 and the first exp27 run all trained on ONE road: the scene's initial road.

Fix: Added else branch to call RoadBuilder.DestroyRoad() + PathManager.InitCarPath() directly when TrainingManager is absent. Road regen now verified working by user observation.

2. MapOverlay minimap not refreshing after regen

RefreshPath() was only triggered when the node count changed, but the count is always 100. It now also checks the position of node[10], which differs between seeds.
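The refresh condition amounts to comparing a small key per path. Below is a hypothetical Python mirror; the real MapOverlay.cs code is C#. Node index 10 comes from the fix described above. Comparing positions with exact equality is an assumption that relies on an unchanged road reporting identical coordinates.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


class PathChangeDetector:
    """Hypothetical refresh check: redraw when the node count changes OR
    when a sample node (index 10 by default) moves."""

    def __init__(self, sample_index: int = 10):
        self.sample_index = sample_index
        self._last_key: Optional[Tuple[int, Optional[Vec3]]] = None

    def should_refresh(self, nodes: List[Vec3]) -> bool:
        sample = nodes[self.sample_index] if len(nodes) > self.sample_index else None
        key = (len(nodes), sample)
        changed = key != self._last_key
        self._last_key = key
        return changed
```

With a count-only key, two 100-node roads from different seeds look identical, which is exactly the bug described above.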

3. PPO gradient update pause (BrakeOnUpdateCallback)

During model.learn(), Python stops calling env.step() for 3-8 s while PPO runs its gradient updates. The sim keeps applying the last control command, so the car drifts. BrakeOnUpdateCallback._on_rollout_end() sends a zero-control message before the updates begin.
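A minimal sketch of such a callback, assuming Stable-Baselines3. Its BaseCallback provides the `_on_rollout_end` hook named above, which PPO calls after collecting a rollout and before training on it. `send_control` is a hypothetical callable standing in for however the env sends a control message to the sim. The fallback class exists only so the sketch runs without SB3 installed.

```python
from typing import Callable

try:
    from stable_baselines3.common.callbacks import BaseCallback
except ImportError:  # lets the sketch run without SB3 installed
    class BaseCallback:  # minimal stand-in with the same constructor
        def __init__(self, verbose: int = 0):
            self.verbose = verbose


class BrakeOnUpdateCallback(BaseCallback):
    """Send a zero-control command when a rollout ends, i.e. just before
    PPO's gradient updates pause env.step()."""

    def __init__(self, send_control: Callable[[float, float], None], verbose: int = 0):
        super().__init__(verbose)
        self.send_control = send_control  # hypothetical: (steering, throttle) -> None

    def _on_step(self) -> bool:
        return True  # never stop training early

    def _on_rollout_end(self) -> None:
        self.send_control(0.0, 0.0)  # steering 0, throttle 0
```

It would be passed as `model.learn(total_timesteps=..., callback=BrakeOnUpdateCallback(send_control))`. Braking at rollout end, rather than on every step, targets exactly the window in which env.step() is not being called.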

Exp27 Configuration

  • Script: agent/experiments/exp27_random_roads.py
  • Fresh weights (no warm start)
  • N_STEER=7, N_THROTTLE=3 → 7 × 3 = 21 discrete actions; throttle bins [0.2, 0.5, 1.0]
  • LR=0.0003, ent_coef=0.05, n_steps=1024
  • 500k total steps, checkpoint every 10k
  • regen_road with random seed each checkpoint (VERIFIED WORKING)
  • CTE termination: >2.0m for >0.5s
  • BrakeOnUpdateCallback: enabled
  • set_ai_text: pushes stats to sim overlay each checkpoint
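The action space and the CTE rule can be sketched as below. Only N_STEER, N_THROTTLE, the throttle bins and the 2.0 m / 0.5 s thresholds come from the config above. The names are hypothetical, not taken from exp27_random_roads.py. These parts are assumptions:

- the steering bins are evenly spaced;
- the action index layout;
- "for >0.5s" means continuous time above the threshold, with the timer resetting once the car is back within 2.0 m.

```python
from typing import Tuple

N_STEER, N_THROTTLE = 7, 3
THROTTLE_BINS = [0.2, 0.5, 1.0]  # from the exp27 config
# Assumption: steering bins evenly spaced over [-1, 1].
STEER_BINS = [-1.0 + 2.0 * i / (N_STEER - 1) for i in range(N_STEER)]


def action_to_control(action: int) -> Tuple[float, float]:
    """Map a discrete action in 0..20 to (steering, throttle).
    Assumed layout: action = steer_idx * N_THROTTLE + throttle_idx."""
    if not 0 <= action < N_STEER * N_THROTTLE:
        raise ValueError(f"action {action} outside 0..{N_STEER * N_THROTTLE - 1}")
    steer_idx, throttle_idx = divmod(action, N_THROTTLE)
    return STEER_BINS[steer_idx], THROTTLE_BINS[throttle_idx]


class CteTermination:
    """Hypothetical helper: end the episode once |CTE| has stayed above
    max_cte_m for longer than max_time_s."""

    def __init__(self, max_cte_m: float = 2.0, max_time_s: float = 0.5):
        self.max_cte_m = max_cte_m
        self.max_time_s = max_time_s
        self.time_over_s = 0.0

    def reset(self) -> None:
        self.time_over_s = 0.0

    def update(self, cte_m: float, dt_s: float) -> bool:
        """Feed one step's CTE and elapsed time; returns True to terminate."""
        if abs(cte_m) > self.max_cte_m:
            self.time_over_s += dt_s
        else:
            self.time_over_s = 0.0  # back within bounds: restart the timer
        return self.time_over_s > self.max_time_s
```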

New Unity Build (deployed this session)

Assembly-CSharp.dll rebuilt and rsynced. Changes:

  • regen_road fix: Direct RoadBuilder+PathManager fallback when no TrainingManager
  • MapOverlay.cs: Minimap refreshes on node position change (not just count)
  • Fixed steering display format, e.g. Steer: -1.000 Thr:0.20
  • set_ai_text TCP message: Python pushes multi-line text to sim overlay
  • PathManager.cs self-intersection fix: retry loop with proper XZ segment math (2026-05-06 14:08)

DLL source for rsync — MUST use the Builds output, not Library/ScriptAssemblies:

  • Source: .../sdsim/Builds/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll
  • Destination: .../DonkeySimWin/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll
  • Library/ScriptAssemblies DLL contains editor references → crashes sim at scene load

Important Paths

  • Project: /home/paulh/projects/donkeycar-rl-autoresearch
  • Unity source: /mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim
  • Unity build output: .../sdsim/Builds/DonkeySimWin
  • Runtime sim: /mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin
  • Unity build log: C:\Users\Paul\AppData\Local\Temp\unity_rebuild.log

Workflow Pattern Documentation

  • Model-agnostic patterns: ~/docs/agent-workflow-patterns.md
  • Global Claude instructions (all sessions): ~/.claude/claude-instructions.md
  • This project's Claude instructions: agent/../CLAUDE.md

Useful Commands

# Monitor exp27 live
tail -f /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log

# Check all evals
grep "Checkpoint\|Eval\|NEW BEST" /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log

# Verify sim running
python3 -c "import socket; s=socket.socket(); s.settimeout(3); s.connect(('127.0.0.1',9091)); print('PORT 9091: OK'); s.close()"

# Check connections to sim
ss -tnp | grep 9091

# Unity rebuild command
"/mnt/c/Program Files/Unity/Hub/Editor/6000.4.4f1/Editor/Unity.exe" -quit -batchmode \
  -projectPath "C:/Users/Paul/Documents/projects/sdsandbox/sdsim" \
  -executeMethod PlayerBuilder.WinBuild \
  -logFile "C:/Users/Paul/AppData/Local/Temp/unity_rebuild.log"

# Kill sim
/mnt/c/Windows/System32/taskkill.exe /IM donkey_sim.exe /F

# Launch sim
"/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin/donkey_sim.exe" --port 9091 &

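After a relaunch, the sim can take some time before port 9091 accepts connections. A small polling helper can gate the start of training. The name `wait_for_port` is hypothetical, and the helper uses the same socket probe as the port check above.

```python
import socket
import time


def wait_for_port(
    host: str = "127.0.0.1", port: int = 9091, timeout_s: float = 60.0, poll_s: float = 1.0
) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses.
    Each probe connects and closes immediately, like the one-liner check."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=3):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
```

A training script could call `if not wait_for_port(timeout_s=120): raise SystemExit("sim not listening on 9091")` before creating the env.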
Cross-Model Eval Results (for reference)

Seeds tested: [1001, 2002, 3003, 4004, 5005, 6006, 7007, 8008, 9009, 1234]. These results are from exp24/25/26, which ALL trained on a single road, so they are not comparable to exp27.

Rank  Model  Full eps  Mean steps  Mean reward
#1    exp26  9/10      1958        381.2r
#2    exp25  9/10      1869        356.3r
#3    exp24  9/10      1891        347.5r