RL Donkeycar Session Handoff
Last updated: 2026-05-06 14:14 America/Toronto (exp27 RESTARTED — self-intersection fix deployed)
Autonomy Instruction
Continue the Donkeycar RL/sim work autonomously. Rebuild, sync, relaunch, run diagnostics, patch code, and restart experiments as needed. Keep going until you either have a verified fix and a running experiment, or a concrete blocker that truly requires the user. Do not stop just to ask for permission on ordinary reversible steps. Only pause for real risk of data loss, destructive actions, missing credentials/access, or major strategy tradeoffs that require a user decision.
If the user says only continue, interpret it using the instruction above.
Current Goal
Exp27 is RUNNING (PID 1094759, started 14:13 2026-05-06).
Log: agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
Road self-intersection fix is now deployed. Roads are guaranteed non-self-intersecting.
Road Self-Intersection: FIXED
PathManager.MakeRandomPath() now uses proper 2D segment-segment intersection math
to detect and reject self-intersecting road candidates (up to 20 retries per regen).
Implementation in PathManager.cs:
- GenerateCandidatePath(): extracts the generation loop into a callable helper
- SegmentsIntersect2D(): cross-product parametric test in the XZ plane
- PathSelfIntersects(): checks all non-adjacent segment pairs (O(n²), ~4800 checks)
- MakeRandomPath(): retry loop that generates a candidate and rejects it if it self-intersects
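The same checks can be sketched in Python for offline testing of road candidates. This is a minimal sketch, not a port of PathManager.cs. The names are hypothetical, nodes are assumed to be (x, z) tuples forming an open polyline, and parallel or collinear segments are treated as non-crossing. When every retry fails, the sketch returns the last candidate, which may differ from what MakeRandomPath() actually does.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (x, z) coordinates in the ground plane


def cross(a: Point, b: Point) -> float:
    """2D cross product (z component of a x b)."""
    return a[0] * b[1] - a[1] * b[0]


def segments_intersect_2d(p1: Point, p2: Point, q1: Point, q2: Point,
                          eps: float = 1e-9) -> bool:
    """Parametric test: solve p1 + t*r == q1 + u*s and require t, u in [0, 1]."""
    r = (p2[0] - p1[0], p2[1] - p1[1])
    s = (q2[0] - q1[0], q2[1] - q1[1])
    denom = cross(r, s)
    if abs(denom) < eps:
        return False  # parallel or collinear: treated as no crossing (assumption)
    qp = (q1[0] - p1[0], q1[1] - p1[1])
    t = cross(qp, s) / denom
    u = cross(qp, r) / denom
    return 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0


def path_self_intersects(nodes: List[Point]) -> bool:
    """Check every pair of non-adjacent segments of an open polyline."""
    n_seg = len(nodes) - 1
    for i in range(n_seg):
        # Start at i + 2: segment i + 1 shares a node with segment i.
        for j in range(i + 2, n_seg):
            if segments_intersect_2d(nodes[i], nodes[i + 1], nodes[j], nodes[j + 1]):
                return True
    return False


def make_random_path(generate: Callable[[], List[Point]],
                     max_retries: int = 20) -> List[Point]:
    """Regenerate until a candidate does not self-intersect, up to max_retries times."""
    candidate = generate()
    for _ in range(max_retries):
        if not path_self_intersects(candidate):
            break
        candidate = generate()
    return candidate  # ASSUMPTION: may still intersect if every retry failed
```

With 100 nodes (99 segments), the double loop performs 97 × 98 / 2 = 4,753 segment tests. That is consistent with the ~4800 checks noted for PathSelfIntersects().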
Unity rebuild completed and player DLL deployed at 14:13 2026-05-06.
IMPORTANT: DLL copy source — use the Builds output DLL, NOT Library/ScriptAssemblies:
```bash
# CORRECT (player DLL, strips editor refs):
rsync -av ".../sdsim/Builds/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll" \
  ".../DonkeySimWin/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll"
# WRONG (editor DLL, causes sim crash at scene load):
# Library/ScriptAssemblies/Assembly-CSharp.dll ← DO NOT USE
```
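A pre-sync guard can catch the wrong DLL before it reaches the runtime sim. This is a minimal heuristic sketch, not a metadata parser. It assumes that an assembly built with editor references contains the ASCII name "UnityEditor" (for example in its assembly-reference table), and it also rejects any path under ScriptAssemblies. The function name is hypothetical.

```python
from pathlib import Path

# ASSUMPTION: editor-built assemblies mention "UnityEditor" in their raw bytes.
# This is a heuristic byte search, not a proper .NET metadata reader.
EDITOR_MARKER = b"UnityEditor"


def check_dll_source(dll_path: str) -> None:
    """Raise ValueError if dll_path looks like the editor DLL rather than the player DLL."""
    path = Path(dll_path)
    if "ScriptAssemblies" in path.parts:
        raise ValueError(f"{path} is under Library/ScriptAssemblies (editor DLL)")
    if EDITOR_MARKER in path.read_bytes():
        raise ValueError(f"{path} mentions UnityEditor; expected the player build DLL")
```

Run check_dll_source() on the Builds output DLL just before the rsync. If it raises, rebuild instead of copying.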
FIRST THING TO DO IN A NEW SESSION
- Read this file
- Check current exp27 progress:
```bash
LOG=/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
grep "Checkpoint\|Eval\|NEW BEST" "$LOG"
```
- Immediately arm a background monitor so the session stays alive:
```bash
N=$(grep -c "Checkpoint saved" "$LOG")
TARGET=$((N + 5))
until [ "$(grep -c "Checkpoint saved" "$LOG")" -ge "$TARGET" ]; do sleep 15; done \
  && grep "Eval\|NEW BEST\|Checkpoint" "$LOG" | tail -20
```
Run that as a background task immediately; don't wait. Then report the current status to the user.
Resume Monitoring (new session)
Background task notifications don't survive session switches. In a new session:
```bash
# Current run log (the 133703 run was killed at 40k; its log no longer grows)
LOG=/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
# Check results so far
grep "Checkpoint\|Eval\|NEW BEST" "$LOG"
# Arm a fresh background monitor: wait for 5 more checkpoints beyond the current count
TARGET=$(( $(grep -c "Checkpoint saved" "$LOG") + 5 ))
until [ "$(grep -c "Checkpoint saved" "$LOG")" -ge "$TARGET" ]; do sleep 15; done \
  && grep "Eval\|NEW BEST\|Checkpoint" "$LOG" | tail -20
```
Exp27 Results So Far
Previous run (run_2026-05-06_133703, PID 1082126) — killed at 40k to deploy self-intersection fix:
| Checkpoint | Road seed | Eval reward | Eval steps | Status |
|---|---|---|---|---|
| Initial | 81035 | — | — | first road |
| 10k | 68546 | 39.0r | 145 | ❌ (may have self-intersected) |
| 20k | 35735 | 71.6r | 230 | ❌ |
| 30k | 98061 | 39.2r | 139 | ❌ |
| 40k | 2167 | 33.9r | 148 | ❌ (run killed — deploying fix) |
Current run (run_2026-05-06_141328, PID 1094759) — self-intersection fix active:
| Checkpoint | Road seed | Eval reward | Eval steps | Status |
|---|---|---|---|---|
| Initial | 89942 | — | — | first road (no crossing) |
| 10k | 63790 | 250.5r | 924 | ❌@924 — 6× better than pre-fix |
| 20k | 54863 | 275.1r | 925 | ❌@925 — NEW BEST |
| 30k | 84765 | 377.3r | 1325 | ❌@1325 — NEW BEST |
| 40k | 62695 | 33.8r | 134 | ❌@134 — outlier (very hard road) |
| 50k | 51171 | 452.6r | 1575 | ❌@1575 — NEW BEST |
| 60k | 13427 | 289.0r | 1013 | ❌@1013 |
| 70k | 99752 | 432.3r | 1648 | ❌@1648 — NEW BEST steps |
| 80k | 40584 | 449.9r | 1567 | ❌@1567 |
| 90k | 23677 | 444.3r | 1522 | ❌@1522 |
| 100k | 11818 | 30.4r | 160 | ❌@160 — outlier (hard road) |
| 110k | 15439 | 462.7r | 1580 | ❌@1580 |
| 120k | 79776 | 251.7r | 893 | ❌@893 |
| 130k | 51 | 273.5r | 1029 | ❌@1029 |
| 140k | 15985 | 386.8r | 1260 | ❌@1260 |
| 150k | 78623 | 50.5r | 193 | ❌@193 — outlier |
| 160k | 68780 | 194.3r | 753 | ❌@753 — low |
| 170k | 27669 | 375.2r | 1371 | ❌@1371 |
| 180k | 32153 | 45.6r | 188 | ❌@188 — outlier |
| 190k | 23522 | 444.2r | 1652 | ❌@1652 — NEW BEST (+4 steps) |
| 200k | 35712 | 200.8r | 657 | ❌@657 |
| 210k | 84828 | 53.5r | 219 | ❌@219 — outlier |
| 220k | 66225 | 425.7r | 1612 | ❌@1612 |
| 230k | 41094 | 162.1r | 581 | ❌@581 — low |
| 240k | 51566 | 438.2r | 1613 | ❌@1613 |
| 250k | 18319 | 19.8r | 116 | ❌@116 — hard outlier |
| 260k | 99555 | 182.6r | 603 | ❌@603 — low |
| 270k | 59896 | 59.4r | 228 | ❌@228 — hard outlier |
| 280k | — | — | — | eval pending — TREND CONCERNING: regression since 190k |
Training started from fresh weights (no warm start), so early evals were expected to be weak until roughly 50-100k steps.
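One way to put numbers on the "regression since 190k" concern is to compare medians before and after 190k, since the table already flags several single-episode outliers. This is a sketch using only the rows of the current-run table. The 300-step cutoff for a "short" episode is a hypothetical threshold, not one used by the training script.

```python
from statistics import median
from typing import Dict, List, Tuple

# (checkpoint in thousands of steps, eval reward, eval steps), copied from the
# current-run table (run_2026-05-06_141328); 280k is omitted because it is pending.
EVALS: List[Tuple[int, float, int]] = [
    (10, 250.5, 924), (20, 275.1, 925), (30, 377.3, 1325), (40, 33.8, 134),
    (50, 452.6, 1575), (60, 289.0, 1013), (70, 432.3, 1648), (80, 449.9, 1567),
    (90, 444.3, 1522), (100, 30.4, 160), (110, 462.7, 1580), (120, 251.7, 893),
    (130, 273.5, 1029), (140, 386.8, 1260), (150, 50.5, 193), (160, 194.3, 753),
    (170, 375.2, 1371), (180, 45.6, 188), (190, 444.2, 1652), (200, 200.8, 657),
    (210, 53.5, 219), (220, 425.7, 1612), (230, 162.1, 581), (240, 438.2, 1613),
    (250, 19.8, 116), (260, 182.6, 603), (270, 59.4, 228),
]


def summarize(rows: List[Tuple[int, float, int]], first_k: int, last_k: int,
              short_steps: int = 300) -> Dict[str, float]:
    """Median reward and count of short episodes for checkpoints in [first_k, last_k]."""
    window = [row for row in rows if first_k <= row[0] <= last_k]
    rewards = [reward for _, reward, _ in window]
    return {
        "n": len(window),
        "median_reward": median(rewards),
        # HYPOTHETICAL cutoff: episodes shorter than short_steps count as "short"
        "short_episodes": sum(1 for _, _, steps in window if steps < short_steps),
    }
```

On these rows, the 10k-190k window has a median reward of 289.0, with 4 of 19 evals under 300 steps. The 200k-270k window has a median of about 172, with 3 of 8 evals under 300 steps. Each eval is a single episode on a different random road, so this supports the "concerning" label without proving a regression.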
Critical Fixes Applied This Session
1. regen_road was silently doing nothing (ROOT CAUSE FOUND)
TcpCarHandler.cs RegenRoad() coroutine only ran if TrainingManager != null.
The generated_road scene has NO TrainingManager (it's for PID-based imitation learning
data collection, not RL). So regen_road always hit the null check and did nothing.
All of exp24/25/26/27-first-run trained on ONE road — the scene's initial road.
Fix: Added else branch to call RoadBuilder.DestroyRoad() + PathManager.InitCarPath()
directly when TrainingManager is absent. Road regen now verified working by user observation.
2. MapOverlay minimap not refreshing after regen
RefreshPath() only triggered on node count change (always 100). Fixed to also check
node[10] position — different seeds produce different positions.
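The refresh condition can be sketched in Python as follows. The real code is C# in MapOverlay.cs. The class and method names here are hypothetical, and exact tuple equality is used for simplicity.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class PathChangeDetector:
    """Refresh when the node count OR the position of a probe node changes."""

    def __init__(self, probe_index: int = 10) -> None:
        self.probe_index = probe_index  # node[10], as in the MapOverlay fix
        self._last_count: Optional[int] = None
        self._last_probe: Optional[Vec3] = None

    def should_refresh(self, nodes: Sequence[Vec3]) -> bool:
        count = len(nodes)
        probe = nodes[self.probe_index] if count > self.probe_index else None
        changed = count != self._last_count or probe != self._last_probe
        # Remember this path so the next call only fires on a real change.
        self._last_count, self._last_probe = count, probe
        return changed
```

A count-only check would never fire between two 100-node roads. The probe node catches the new road as long as node[10] moves, which the note above relies on for different seeds.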
3. PPO gradient update pause (BrakeOnUpdateCallback)
During model.learn(), Python stops calling env.step() for 3-8s (gradient updates).
Last control command persists → car drifts. BrakeOnUpdateCallback._on_rollout_end()
sends zero-control message before updates begin.
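A minimal sketch of such a callback using the Stable-Baselines3 BaseCallback hooks. In SB3's on-policy loop, on_rollout_end() (which dispatches to _on_rollout_end()) runs after rollout collection and before the gradient updates, which is the pause described above. The project's BrakeOnUpdateCallback is not reproduced here. The sketch takes a hypothetical send callable, and the control-message fields are assumed to mirror the gym-donkeycar "control" message. A small stand-in base class lets it run without SB3 installed.

```python
import json
from typing import Callable

try:
    from stable_baselines3.common.callbacks import BaseCallback
except ImportError:  # stand-in so the sketch runs without SB3 installed
    class BaseCallback:  # mimics only the hooks used below
        def __init__(self, verbose: int = 0) -> None:
            self.verbose = verbose

        def on_rollout_end(self) -> None:
            self._on_rollout_end()

        def _on_rollout_end(self) -> None:
            pass


def zero_control_message() -> str:
    """Zero steering/throttle command. ASSUMPTION: gym-donkeycar-style fields, values as strings."""
    return json.dumps({"msg_type": "control", "steering": "0.0",
                       "throttle": "0.0", "brake": "0.0"})


class BrakeOnUpdateSketch(BaseCallback):
    """Send a zero-control command when the rollout ends, before PPO's gradient updates."""

    def __init__(self, send: Callable[[str], None], verbose: int = 0) -> None:
        super().__init__(verbose)
        self.send = send  # HYPOTHETICAL: whatever writes a JSON message to the sim socket

    def _on_step(self) -> bool:
        return True  # never stop training early

    def _on_rollout_end(self) -> None:
        self.send(zero_control_message())
```

It would be passed to learning as model.learn(..., callback=BrakeOnUpdateSketch(send)), alone or in a list with other callbacks.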
Exp27 Configuration
- Script: agent/experiments/exp27_random_roads.py
- Fresh weights (no warm start)
- N_STEER=7, N_THROTTLE=3 → 21 discrete actions (7 × 3); throttle bins [0.2, 0.5, 1.0]
- LR=0.0003, ent_coef=0.05, n_steps=1024
- 500k total steps, checkpoint every 10k
- regen_road with random seed each checkpoint (VERIFIED WORKING)
- CTE termination: episode ends when cross-track error (CTE) stays above 2.0 m for more than 0.5 s
- BrakeOnUpdateCallback: enabled
- set_ai_text: pushes stats to sim overlay each checkpoint
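The discrete action space and the CTE termination rule can be sketched as follows. Only the bin counts, the throttle bins and the 2.0 m / 0.5 s thresholds come from the config above. The following are assumptions, and exp27_random_roads.py may handle them differently:

- evenly spaced steering bins
- steering-major index ordering
- comparing |CTE|
- a caller-supplied clock t in seconds

```python
from typing import Optional, Tuple

N_STEER, N_THROTTLE = 7, 3
THROTTLE_BINS = [0.2, 0.5, 1.0]
# ASSUMPTION: steering bins are evenly spaced over [-1, 1].
STEER_BINS = [-1.0 + 2.0 * i / (N_STEER - 1) for i in range(N_STEER)]


def decode_action(index: int) -> Tuple[float, float]:
    """Map a discrete action index in [0, 21) to (steering, throttle)."""
    if not 0 <= index < N_STEER * N_THROTTLE:
        raise ValueError(f"action index {index} out of range")
    # ASSUMPTION: steering-major ordering (index = steer_idx * N_THROTTLE + throttle_idx).
    steer_idx, throttle_idx = divmod(index, N_THROTTLE)
    return STEER_BINS[steer_idx], THROTTLE_BINS[throttle_idx]


class CteTermination:
    """Signal termination once |CTE| has stayed above max_cte for longer than grace_s."""

    def __init__(self, max_cte: float = 2.0, grace_s: float = 0.5) -> None:
        self.max_cte = max_cte
        self.grace_s = grace_s
        self._over_since: Optional[float] = None

    def update(self, cte: float, t: float) -> bool:
        if abs(cte) <= self.max_cte:
            self._over_since = None  # back within limits: reset the timer
            return False
        if self._over_since is None:
            self._over_since = t  # first step over the limit
        return t - self._over_since > self.grace_s
```

The grace period means a brief excursion past 2.0 m that recovers within 0.5 s does not end the episode. The timer restarts from zero on the next excursion.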
New Unity Build (deployed this session)
Assembly-CSharp.dll rebuilt and rsynced. Changes:
- regen_road fix: Direct RoadBuilder+PathManager fallback when no TrainingManager
- MapOverlay.cs: Minimap refreshes on node position change (not just count)
- Fixed steering display: "Steer: -1.000 Thr:0.20" format
- set_ai_text TCP message: Python pushes multi-line text to sim overlay
- PathManager.cs self-intersection fix: retry loop with proper XZ segment math (2026-05-06 14:08)
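The overlay string and a set_ai_text payload can be built from Python as shown below. The format string reproduces the example above. The payload shape ("msg_type" plus a single "text" field) and the function names are hypothetical, because this file does not document the custom message's fields.

```python
import json
from typing import Sequence


def format_steer_line(steer: float, throttle: float) -> str:
    """Match the overlay format 'Steer: -1.000 Thr:0.20' (3 and 2 decimals)."""
    return f"Steer: {steer:.3f} Thr:{throttle:.2f}"


def make_set_ai_text_message(lines: Sequence[str]) -> str:
    """Build a set_ai_text message. HYPOTHETICAL: the 'text' field name is assumed."""
    return json.dumps({"msg_type": "set_ai_text", "text": "\n".join(lines)})
```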
DLL source for rsync — MUST use the Builds output, not Library/ScriptAssemblies:
- Source: .../sdsim/Builds/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll
- Destination: .../DonkeySimWin/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll
- Library/ScriptAssemblies DLL contains editor references → crashes sim at scene load
Important Paths
- Project: /home/paulh/projects/donkeycar-rl-autoresearch
- Unity source: /mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim
- Unity build output: .../sdsim/Builds/DonkeySimWin
- Runtime sim: /mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin
- Unity build log: C:\Users\Paul\AppData\Local\Temp\unity_rebuild.log
Workflow Pattern Documentation
- Model-agnostic patterns: ~/docs/agent-workflow-patterns.md
- Global Claude instructions (all sessions): ~/.claude/claude-instructions.md
- This project's Claude instructions: agent/../CLAUDE.md
Useful Commands
```bash
# Monitor exp27 live
tail -f /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
# Check all evals
grep "Checkpoint\|Eval\|NEW BEST" /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
# Verify sim running
python3 -c "import socket; s=socket.socket(); s.settimeout(3); s.connect(('127.0.0.1',9091)); print('PORT 9091: OK'); s.close()"
# Check connections to sim
ss -tnp | grep 9091
# Unity rebuild command
"/mnt/c/Program Files/Unity/Hub/Editor/6000.4.4f1/Editor/Unity.exe" -quit -batchmode \
  -projectPath "C:/Users/Paul/Documents/projects/sdsandbox/sdsim" \
  -executeMethod PlayerBuilder.WinBuild \
  -logFile "C:/Users/Paul/AppData/Local/Temp/unity_rebuild.log"
# Kill sim
/mnt/c/Windows/System32/taskkill.exe /IM donkey_sim.exe /F
# Launch sim
"/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin/donkey_sim.exe" --port 9091 &
```
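After launching the sim, a script may need to wait until port 9091 accepts connections before starting training. This minimal Python sketch turns the port-check one-liner into a polling loop. The function name and the 60 s default timeout are assumptions. Like the one-liner, each probe opens and immediately closes a TCP connection to the sim.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 9091,
                  timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True  # connection accepted; the socket is closed on exit
        except OSError:  # refused, unreachable or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, call wait_for_port(timeout_s=120) right after the launch command and abort the experiment if it returns False.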
Cross-Model Eval Results (for reference)
Seeds tested: [1001, 2002, 3003, 4004, 5005, 6006, 7007, 8008, 9009, 1234]
These results are from exp24/25/26, which ALL trained on one road, so they are not comparable to exp27.
| Rank | Model | Full episodes | Mean steps | Mean reward |
|---|---|---|---|---|
| #1 | exp26 | 9/10 | 1958 | 381.2r |
| #2 | exp25 | 9/10 | 1869 | 356.3r |
| #3 | exp24 | 9/10 | 1891 | 347.5r |