RL Donkeycar Session Handoff

Last updated: 2026-05-06 14:14 America/Toronto (exp27 RESTARTED — self-intersection fix deployed)

Autonomy Instruction

Continue the Donkeycar RL/sim work autonomously. Rebuild, sync, relaunch, run diagnostics, patch code, and restart experiments as needed. Keep going until you either have a verified fix and a running experiment, or a concrete blocker that truly requires the user. Do not stop just to ask for permission on ordinary reversible steps. Only pause for real risk of data loss, destructive actions, missing credentials/access, or major strategy tradeoffs that require a user decision.

If the user says only "continue", interpret it using the instruction above.

Current Goal

Exp27 is RUNNING (PID 1094759, started 14:13 2026-05-06). Log: agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log

Road self-intersection fix is now deployed. Self-intersecting road candidates are rejected and regenerated (up to 20 retries per regen).

Road Self-Intersection: FIXED

PathManager.MakeRandomPath() now uses proper 2D segment-segment intersection math to detect and reject self-intersecting road candidates (up to 20 retries per regen).

Implementation in PathManager.cs:

GenerateCandidatePath() — extracts the generation loop into a callable helper
SegmentsIntersect2D() — cross-product parametric test in XZ plane
PathSelfIntersects() — checks all non-adjacent segment pairs (O(n²), ~4800 checks)
MakeRandomPath() — retry loop: generates candidate, rejects if self-intersecting
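The real implementation lives in C# in PathManager.cs. As a rough illustration only, the same logic can be sketched in Python. The function names below mirror the C# helpers but are hypothetical translations. The path is assumed to be an open polyline of (x, z) points, parallel or collinear segments are treated as non-crossing, and returning None after the retries run out is an assumption (the note above does not say what the C# code does in that case).

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (x, z) on the ground plane


def _cross(ax: float, az: float, bx: float, bz: float) -> float:
    """2D cross product (z-component of the 3D cross product)."""
    return ax * bz - az * bx


def segments_intersect_2d(p1: Point, p2: Point, q1: Point, q2: Point,
                          eps: float = 1e-9) -> bool:
    """Parametric test: does p1 + t*r meet q1 + u*s with t, u in [0, 1]?"""
    rx, rz = p2[0] - p1[0], p2[1] - p1[1]
    sx, sz = q2[0] - q1[0], q2[1] - q1[1]
    denom = _cross(rx, rz, sx, sz)
    if abs(denom) < eps:
        # Parallel or collinear: treated as non-crossing in this sketch.
        return False
    qpx, qpz = q1[0] - p1[0], q1[1] - p1[1]
    t = _cross(qpx, qpz, sx, sz) / denom
    u = _cross(qpx, qpz, rx, rz) / denom
    return 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0


def path_self_intersects(points: List[Point]) -> bool:
    """Check every pair of non-adjacent segments, O(n^2)."""
    n_seg = len(points) - 1
    for i in range(n_seg):
        # j starts at i + 2 so segments sharing an endpoint are skipped.
        for j in range(i + 2, n_seg):
            if segments_intersect_2d(points[i], points[i + 1],
                                     points[j], points[j + 1]):
                return True
    return False


def make_random_path(generate: Callable[[], List[Point]],
                     max_retries: int = 20) -> Optional[List[Point]]:
    """Return the first non-self-intersecting candidate, or None (assumed)."""
    for _ in range(max_retries):
        candidate = generate()
        if not path_self_intersects(candidate):
            return candidate
    return None
```

For a 100-node open path this loop visits 4753 segment pairs, which is in line with the ~4800 checks quoted above.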

Unity rebuild completed and player DLL deployed at 14:13 2026-05-06.

IMPORTANT: DLL copy source — use the Builds output DLL, NOT Library/ScriptAssemblies:

# CORRECT (player DLL, strips editor refs):
rsync -av ".../sdsim/Builds/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll" \
  ".../DonkeySimWin/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll"

# WRONG (editor DLL, causes sim crash at scene load):
# Library/ScriptAssemblies/Assembly-CSharp.dll  ← DO NOT USE

FIRST THING TO DO IN A NEW SESSION

Read this file
Check current exp27 progress: grep "Checkpoint\|Eval\|NEW BEST" <log>
Immediately arm a background monitor so the session stays alive:

LOG=/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
N=$(grep -c "Checkpoint saved" "$LOG")
TARGET=$((N + 5))
until [ "$(grep -c "Checkpoint saved" "$LOG")" -ge "$TARGET" ]; do sleep 15; done \
  && grep "Eval\|NEW BEST\|Checkpoint" "$LOG" | tail -20

Run that as a background task immediately — don't wait. Then report current status to the user.

Resume Monitoring (new session)

Background task notifications don't survive session switches. In a new session:

# Check results so far (current run; the 133703 log belongs to the killed run)
grep "Checkpoint\|Eval\|NEW BEST" \
  /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log

# Arm fresh background monitor (fires after 5 more checkpoints)
LOG=/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
TARGET=$(( $(grep -c "Checkpoint saved" "$LOG") + 5 ))
until [ "$(grep -c "Checkpoint saved" "$LOG")" -ge "$TARGET" ]; do sleep 15; done && grep "Eval\|NEW BEST\|Checkpoint" "$LOG"

Exp27 Results So Far

Previous run (run_2026-05-06_133703, PID 1082126) — killed at 40k to deploy self-intersection fix:

Step	Seed	Reward	Steps	Status
Initial	81035	—	—	first road
10k	68546	39.0r	145	❌ (may have self-intersected)
20k	35735	71.6r	230	❌
30k	98061	39.2r	139	❌
40k	2167	33.9r	148	❌ (run killed — deploying fix)

Current run (run_2026-05-06_141328, PID 1094759) — self-intersection fix active:

Step	Seed	Reward	Steps	Status
Initial	89942	—	—	first road (no crossing)
10k	63790	250.5r	924	❌@924 — 6× better than pre-fix
20k	54863	275.1r	925	❌@925 — NEW BEST
30k	84765	377.3r	1325	❌@1325 — NEW BEST
40k	62695	33.8r	134	❌@134 — outlier (very hard road)
50k	51171	452.6r	1575	❌@1575 — NEW BEST
60k	13427	289.0r	1013	❌@1013
70k	99752	432.3r	1648	❌@1648 — NEW BEST steps
80k	40584	449.9r	1567	❌@1567
90k	23677	444.3r	1522	❌@1522
100k	11818	30.4r	160	❌@160 — outlier (hard road)
110k	15439	462.7r	1580	❌@1580
120k	79776	251.7r	893	❌@893
130k	51	273.5r	1029	❌@1029
140k	15985	386.8r	1260	❌@1260
150k	78623	50.5r	193	❌@193 — outlier
160k	68780	194.3r	753	❌@753 — low
170k	27669	375.2r	1371	❌@1371
180k	32153	45.6r	188	❌@188 — outlier
190k	23522	444.2r	1652	❌@1652 — NEW BEST (+4 steps)
200k	35712	200.8r	657	❌@657
210k	84828	53.5r	219	❌@219 — outlier
220k	66225	425.7r	1612	❌@1612
230k	41094	162.1r	581	❌@581 — low
240k	51566	438.2r	1613	❌@1613
250k	18319	19.8r	116	❌@116 — hard outlier
260k	99555	182.6r	603	❌@603 — low
270k	59896	59.4r	228	❌@228 — hard outlier
280k	—	—	—	eval pending — TREND CONCERNING: regression since 190k

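Note written early in the run (fresh weights): results were expected to improve significantly by 50-100k steps. The table now extends to 270k, so weigh this against the regression flagged at 280k.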

Critical Fixes Applied This Session

1. regen_road was silently doing nothing (ROOT CAUSE FOUND)

TcpCarHandler.cs RegenRoad() coroutine only ran if TrainingManager != null. The generated_road scene has NO TrainingManager (it's for PID-based imitation learning data collection, not RL). So regen_road always hit the null check and did nothing. All of exp24/25/26/27-first-run trained on ONE road — the scene's initial road.

Fix: Added else branch to call RoadBuilder.DestroyRoad() + PathManager.InitCarPath() directly when TrainingManager is absent. Road regen now verified working by user observation.

2. MapOverlay minimap not refreshing after regen

RefreshPath() only triggered on node count change (always 100). Fixed to also check node[10] position — different seeds produce different positions.
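The change-detection idea can be sketched in Python as follows. This is not the MapOverlay.cs code: `path_signature` and `MinimapCache` are hypothetical names, and comparing the probe node with exact equality is an assumption that holds only because a regen with a new seed produces new positions.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Signature = Tuple[int, Optional[Vec3]]


def path_signature(nodes: List[Vec3], probe_index: int = 10) -> Signature:
    """Cheap fingerprint: node count plus the position of one probe node."""
    probe = nodes[probe_index] if len(nodes) > probe_index else None
    return (len(nodes), probe)


class MinimapCache:
    """Remembers the last signature and reports when the path changed."""

    def __init__(self) -> None:
        self._last: Optional[Signature] = None

    def needs_refresh(self, nodes: List[Vec3]) -> bool:
        sig = path_signature(nodes)
        changed = sig != self._last
        self._last = sig
        return changed
```

With a count-only check, two 100-node roads from different seeds look identical. Adding the probe node's position distinguishes them without comparing every node.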

3. PPO gradient update pause (BrakeOnUpdateCallback)

During model.learn(), Python stops calling env.step() for 3-8s (gradient updates). Last control command persists → car drifts. BrakeOnUpdateCallback._on_rollout_end() sends zero-control message before updates begin.
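A minimal sketch of such a callback, assuming the training script uses Stable-Baselines3 (the `_on_rollout_end()` hook and `model.learn()` suggest it, but the note does not say so). The class name `BrakeOnUpdateSketch` and the injected `send_control(steering, throttle)` function are hypothetical; the real callback may reach the sim differently. The fallback base class exists only so the sketch runs where SB3 is not installed.

```python
from typing import Callable

try:
    from stable_baselines3.common.callbacks import BaseCallback
except ImportError:
    class BaseCallback:  # minimal stand-in, NOT the real SB3 class
        def __init__(self, verbose: int = 0) -> None:
            self.verbose = verbose


class BrakeOnUpdateSketch(BaseCallback):
    """Sends a zero-control command when rollout collection ends."""

    def __init__(self, send_control: Callable[[float, float], None],
                 verbose: int = 0) -> None:
        super().__init__(verbose)
        # Hypothetical hook: forwards (steering, throttle) to the sim.
        self._send_control = send_control

    def _on_step(self) -> bool:
        # Returning True keeps training going; this callback never aborts.
        return True

    def _on_rollout_end(self) -> None:
        # Called after the rollout is collected and before the gradient
        # update, so the car is not left running the last action.
        self._send_control(0.0, 0.0)
```

It would be passed to training as `model.learn(total_timesteps=..., callback=BrakeOnUpdateSketch(send_control))`, where `send_control` is whatever function the environment exposes for raw control messages.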

Exp27 Configuration

Script: agent/experiments/exp27_random_roads.py
Fresh weights (no warm start)
N_STEER=7, N_THROTTLE=3 → 21 discrete actions → throttle bins [0.2, 0.5, 1.0]
LR=0.0003, ent_coef=0.05, n_steps=1024
500k total steps, checkpoint every 10k
regen_road with random seed each checkpoint (VERIFIED WORKING)
CTE termination: >2.0m for >0.5s
BrakeOnUpdateCallback: enabled
set_ai_text: pushes stats to sim overlay each checkpoint
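Two of the settings above can be illustrated with a short Python sketch. It is not taken from exp27_random_roads.py. The steering bins are assumed to be evenly spaced over [-1, 1], the steer-major action ordering is an assumption, and `decode_action` and `CteTerminator` are hypothetical names.

```python
from typing import Optional, Tuple

N_STEER = 7
N_THROTTLE = 3
THROTTLE_BINS = [0.2, 0.5, 1.0]
# Assumption: steering bins evenly spaced over [-1, 1].
STEER_BINS = [-1.0 + 2.0 * i / (N_STEER - 1) for i in range(N_STEER)]


def decode_action(index: int) -> Tuple[float, float]:
    """Map a discrete action index (0..20) to (steering, throttle)."""
    if not 0 <= index < N_STEER * N_THROTTLE:
        raise ValueError(f"action index out of range: {index}")
    steer_i, throttle_i = divmod(index, N_THROTTLE)  # steer-major (assumed)
    return STEER_BINS[steer_i], THROTTLE_BINS[throttle_i]


class CteTerminator:
    """Ends the episode once |CTE| stays above max_cte for max_duration."""

    def __init__(self, max_cte: float = 2.0, max_duration: float = 0.5) -> None:
        self.max_cte = max_cte
        self.max_duration = max_duration
        self._over_since: Optional[float] = None

    def update(self, cte: float, now: float) -> bool:
        """Feed the latest CTE (metres) and timestamp (seconds)."""
        if abs(cte) <= self.max_cte:
            self._over_since = None  # back on track: reset the timer
            return False
        if self._over_since is None:
            self._over_since = now
        return (now - self._over_since) > self.max_duration
```

The timer resets whenever the car returns within the threshold, so a brief excursion past 2.0 m does not end the episode.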

New Unity Build (deployed this session)

Assembly-CSharp.dll rebuilt and rsynced. Changes:

regen_road fix: Direct RoadBuilder+PathManager fallback when no TrainingManager
MapOverlay.cs: Minimap refreshes on node position change (not just count)
Fixed steering display: Steer: -1.000 Thr:0.20 format
set_ai_text TCP message: Python pushes multi-line text to sim overlay
PathManager.cs self-intersection fix: retry loop with proper XZ segment math (2026-05-06 14:08)

DLL source for rsync — MUST use the Builds output, not Library/ScriptAssemblies:

Source: .../sdsim/Builds/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll
Destination: .../DonkeySimWin/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll
Library/ScriptAssemblies DLL contains editor references → crashes sim at scene load

Important Paths

Project: /home/paulh/projects/donkeycar-rl-autoresearch
Unity source: /mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim
Unity build output: .../sdsim/Builds/DonkeySimWin
Runtime sim: /mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin
Unity build log: C:\Users\Paul\AppData\Local\Temp\unity_rebuild.log

Workflow Pattern Documentation

Model-agnostic patterns: ~/docs/agent-workflow-patterns.md
Global Claude instructions (all sessions): ~/.claude/claude-instructions.md
This project's Claude instructions: agent/../CLAUDE.md

Useful Commands

# Monitor exp27 live
tail -f /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log

# Check all evals
grep "Checkpoint\|Eval\|NEW BEST" /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log

# Verify sim running
python3 -c "import socket; s=socket.socket(); s.settimeout(3); s.connect(('127.0.0.1',9091)); print('PORT 9091: OK'); s.close()"

# Check connections to sim
ss -tnp | grep 9091

# Unity rebuild command
"/mnt/c/Program Files/Unity/Hub/Editor/6000.4.4f1/Editor/Unity.exe" -quit -batchmode \
  -projectPath "C:/Users/Paul/Documents/projects/sdsandbox/sdsim" \
  -executeMethod PlayerBuilder.WinBuild \
  -logFile "C:/Users/Paul/AppData/Local/Temp/unity_rebuild.log"

# Kill sim
/mnt/c/Windows/System32/taskkill.exe /IM donkey_sim.exe /F

# Launch sim
"/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin/donkey_sim.exe" --port 9091 &

Cross-Model Eval Results (for reference)

Seeds tested: [1001, 2002, 3003, 4004, 5005, 6006, 7007, 8008, 9009, 1234]

These results are from exp24/25/26, which ALL trained on one road, so they are not comparable to exp27.

Rank	Model	Full eps	Mean steps	Mean reward
#1	exp26	9/10	1958s	381.2r
#2	exp25	9/10	1869s	356.3r
#3	exp24	9/10	1891s	347.5r
