# RL Donkeycar Session Handoff
Last updated: 2026-05-06 14:14 America/Toronto (exp27 RESTARTED — self-intersection fix deployed)
## Autonomy Instruction
`Continue the Donkeycar RL/sim work autonomously. Rebuild, sync, relaunch, run diagnostics, patch code, and restart experiments as needed. Keep going until you either have a verified fix and a running experiment, or a concrete blocker that truly requires the user. Do not stop just to ask for permission on ordinary reversible steps. Only pause for real risk of data loss, destructive actions, missing credentials/access, or major strategy tradeoffs that require a user decision.`
If the user says only `continue`, interpret it using the instruction above.
## Current Goal
**Exp27 is RUNNING** (PID 1094759, started 14:13 2026-05-06).
Log: `agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log`
The road self-intersection fix is now deployed: self-intersecting road candidates are rejected and regenerated (up to 20 retries per regen).
## Road Self-Intersection: FIXED
`PathManager.MakeRandomPath()` now uses proper 2D segment-segment intersection math
to detect and reject self-intersecting road candidates (up to 20 retries per regen).
**Implementation in** `PathManager.cs`:
- `GenerateCandidatePath()` — extracts the generation loop into a callable helper
- `SegmentsIntersect2D()` — cross-product parametric test in XZ plane
- `PathSelfIntersects()` — checks all non-adjacent segment pairs (O(n²), ~4800 checks)
- `MakeRandomPath()` — retry loop: generates candidate, rejects if self-intersecting
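For reference, the check described above can be sketched in Python. This is an illustrative port, not the C# in `PathManager.cs`: the function names mirror the list above, but the `closed` flag, the handling of parallel segments, and returning `None` after the last retry are assumptions.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (x, z) in the ground plane


def segments_intersect_2d(p1: Point, p2: Point, q1: Point, q2: Point,
                          eps: float = 1e-9) -> bool:
    """Parametric test: solve p1 + t*r == q1 + u*s with t, u in [0, 1]."""
    r = (p2[0] - p1[0], p2[1] - p1[1])
    s = (q2[0] - q1[0], q2[1] - q1[1])
    denom = r[0] * s[1] - r[1] * s[0]  # 2D cross product r x s
    if abs(denom) < eps:
        # Parallel or collinear. Assumption: treated as non-intersecting here.
        return False
    qp = (q1[0] - p1[0], q1[1] - p1[1])
    t = (qp[0] * s[1] - qp[1] * s[0]) / denom
    u = (qp[0] * r[1] - qp[1] * r[0]) / denom
    return 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0


def path_self_intersects(nodes: List[Point], closed: bool = False) -> bool:
    """Check every non-adjacent segment pair, O(n^2) in the node count."""
    pts = list(nodes)
    if closed:
        pts.append(pts[0])  # add the closing segment back to the start
    n_seg = len(pts) - 1
    for i in range(n_seg):
        for j in range(i + 2, n_seg):  # j == i + 1 shares a node, so skip it
            if closed and i == 0 and j == n_seg - 1:
                continue  # first and last segments share a node in a loop
            if segments_intersect_2d(pts[i], pts[i + 1], pts[j], pts[j + 1]):
                return True
    return False


def make_random_path(generate_candidate: Callable[[], List[Point]],
                     max_retries: int = 20) -> Optional[List[Point]]:
    """Retry loop: return the first candidate that does not cross itself."""
    for _ in range(max_retries):
        candidate = generate_candidate()
        if not path_self_intersects(candidate):
            return candidate
    return None  # assumption: the fallback after the last retry is not documented
```

With 100 nodes this is on the order of the ~4800 pair checks quoted above, which is cheap enough to run on every regen.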
Unity rebuild completed and player DLL deployed at 14:13 2026-05-06.
**IMPORTANT: DLL copy source** — use the **Builds output** DLL, NOT Library/ScriptAssemblies:
```bash
# CORRECT (player DLL, strips editor refs):
rsync -av ".../sdsim/Builds/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll" \
".../DonkeySimWin/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll"
# WRONG (editor DLL, causes sim crash at scene load):
# Library/ScriptAssemblies/Assembly-CSharp.dll ← DO NOT USE
```
## FIRST THING TO DO IN A NEW SESSION
1. Read this file
2. Check current exp27 progress: `grep "Checkpoint\|Eval\|NEW BEST" <log>`
3. **Immediately arm a background monitor** so the session stays alive:
```bash
LOG=/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
N=$(grep -c "Checkpoint saved" $LOG)
TARGET=$((N + 5))
until [ $(grep -c "Checkpoint saved" $LOG) -ge $TARGET ]; do sleep 15; done \
&& grep "Eval\|NEW BEST\|Checkpoint" $LOG | tail -20
```
Run that as a background task immediately — don't wait. Then report current status to the user.
## Resume Monitoring (new session)
Background task notifications don't survive session switches. In a new session:
```bash
# Current run log (the 133703 log belongs to the killed pre-fix run)
LOG=/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
# Check results so far
grep "Checkpoint\|Eval\|NEW BEST" "$LOG"
# Arm a fresh background monitor: wake after 5 more checkpoints
TARGET=$(( $(grep -c "Checkpoint saved" "$LOG") + 5 ))
until [ "$(grep -c "Checkpoint saved" "$LOG")" -ge "$TARGET" ]; do sleep 15; done \
  && grep "Eval\|NEW BEST\|Checkpoint" "$LOG" | tail -20
```
## Exp27 Results So Far
Previous run (run_2026-05-06_133703, PID 1082126) — killed at 40k to deploy self-intersection fix:
| Step | Seed | Reward | Steps | Status |
|------|------|--------|-------|--------|
| Initial | 81035 | — | — | first road |
| 10k | 68546 | 39.0r | 145 | ❌ (may have self-intersected) |
| 20k | 35735 | 71.6r | 230 | ❌ |
| 30k | 98061 | 39.2r | 139 | ❌ |
| 40k | 2167 | 33.9r | 148 | ❌ (run killed — deploying fix) |
Current run (run_2026-05-06_141328, PID 1094759) — self-intersection fix active:
| Step | Seed | Reward | Steps | Status |
|------|------|--------|-------|--------|
| Initial | 89942 | — | — | first road (no crossing) |
| 10k | 63790 | 250.5r | 924 | ❌@924 — 6× better than pre-fix |
| 20k | 54863 | 275.1r | 925 | ❌@925 — NEW BEST |
| 30k | 84765 | 377.3r | 1325 | ❌@1325 — NEW BEST |
| 40k | 62695 | 33.8r | 134 | ❌@134 — outlier (very hard road) |
| 50k | 51171 | 452.6r | 1575 | ❌@1575 — NEW BEST |
| 60k | 13427 | 289.0r | 1013 | ❌@1013 |
| 70k | 99752 | 432.3r | 1648 | ❌@1648 — NEW BEST steps |
| 80k | 40584 | 449.9r | 1567 | ❌@1567 |
| 90k | 23677 | 444.3r | 1522 | ❌@1522 |
| 100k | 11818 | 30.4r | 160 | ❌@160 — outlier (hard road) |
| 110k | 15439 | 462.7r | 1580 | ❌@1580 |
| 120k | 79776 | 251.7r | 893 | ❌@893 |
| 130k | 51 | 273.5r | 1029 | ❌@1029 |
| 140k | 15985 | 386.8r | 1260 | ❌@1260 |
| 150k | 78623 | 50.5r | 193 | ❌@193 — outlier |
| 160k | 68780 | 194.3r | 753 | ❌@753 — low |
| 170k | 27669 | 375.2r | 1371 | ❌@1371 |
| 180k | 32153 | 45.6r | 188 | ❌@188 — outlier |
| 190k | 23522 | 444.2r | 1652 | ❌@1652 — NEW BEST (+4 steps) |
| 200k | 35712 | 200.8r | 657 | ❌@657 |
| 210k | 84828 | 53.5r | 219 | ❌@219 — outlier |
| 220k | 66225 | 425.7r | 1612 | ❌@1612 |
| 230k | 41094 | 162.1r | 581 | ❌@581 — low |
| 240k | 51566 | 438.2r | 1613 | ❌@1613 |
| 250k | 18319 | 19.8r | 116 | ❌@116 — hard outlier |
| 260k | 99555 | 182.6r | 603 | ❌@603 — low |
| 270k | 59896 | 59.4r | 228 | ❌@228 — hard outlier |
| 280k | — | — | — | eval pending — TREND CONCERNING: regression since 190k |
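This note dates from early in the run, when fresh weights were expected to improve significantly by 50-100k. Episode length peaked at 1652 steps (190k); six of the eight evals from 200k to 270k ended under 700 steps.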
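To put a number on the trend, here is a minimal sketch that compares the evals up to 190k with those after it. The step counts are copied from the current-run table above; the window boundaries and the `window_stats` helper are illustrative choices. Each checkpoint is a single episode on a different random road, so the variance is high and this is only a rough indicator.

```python
from statistics import mean, median

# (checkpoint in thousands of steps, episode length in steps), from the table above.
EVALS = [
    (10, 924), (20, 925), (30, 1325), (40, 134), (50, 1575), (60, 1013),
    (70, 1648), (80, 1567), (90, 1522), (100, 160), (110, 1580), (120, 893),
    (130, 1029), (140, 1260), (150, 193), (160, 753), (170, 1371), (180, 188),
    (190, 1652), (200, 657), (210, 219), (220, 1612), (230, 581), (240, 1613),
    (250, 116), (260, 603), (270, 228),
]


def window_stats(evals, lo_k, hi_k):
    """Summarise episode lengths for checkpoints in [lo_k, hi_k] (thousands)."""
    steps = [s for k, s in evals if lo_k <= k <= hi_k]
    return {"n": len(steps), "mean": mean(steps), "median": median(steps)}


early = window_stats(EVALS, 10, 190)
late = window_stats(EVALS, 200, 270)
for name, stats in (("10k-190k", early), ("200k-270k", late)):
    print(f"{name}: n={stats['n']} mean={stats['mean']:.0f} median={stats['median']:.0f}")
```

The median is less sensitive than the mean to the hard-road outliers, so it is the more useful of the two for judging whether the policy itself has regressed.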
## Critical Fixes Applied This Session
### 1. regen_road was silently doing nothing (ROOT CAUSE FOUND)
`TcpCarHandler.cs` `RegenRoad()` coroutine only ran if `TrainingManager != null`.
The `generated_road` scene has NO TrainingManager (it's for PID-based imitation learning
data collection, not RL). So `regen_road` always hit the null check and did nothing.
**All of exp24/25/26/27-first-run trained on ONE road — the scene's initial road.**
**Fix:** Added else branch to call `RoadBuilder.DestroyRoad()` + `PathManager.InitCarPath()`
directly when TrainingManager is absent. Road regen now verified working by user observation.
### 2. MapOverlay minimap not refreshing after regen
`RefreshPath()` only triggered on node count change (always 100). Fixed to also check
node[10] position — different seeds produce different positions.
### 3. PPO gradient update pause (BrakeOnUpdateCallback)
During `model.learn()`, Python stops calling `env.step()` for 3-8s (gradient updates).
Last control command persists → car drifts. `BrakeOnUpdateCallback._on_rollout_end()`
sends zero-control message before updates begin.
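A minimal sketch of that callback pattern, assuming Stable-Baselines3 (the `model.learn()` and `_on_rollout_end()` names suggest it, but this file does not say so). The class name `BrakeOnUpdateSketch` and the `send_control` callable are hypothetical stand-ins; the real `BrakeOnUpdateCallback` may send its message differently.

```python
try:
    from stable_baselines3.common.callbacks import BaseCallback
except ImportError:
    # Stand-in so the sketch runs without SB3 installed.
    class BaseCallback:
        def __init__(self, verbose: int = 0):
            self.verbose = verbose


class BrakeOnUpdateSketch(BaseCallback):
    """Send a zero-control command when a rollout ends, before gradient updates."""

    def __init__(self, send_control, verbose: int = 0):
        super().__init__(verbose)
        # Hypothetical hook: callable(steer, throttle) that reaches the sim.
        self._send_control = send_control

    def _on_step(self) -> bool:
        return True  # never stop training from this callback

    def _on_rollout_end(self) -> None:
        # Rollout collection is done; env.step() will not be called again
        # until the updates finish, so neutralise the last command now.
        self._send_control(0.0, 0.0)


if __name__ == "__main__":
    sent = []
    cb = BrakeOnUpdateSketch(lambda steer, throttle: sent.append((steer, throttle)))
    cb._on_rollout_end()
    print(sent)
```

In real use the callback would be passed to `model.learn(..., callback=...)`, and `send_control` would wrap whatever the environment uses to talk to the sim.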
## Exp27 Configuration
- Script: `agent/experiments/exp27_random_roads.py`
- Fresh weights (no warm start)
- N_STEER=7, N_THROTTLE=3 → 21 discrete actions (7 × 3); throttle bins [0.2, 0.5, 1.0]
- LR=0.0003, ent_coef=0.05, n_steps=1024
- 500k total steps, checkpoint every 10k
- regen_road with random seed each checkpoint (VERIFIED WORKING)
- CTE termination: >2.0m for >0.5s
- BrakeOnUpdateCallback: enabled
- set_ai_text: pushes stats to sim overlay each checkpoint
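As an illustration of the 21-action space in the list above, here is a sketch of decoding a discrete action index into a (steer, throttle) pair. The throttle bins come from the list; the evenly spaced steering bins over [-1, 1] and the index ordering (steering-major) are assumptions, so check `exp27_random_roads.py` for the actual mapping.

```python
N_STEER = 7
THROTTLE_BINS = [0.2, 0.5, 1.0]  # from the configuration above

# Assumption: steering bins evenly spaced over [-1, 1].
STEER_BINS = [-1.0 + 2.0 * i / (N_STEER - 1) for i in range(N_STEER)]
N_ACTIONS = N_STEER * len(THROTTLE_BINS)  # 21


def decode_action(index: int):
    """Map a discrete action index to (steer, throttle)."""
    if not 0 <= index < N_ACTIONS:
        raise ValueError(f"action index {index} out of range 0..{N_ACTIONS - 1}")
    # Assumption: steering-major ordering, throttle varies fastest.
    steer_idx, throttle_idx = divmod(index, len(THROTTLE_BINS))
    return STEER_BINS[steer_idx], THROTTLE_BINS[throttle_idx]


if __name__ == "__main__":
    for i in (0, 9, 20):
        print(i, decode_action(i))
```

Note that every action has a throttle of at least 0.2, so the policy cannot stop the car by itself; this is consistent with needing `BrakeOnUpdateCallback` during update pauses.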
## New Unity Build (deployed this session)
Assembly-CSharp.dll rebuilt and rsynced. Changes:
- **regen_road fix**: Direct RoadBuilder+PathManager fallback when no TrainingManager
- **MapOverlay.cs**: Minimap refreshes on node position change (not just count)
- **Fixed steering display**: `Steer: -1.000 Thr:0.20` format
- **set_ai_text TCP message**: Python pushes multi-line text to sim overlay
- **PathManager.cs self-intersection fix**: retry loop with proper XZ segment math (2026-05-06 14:08)
**DLL source for rsync** — MUST use the Builds output, not Library/ScriptAssemblies:
- Source: `.../sdsim/Builds/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll`
- Destination: `.../DonkeySimWin/DonkeySimWin/donkey_sim_Data/Managed/Assembly-CSharp.dll`
- Library/ScriptAssemblies DLL contains editor references → crashes sim at scene load
## Important Paths
- Project: `/home/paulh/projects/donkeycar-rl-autoresearch`
- Unity source: `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim`
- Unity build output: `.../sdsim/Builds/DonkeySimWin`
- Runtime sim: `/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin`
- Unity build log: `C:\Users\Paul\AppData\Local\Temp\unity_rebuild.log`
## Workflow Pattern Documentation
- Model-agnostic patterns: `~/docs/agent-workflow-patterns.md`
- Global Claude instructions (all sessions): `~/.claude/claude-instructions.md`
- This project's Claude instructions: `agent/../CLAUDE.md`
## Useful Commands
```bash
# Monitor exp27 live
tail -f /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
# Check all evals
grep "Checkpoint\|Eval\|NEW BEST" /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp27-random-roads/run_2026-05-06_141328_random_roads.log
# Verify sim running
python3 -c "import socket; s=socket.socket(); s.settimeout(3); s.connect(('127.0.0.1',9091)); print('PORT 9091: OK'); s.close()"
# Check connections to sim
ss -tnp | grep 9091
# Unity rebuild command
"/mnt/c/Program Files/Unity/Hub/Editor/6000.4.4f1/Editor/Unity.exe" -quit -batchmode \
-projectPath "C:/Users/Paul/Documents/projects/sdsandbox/sdsim" \
-executeMethod PlayerBuilder.WinBuild \
-logFile "C:/Users/Paul/AppData/Local/Temp/unity_rebuild.log"
# Kill sim
/mnt/c/Windows/System32/taskkill.exe /IM donkey_sim.exe /F
# Launch sim
"/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin/donkey_sim.exe" --port 9091 &
```
## Cross-Model Eval Results (for reference)
Seeds tested: [1001, 2002, 3003, 4004, 5005, 6006, 7007, 8008, 9009, 1234]
These results are from exp24/25/26 which ALL trained on one road — not comparable to exp27.
| Rank | Model | Full eps | Mean steps | Mean reward |
|------|-------|----------|------------|-------------|
| #1 | exp26 | 9/10 | 1958s | 381.2r |
| #2 | exp25 | 9/10 | 1869s | 356.3r |
| #3 | exp24 | 9/10 | 1891s | 347.5r |