# RL Donkeycar Session Handoff

Last updated: 2026-05-05 America/Toronto

## Autonomy Instruction

Use this as the standing instruction for follow-on sessions:

`Continue the Donkeycar RL/sim work autonomously. Rebuild, sync, relaunch, run diagnostics, patch code, and restart experiments as needed. Keep going until you either have a verified fix and a running experiment, or a concrete blocker that truly requires the user. Do not stop just to ask for permission on ordinary reversible steps. Only pause for real risk of data loss, destructive actions, missing credentials/access, or major strategy tradeoffs that require a user decision.`

If the user says only `continue`, interpret it using the instruction above.

## Current Goal

Run a clean, trustworthy exp23 on `generated_road` with:
- Solid BoxCollider barriers (car physically cannot escape)
- Clean reward: speed × CTE_quality + efficiency gate
- No artificial episode caps or Python-side exploit patches

Get RL training producing genuine improvement again.

## Important Paths

Project:
- `/home/paulh/projects/donkeycar-rl-autoresearch`

Unity source project:
- `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim`

Unity build output:
- `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Builds/DonkeySimWin`

Current runtime simulator folders in use:
- `/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin`
- `/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin - Copy`

Unity build log:
- `C:\Users\Paul\AppData\Local\Temp\unity_rebuild.log`

## What Was Fixed This Session

### Root cause identified and fixed

**The car was escaping the track because:**
1. Barriers were zero-thickness `MeshCollider` planes with no physical volume
2. The car's Rigidbody had no continuous collision detection (CCD); the default `Discrete` mode allows tunneling

Together, these let the car pass straight through barrier walls between
physics steps. Every Python-side "fix" (CTE termination, time caps, hit
detection) was attempting in Python what the physics engine was failing to
enforce.

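
As a rough sanity check on why wall thickness matters, the sketch below computes how far a body moves in one physics step and the speed at which a single step's displacement would exceed a wall of a given thickness. It is a back-of-the-envelope bound, not a model of Unity's solver. The 0.02 s step is Unity's default fixed timestep and is assumed to be unchanged in this project; the function names are hypothetical.

```python
# Back-of-the-envelope tunneling bound (not a model of Unity/PhysX).
UNITY_DEFAULT_FIXED_DT = 0.02  # seconds; Unity's default, assumed unchanged here


def step_displacement(speed_mps: float, fixed_dt: float = UNITY_DEFAULT_FIXED_DT) -> float:
    """Distance a body moving at speed_mps covers in one physics step."""
    return speed_mps * fixed_dt


def full_crossing_speed(wall_thickness_m: float, fixed_dt: float = UNITY_DEFAULT_FIXED_DT) -> float:
    """Speed above which one step's displacement exceeds the wall thickness."""
    return wall_thickness_m / fixed_dt


if __name__ == "__main__":
    for thickness in (0.0, 1.0):
        speed = full_crossing_speed(thickness)
        print(f"thickness={thickness} m -> crossing speed {speed:.1f} m/s")
```

Under the assumed 0.02 s step, a zero-thickness plane has a crossing speed of 0 m/s (any motion can step over it), while a 1.0 m box needs about 50 m/s before one step spans the whole wall.
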
### Unity changes (source updated, build in progress)

`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/RoadBuilder.cs`
- Rewrote `CreateBarrier()`: it now creates one `BoxCollider` per segment with real
  3D volume (`barrierThickness` wide, default 1.0 m)
- Segment boxes overlap by `barrierThickness * 0.5` to close corner gaps
- Added `CreateEndCap()`: seals the two open ends of non-looping tracks
  (`generated_road` is `closeLoop=0`; without end caps the car can drive off
  the ends of the track)
- Added `public float barrierThickness = 1.0f` field (inspector-editable)
- `showBarrierMeshes=true` now shows proper translucent 3D boxes, not flat planes

`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/Car.cs`
- Added `rb.collisionDetectionMode = CollisionDetectionMode.Continuous;` in
  `Awake()`, which prevents tunneling even against any remaining thin geometry

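
The per-segment layout described for `CreateBarrier()` can be sketched in Python to check the geometry offline. This is not the C# in `RoadBuilder.cs`. The `Box` type, the `segment_boxes` name, the choice to split the extra length evenly between the two ends of each box, and the yaw convention (degrees about the vertical axis, 0 along +Z) are all assumptions made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Box:
    center: tuple[float, float]  # (x, z) midpoint of the segment
    length: float                # along the segment, including overlap
    thickness: float             # across the segment
    yaw_deg: float               # heading about the vertical axis (assumed convention)


def segment_boxes(points: list[tuple[float, float]], thickness: float = 1.0) -> list[Box]:
    """Build one box per consecutive pair of barrier points."""
    overlap = thickness * 0.5  # extra length per box, split across both ends (assumption)
    boxes = []
    for (x0, z0), (x1, z1) in zip(points, points[1:]):
        dx, dz = x1 - x0, z1 - z0
        seg_len = math.hypot(dx, dz)
        if seg_len == 0.0:
            continue  # skip duplicate points
        boxes.append(
            Box(
                center=((x0 + x1) / 2, (z0 + z1) / 2),
                length=seg_len + overlap,
                thickness=thickness,
                yaw_deg=math.degrees(math.atan2(dx, dz)),
            )
        )
    return boxes


if __name__ == "__main__":
    for box in segment_boxes([(0.0, 0.0), (0.0, 2.0), (2.0, 2.0)]):
        print(box)
```

On a straight run, lengthening each box by `barrierThickness * 0.5` in total means neighbouring boxes share `barrierThickness * 0.5` of length at each joint, which matches the overlap quoted above.
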
### Python changes (committed)

`agent/reward_wrapper.py` → v7 (clean)
- REMOVED: CTE-patience termination, high-CTE negative reward, solid_hit
  monitoring, low-speed/wedge detection, all exploit-closing band-aids
- KEPT: efficiency gate (zero reward when circling), no-progress termination
  (active_node), lap exploit guard
- Reward: `speed_norm × CTE_quality` when efficiency passes the gate

`agent/experiments/exp23_generated_road_clean.py`
- Single track: `generated_road` on port 9091
- No warm-start (fresh PPO weights)
- `MAX_EPISODE_SECONDS=120` (generous safety net, not a training constraint)
- LR=0.0003, 200k total steps, checkpoints every 10k

`tests/test_reward_wrapper.py`: 17 tests, all pass

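
The shape of the v7 reward (`speed_norm × CTE_quality`, zeroed when the efficiency gate fails) can be illustrated with a minimal sketch. This is not the code in `agent/reward_wrapper.py`: the linear `cte_quality` falloff, the definition of efficiency as net track progress divided by distance driven, the `min_efficiency` threshold of 0.5, and every function and parameter name below are assumptions.

```python
def clamp01(value: float) -> float:
    """Clamp a value into the range [0, 1]."""
    return max(0.0, min(1.0, value))


def cte_quality(cte: float, max_cte: float) -> float:
    """1.0 on the centerline, falling linearly to 0.0 at max_cte (assumed shape)."""
    return clamp01(1.0 - abs(cte) / max_cte)


def efficiency(net_progress: float, path_length: float) -> float:
    """Net progress along the track divided by distance driven (assumed definition).

    Close to 1.0 when driving along the road, close to 0.0 when circling.
    """
    if path_length <= 0.0:
        return 0.0
    return clamp01(net_progress / path_length)


def reward(speed: float, max_speed: float, cte: float, max_cte: float,
           net_progress: float, path_length: float,
           min_efficiency: float = 0.5) -> float:
    """speed_norm * cte_quality, or 0.0 when the efficiency gate fails."""
    if efficiency(net_progress, path_length) < min_efficiency:
        return 0.0  # gate closed: circling earns nothing
    speed_norm = clamp01(speed / max_speed)
    return speed_norm * cte_quality(cte, max_cte)
```

Because the gate multiplies the whole reward by zero rather than subtracting a penalty, circling cannot be traded off against speed, which is the property the KEPT efficiency gate is meant to preserve.
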
## Current State

### Unity build
- Build launched with PID 37896 on 2026-05-05
- Log: `C:\Users\Paul\AppData\Local\Temp\unity_rebuild.log`
- Check: `grep -q "Exiting batchmode successfully" /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log && echo OK`

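
For scripted polling, the same check can be done from Python. The sketch below classifies the build log using the two markers that already appear in this handoff's grep commands (`Exiting batchmode successfully` and `Build failed`); whether Unity always emits the failure marker is an assumption, and the function names are hypothetical.

```python
from pathlib import Path

SUCCESS_MARKER = "Exiting batchmode successfully"
FAILURE_MARKER = "Build failed"  # assumed to appear in the log on failure


def build_status(log_text: str) -> str:
    """Classify a Unity batchmode log as 'failed', 'ok', or 'running'."""
    if FAILURE_MARKER in log_text:
        return "failed"
    if SUCCESS_MARKER in log_text:
        return "ok"
    return "running"  # neither marker yet: build still in progress (or log empty)


def read_build_status(log_path: str) -> str:
    """Read the log file if it exists and classify it."""
    path = Path(log_path)
    if not path.exists():
        return "running"  # log not created yet
    return build_status(path.read_text(errors="replace"))


if __name__ == "__main__":
    print(read_build_status("/mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log"))
```
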
### After build completes
1. Sync to both runtime folders:
   ```bash
   rsync -a --delete '/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Builds/DonkeySimWin/' \
     '/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin/'
   rsync -a --delete '/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Builds/DonkeySimWin/' \
     '/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin - Copy/'
   ```

2. Launch the sim (only port 9091 is needed for exp23, which uses a single env):
   ```powershell
   $key = 'HKCU:\Software\DonkeyCar\donkey_sim'
   Set-ItemProperty -Path $key -Name 'port_h2088097884' -Value 9091 -Type DWord
   Set-ItemProperty -Path $key -Name 'portPrivateAPI_h1325370089' -Value 9092 -Type DWord
   Start-Process -FilePath 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin\donkey_sim.exe' `
     -ArgumentList '--port','9091' -WorkingDirectory 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin'
   ```

3. Verify port:
   ```bash
   python3 -c "import socket; s=socket.socket(); s.settimeout(3); s.connect(('127.0.0.1',9091)); print('PORT 9091: OK'); s.close()"
   ```

4. Visually verify barriers in the sim window:
   - `showBarrierMeshes=1` is already set in both scene files
   - Translucent box barriers should be visible on BOTH sides of the road
   - Verify no gaps at corners
   - Verify end-cap walls at start and finish of generated_road
   - **Do not start exp23 until Paul confirms barriers look correct**

5. Launch exp23:
   ```bash
   cd /home/paulh/projects/donkeycar-rl-autoresearch
   SAVE_DIR=agent/models/exp23-generated-road-clean
   mkdir -p "$SAVE_DIR"
   nohup python3 agent/experiments/exp23_generated_road_clean.py \
     > "$SAVE_DIR/run_$(date +%Y-%m-%d_%H%M%S)_clean.log" 2>&1 &
   echo $! > "$SAVE_DIR/current.pid"
   ```

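
Step 5 records the training process ID in `current.pid`. A later session can use that file to tell whether exp23 is still alive before relaunching anything. The sketch below is one way to do that on Linux/WSL using signal 0, which checks for the process without affecting it; the function names are hypothetical, and it does not confirm that the PID still belongs to the exp23 script rather than a recycled PID.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def experiment_is_running(pid_file: str) -> bool:
    """Read a PID file and report whether that process is still alive."""
    path = Path(pid_file)
    if not path.exists():
        return False
    text = path.read_text().strip()
    return text.isdigit() and pid_is_running(int(text))


if __name__ == "__main__":
    print(experiment_is_running("agent/models/exp23-generated-road-clean/current.pid"))
```
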
## Key Parameters (exp23)

| Setting | Value | Why |
|---|---|---|
| Track | generated_road | Single track: diagnose before adding a second |
| LR | 0.0003 | Standard PPO starting LR |
| Total steps | 200k | More room to learn with clean signal |
| max_episode_seconds | 120s | Safety net only; physics does the work |
| MAX_CTE_TERMINATE | none | Removed; barriers are physical now |
| Warm-start | none | Previous warm-starts trained on broken reward |
| showBarrierMeshes | ON | Verify visually before committing to long run |

## Success Criteria

- Car cannot drive past the barrier walls (verify visually)
- ep_len_mean should INCREASE over checkpoints (not frozen at 118)
- eval steps should improve at 20k, 30k, 40k checkpoints
- No evidence of outside-road circling in the reward curve

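
The `ep_len_mean` criterion can be checked from the run log rather than by eye. The sketch below assumes the log contains Stable-Baselines3-style table rows such as `| ep_len_mean | 118 |`; that format, the window size, the minimum gain, and the function names are assumptions, not something confirmed by the exp23 script.

```python
import re

# Matches rows like "|    ep_len_mean     | 118      |" or "... | 1.2e+03 |"
EP_LEN_RE = re.compile(r"ep_len_mean\s*\|\s*([0-9]+(?:\.[0-9]+)?(?:e[+-]?[0-9]+)?)")


def ep_len_means(log_text: str) -> list:
    """Extract every ep_len_mean value from the log, in order of appearance."""
    return [float(value) for value in EP_LEN_RE.findall(log_text)]


def is_improving(values: list, window: int = 5, min_gain: float = 1.0) -> bool:
    """Compare the mean of the last `window` values against the first `window`."""
    if len(values) < 2 * window:
        return False  # not enough data to judge a trend
    first = sum(values[:window]) / window
    last = sum(values[-window:]) / window
    return last - first >= min_gain


if __name__ == "__main__":
    import glob

    for path in glob.glob("agent/models/exp23-generated-road-clean/run_*_clean.log"):
        with open(path, errors="replace") as fh:
            values = ep_len_means(fh.read())
        print(path, len(values), "values, improving:", is_improving(values))
```

A run whose values stay pinned at one number, as in the frozen-at-118 case, reports `False` here because the first and last windows have the same mean.
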
## Useful Commands

### Check build log
```bash
tail -20 /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log
grep "Exiting batchmode\|Build failed\|error\|Error" /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log | tail -5
```

### Monitor exp23
```bash
tail -f agent/models/exp23-generated-road-clean/run_*_clean.log
```

### Verify ports
```bash
python3 - <<'PY'
import socket
for p in (9091,):
    s = socket.socket()
    s.settimeout(3)
    try:
        s.connect(('127.0.0.1', p))
        print(f'PORT {p}: OK')
    except Exception as e:
        print(f'PORT {p}: FAIL {e}')
    finally:
        s.close()
PY
```

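
The one-shot check above fails if it runs before the simulator has finished starting. When the launch and the check are scripted back to back, a retry loop with a deadline is a gentler alternative. This is a sketch only; `wait_for_port` and its default timings are hypothetical, not part of the project.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Retry a TCP connect until it succeeds or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError on refusal or timeout
            with socket.create_connection((host, port), timeout=3.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    ok = wait_for_port("127.0.0.1", 9091)
    print("PORT 9091:", "OK" if ok else "FAIL")
```
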
## Notes for Next Session

- If the user says `continue`, do not ask broad questions. Check build log → sync → launch → verify barriers → start exp23.
- **Barrier visual confirmation is required before starting exp23.** Paul must see the translucent 3D boxes on both sides of the road with no gaps before committing to a 200k training run.
- The second sim (port 9093) is not needed for exp23; only launch one sim.
- Do not add generated_track back until generated_road training is verified working.