docs(handoff): update SESSION_HANDOFF for exp24 readiness

Document WheelCollider root cause, Car.cs raycast fix requirement, road
generation behavior, exp24 launch steps, and updated success criteria.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Paul Huliganga 2026-05-05 17:59:32 -04:00
parent 0d1acf8cdc
commit 78d81827b7
1 changed file with 106 additions and 74 deletions

@@ -1,6 +1,6 @@
 # RL Donkeycar Session Handoff
-Last updated: 2026-05-05 America/Toronto
+Last updated: 2026-05-05 America/Toronto (updated during exp23)
 ## Autonomy Instruction
@@ -12,12 +12,11 @@ If the user says only `continue`, interpret it using the instruction above.
 ## Current Goal
-Run a clean, trustworthy exp23 on `generated_road` with:
-- Solid BoxCollider barriers (car physically cannot escape)
-- Clean reward: speed × CTE_quality + efficiency gate
-- No artificial episode caps or Python-side exploit patches
-Get RL training producing genuine improvement again.
+Run exp24 on `generated_road` with:
+- Discrete steering (7 bins) — smoother, less oscillation than continuous
+- Speed-based stuck detection — catches car pressed nose-first against barrier
+- Road regeneration — sim reconnects between segments so each eval is a fresh road
+- Unity raycast fix — Car.cs detects nose-first barrier contact via forward raycast
 ## Important Paths
@@ -39,82 +38,92 @@ Unity build log:
 ## What Was Fixed This Session
-### Root cause identified and fixed
-**The car was escaping the track because:**
-1. Barriers were zero-thickness `MeshCollider` planes — no physical volume
-2. Car Rigidbody had no CCD — default `Discrete` mode allows tunneling
-Both problems created a simulator where the car could literally teleport through
-barrier walls between physics frames. Every Python-side "fix" (CTE termination,
-time caps, hit detection) was attempting in Python what the physics engine was
-failing to enforce.
-### Unity changes (source updated, build in progress)
-`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/RoadBuilder.cs`
-- Rewrote `CreateBarrier()`: now creates one `BoxCollider` per segment with real
-  3D volume (`barrierThickness` wide — default 1.0m)
-- Segment boxes overlap by `barrierThickness * 0.5` to close corner gaps
-- Added `CreateEndCap()`: seals the two open ends of non-looping tracks
-  (`generated_road` is `closeLoop=0` — without end caps the car can drive off
-  the ends of the track)
-- Added `public float barrierThickness = 1.0f` field (inspector-editable)
-- `showBarrierMeshes=true` now shows proper translucent 3D boxes, not flat planes
+### Barrier physics (previous session, still in effect)
+- `RoadBuilder.cs`: BoxCollider per segment with overlap to close corner gaps
+- `Car.cs`: CCD mode prevents tunneling through thin geometry
+- `showBarrierMeshes=false` in scene YAML files (barriers invisible to RL camera)
+### Nose-first stuck detection root cause identified and fixed
+**Why collision detection never fires when perpendicular to barrier:**
+Unity `WheelColliders` (used for suspension/steering) don't fire `OnCollisionEnter`
+or `OnCollisionStay` on the car's `MonoBehaviour`. When the car is nose-first into
+a barrier, only the front WHEELS make physical contact. The car BODY collider (which
+`Car.cs`'s callbacks are attached to) is slightly behind the wheels and never touches.
+At an angle, the car body eventually makes contact — which is why collision detection
+works for side-contact but NOT for perpendicular contact.
+### Car.cs: forward raycast fix
 `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/Car.cs`
-- Added `rb.collisionDetectionMode = CollisionDetectionMode.Continuous;` in
-  `Awake()` — prevents tunneling even against any remaining thin geometry
-### Python changes (committed)
-`agent/reward_wrapper.py` → v7 (clean)
-- REMOVED: CTE-patience termination, high-CTE negative reward, solid_hit
-  monitoring, low-speed/wedge detection, all exploit-closing bandaids
-- KEPT: efficiency gate (zero reward when circling), no-progress termination
-  (active_node), lap exploit guard
-- Reward: `speed_norm × CTE_quality` when efficiency passes gate
-`agent/experiments/exp23_generated_road_clean.py`
-- Single track: `generated_road` on port 9091
-- No warm-start (fresh PPO weights)
-- `MAX_EPISODE_SECONDS=120` (generous safety net, not a training constraint)
-- LR=0.0003, 200k total steps, checkpoints every 10k
-`tests/test_reward_wrapper.py` — 17 tests, all pass
+- Added short forward raycast in `FixedUpdate()`:
+  - When `requestTorque > 0.05f` (throttle applied)
+  - Casts from car center + 0.3m up, 0.8m forward
+  - Calls `RegisterCollision()` if anything hit
+  - This fires whether wheels or body made contact
+- **Requires a Unity rebuild + sync before exp24**
+### StuckTerminationWrapper: speed-based check
+`agent/multitrack_runner.py` — new params `low_speed_threshold`, `max_low_speed_seconds`:
+- If `info['speed'] < 0.5` for 2 wall-clock seconds → terminate
+- Catches car pressed against barrier regardless of lateral sliding
+- `info['speed']` = `rb.velocity.magnitude / 8.0` in Unity → stuck car = ~0
+- Smoke-tested and committed
+### Exp24: discrete steering + road regeneration
+`agent/experiments/exp24_generated_road_discrete.py`:
+- Action space: `Discrete(7)` via `DiscretizedActionWrapper(n_steer=7, n_throttle=1)`
+- Steering bins: -1, -0.67, -0.33, 0, 0.33, 0.67, 1 (throttle fixed at 0.2)
+- **Road regeneration**: after each 10k-step segment, `env.close()` + reconnect
+  - Reconnecting reloads the scene → sdsandbox generates new random road
+  - Each eval runs on a freshly generated road (proper generalization test)
+  - ~5s overhead per checkpoint = ~100s total for 200k run
+- Speed-based stuck: `LOW_SPEED_THRESHOLD=0.5`, `MAX_LOW_SPEED_SECONDS=2.0`
+- `MAX_EPISODE_SECONDS=30` (reduced from 120s — speed check handles barrier stuck)
+- LR=0.0003, 200k total steps
+### Road generation clarification
+The road is NOT regenerated each episode reset. `generated_road` creates a fixed
+random layout when the scene LOADS (i.e., when you `gym.make()`). Within a session,
+all episodes use the same road. The exp23 eval variance (107–1951 steps) was due
+to Unity physics non-determinism, NOT road variety.
+### Reward wrapper
+`agent/reward_wrapper.py` v7 — unchanged, still in effect:
+- Reward: `speed_norm × CTE_quality` gated by efficiency
+- No Python-side exploit bandaids (physics enforces containment)
 ## Current State
-### Exp 23 is RUNNING
-- PID: 647921
-- Log: `agent/models/exp23-generated-road-clean/run_2026-05-05_160348_clean.log`
-- Started: 2026-05-05 16:03
-- Barriers visually confirmed by Paul before launch
-### Build and sync status
-- Unity build: completed successfully 2026-05-05 15:57
-- Both runtime folders synced
-- Sim on port 9091 running (generated_road)
-- Port 9093 / second sim NOT needed for exp23
-## Key Parameters (exp23)
-| Setting | Value | Why |
-|---|---|---|
-| Track | generated_road | Single track — diagnose before adding second |
-| LR | 0.0003 | Standard PPO starting LR |
-| Total steps | 200k | More room to learn with clean signal |
-| max_episode_seconds | 120s | Safety net only — physics does the work |
-| MAX_CTE_TERMINATE | none | Removed — barriers are physical now |
-| Warm-start | none | Previous warm-starts trained on broken reward |
-| showBarrierMeshes | ON | Verify visually before committing to long run |
-## Success Criteria
-- Car cannot drive past the barrier walls (verify visually)
-- ep_len_mean should INCREASE over checkpoints (not frozen at 118)
-- eval steps should improve at 20k, 30k, 40k checkpoints
-- No evidence of outside-road circling in the reward curve
+### Exp 23 status
+- Running (PID 649531), at ~140k/200k steps as of last check
+- Will finish on its own — DO NOT kill it
+- Log: `agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log`
+### Unity build status
+- **Needs rebuild** — Car.cs raycast fix not yet compiled
+- Car.cs was modified at:
+  `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/Car.cs`
+### To launch exp24
+1. Wait for exp23 to finish (or confirm it has)
+2. Rebuild Unity (Car.cs raycast fix)
+3. Stop sim on port 9091
+4. Rsync build to runtime folders (both, or just the one on 9091)
+5. Restart sim on port 9091
+6. Launch exp24:
+```bash
+cd /home/paulh/projects/donkeycar-rl-autoresearch/agent
+nohup python3 experiments/exp24_generated_road_discrete.py > /tmp/exp24.out 2>&1 &
+tail -f /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp24-discrete/run_*_discrete.log
+```
 ## Useful Commands
@@ -124,12 +133,17 @@ tail -20 /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log
 grep "Exiting batchmode\|Build failed\|error\|Error" /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log | tail -5
 ```
-### Monitor exp23
+### Monitor exp23 (while still running)
 ```bash
-tail -f agent/models/exp23-generated-road-clean/run_*_clean.log
+tail -f agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log
 ```
-### Verify ports
+### Monitor exp24
+```bash
+tail -f agent/models/exp24-discrete/run_*_discrete.log
+```
+### Verify port 9091
 ```bash
 python3 - <<'PY'
 import socket
@@ -141,9 +155,27 @@ for p in (9091,):
 PY
 ```
+### Check exp23 progress
+```bash
+grep "Eval\|BEST" agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log | tail -20
+```
+## Success Criteria (exp24)
+- Steering is visually smoother (7 discrete bins vs continuous Gaussian)
+- Car stuck against barrier terminates within 2-3 seconds (speed check)
+- Eval scores are more meaningful — each checkpoint tests a DIFFERENT road layout
+- ep_len_mean should continue increasing from the baseline exp23 established
 ## Notes for Next Session
-- If the user says `continue`, do not ask broad questions. Check build log → sync → launch → verify barriers → start exp23.
-- **Barrier visual confirmation is required before starting exp23.** Paul must see the translucent 3D boxes on both sides of the road with no gaps before committing to a 200k training run.
-- The second sim (port 9093) is not needed for exp23 — only launch one sim.
-- Do not add generated_track back until generated_road training is verified working.
+- Unity rebuild is required before exp24 — Car.cs raycast fix won't be in effect
+  until the build is done and synced.
+- The second sim (port 9093) is not needed — only port 9091.
+- Do NOT kill exp23 — let it run to completion.
+- Exp24's road regeneration adds ~5s per checkpoint = ~100s extra total. This is
+  by design. The "Reconnecting for fresh road" log lines confirm it's working.
+- `info['speed']` from telemetry = `rb.velocity.magnitude / 8.0`. The
+  `LOW_SPEED_THRESHOLD=0.5` corresponds to 4 Unity m/s, which is slow but not zero.
+  A truly stuck car reads ~0.0. Tight corners might temporarily be 0.1–0.3.
+  The 2-second timer provides enough grace for normal slow driving.
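The forward-raycast fix in `Car.cs` reduces to a geometric test that does not care which collider touched the barrier. Below is a 2D Python sketch of that decision, not the actual C# code. It makes three assumptions: the barrier face is a line segment on the ground plane, the ray starts at the car center (the 0.3 m vertical offset drops out in 2D), and 0.8 m is the ray length. `ray_hits_segment` and `nose_contact` are hypothetical names. Inside Unity, the engine's own raycast does this work.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def _cross(a: Vec, b: Vec) -> float:
    """2D cross product (z component of a x b)."""
    return a[0] * b[1] - a[1] * b[0]


def ray_hits_segment(origin: Vec, heading_rad: float, max_dist: float, a: Vec, b: Vec) -> bool:
    """True if a ray from `origin` along `heading_rad` hits segment a-b within `max_dist`."""
    d = (math.cos(heading_rad), math.sin(heading_rad))  # unit forward vector
    e = (b[0] - a[0], b[1] - a[1])                      # barrier segment direction
    denom = _cross(d, e)
    if abs(denom) < 1e-9:
        return False  # ray runs parallel to the barrier face
    w = (a[0] - origin[0], a[1] - origin[1])
    t = _cross(w, e) / denom  # distance along the ray (d has unit length)
    u = _cross(w, d) / denom  # fractional position along the barrier segment
    return 0.0 <= t <= max_dist and 0.0 <= u <= 1.0


def nose_contact(
    request_torque: float,
    origin: Vec,
    heading_rad: float,
    barrier: Tuple[Vec, Vec],
    ray_length: float = 0.8,       # assumed to be the ray length, see lead-in
    torque_threshold: float = 0.05,
) -> bool:
    """Mirror of the documented gate: only probe forward while throttle is applied."""
    if request_torque <= torque_threshold:
        return False
    return ray_hits_segment(origin, heading_rad, ray_length, barrier[0], barrier[1])


if __name__ == "__main__":
    wall = ((0.5, -1.0), (0.5, 1.0))  # barrier face 0.5 m ahead of the car
    print(nose_contact(0.2, (0.0, 0.0), 0.0, wall))      # True: throttle on, facing wall
    print(nose_contact(0.0, (0.0, 0.0), 0.0, wall))      # False: no throttle, no probe
    print(nose_contact(0.2, (0.0, 0.0), math.pi, wall))  # False: facing away
```

The result depends only on the car's pose and the barrier geometry. That is why it still fires in the nose-first case where only the wheels, and not the body collider, are touching.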
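The speed-based check in `StuckTerminationWrapper` terminates once `info['speed'] < 0.5` has held for 2 wall-clock seconds. That rule can be sketched as a small timer. `LowSpeedTimer` is a hypothetical stand-alone class, not the code in `agent/multitrack_runner.py`. It assumes the timer restarts as soon as speed recovers to the threshold or above, and it uses `time.monotonic()` for elapsed time (the real wrapper may use a different clock). The timestamp can also be passed explicitly so the logic is testable without waiting.

```python
import time
from typing import Optional


class LowSpeedTimer:
    """Hypothetical sketch of a sustained-low-speed termination check."""

    def __init__(self, low_speed_threshold: float = 0.5, max_low_speed_seconds: float = 2.0):
        self.low_speed_threshold = low_speed_threshold
        self.max_low_speed_seconds = max_low_speed_seconds
        self._low_since: Optional[float] = None  # time speed first fell below threshold

    def reset(self) -> None:
        """Clear the timer at episode reset so state does not leak across episodes."""
        self._low_since = None

    def update(self, speed: float, now: Optional[float] = None) -> bool:
        """Return True once speed has stayed below the threshold for the full window."""
        if now is None:
            now = time.monotonic()  # elapsed real time, unaffected by clock adjustments
        if speed >= self.low_speed_threshold:
            self._low_since = None  # moving normally: restart the grace period
            return False
        if self._low_since is None:
            self._low_since = now
        return now - self._low_since >= self.max_low_speed_seconds
```

In a gym-style wrapper this would be called once per `step()` with `info['speed']`, setting the episode's termination flag when it returns True, and `reset()` would be called from the wrapper's own reset.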
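The exp24 action space maps each action index to a fixed steering value, with throttle pinned at 0.2. A minimal sketch follows, assuming the 7 bins are evenly spaced over [-1, 1], which reproduces the listed values -1, -0.67, -0.33, 0, 0.33, 0.67, 1. `steering_bins` and `discrete_to_continuous` are hypothetical helpers, not the internals of `DiscretizedActionWrapper`.

```python
from typing import List, Tuple


def steering_bins(n_steer: int = 7) -> List[float]:
    """Evenly spaced steering values from -1.0 to 1.0 inclusive (assumed spacing)."""
    if n_steer < 2:
        raise ValueError("need at least 2 steering bins")
    step = 2.0 / (n_steer - 1)
    return [-1.0 + i * step for i in range(n_steer)]


def discrete_to_continuous(action: int, n_steer: int = 7, throttle: float = 0.2) -> Tuple[float, float]:
    """Map a Discrete(n_steer) action index to a (steering, throttle) pair.

    With n_throttle=1 there is a single throttle level, so throttle never changes.
    """
    bins = steering_bins(n_steer)
    if not 0 <= action < n_steer:
        raise ValueError(f"action {action} is outside Discrete({n_steer})")
    return bins[action], throttle


if __name__ == "__main__":
    print([round(s, 2) for s in steering_bins()])  # [-1.0, -0.67, -0.33, 0.0, 0.33, 0.67, 1.0]
```

Inside a gym-style action wrapper, the agent would see `Discrete(7)` as its action space, and the wrapper would convert each chosen index into this pair before forwarding it to the sim.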
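The exp24 road-regeneration schedule works like this: train a 10k-step segment, `env.close()`, then reconnect so the scene reloads with a new random road, and evaluate on that road. It can be sketched as a loop over injected callables. `make_env`, `train_segment` and `evaluate` are hypothetical stand-ins for the real training and eval calls in `exp24_generated_road_discrete.py`, which this note does not show. The sketch also assumes the next segment keeps training on the freshly generated road. The second helper reproduces the quoted overhead: 200k / 10k = 20 reconnects at ~5 s each is ~100 s.

```python
from typing import Any, Callable, List, Optional


def train_with_regeneration(
    make_env: Callable[[], Any],
    train_segment: Callable[[Any, int], None],
    evaluate: Callable[[Any], float],
    total_steps: int = 200_000,
    segment_steps: int = 10_000,
) -> List[float]:
    """Hypothetical sketch: reconnect between segments so every eval sees a new road."""
    scores: List[float] = []
    env: Optional[Any] = make_env()  # connecting loads the scene, which generates a road
    try:
        for _ in range(total_steps // segment_steps):
            train_segment(env, segment_steps)
            env.close()          # drop the sim connection...
            env = None
            env = make_env()     # ...and reconnect: scene reload gives a fresh road
            scores.append(evaluate(env))  # checkpoint eval on the new layout
    finally:
        if env is not None:
            env.close()
    return scores


def reconnect_overhead_seconds(
    total_steps: int = 200_000, segment_steps: int = 10_000, seconds_per_reconnect: float = 5.0
) -> float:
    """Extra wall time spent reconnecting: one reconnect per segment."""
    return (total_steps // segment_steps) * seconds_per_reconnect
```

Injecting the three callables keeps the schedule testable with a fake environment. It also makes explicit that regeneration depends only on reconnecting, not on episode resets.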
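The telemetry speed note is a single scale factor, `info['speed'] = rb.velocity.magnitude / 8.0`. The minimal sketch below assumes, as the note implies, that one Unity unit is one metre; the function names are hypothetical. It shows that the 0.5 threshold equals 4 m/s. It also shows that the 0.1–0.3 readings quoted for tight corners are roughly 0.8–2.4 m/s. That is below the threshold, so those corners rely on the 2-second grace timer.

```python
# Scale factor documented in the notes: info['speed'] = rb.velocity.magnitude / 8.0
TELEMETRY_SPEED_DIVISOR = 8.0


def telemetry_to_unity_speed(info_speed: float) -> float:
    """Convert normalized telemetry speed back to Unity units per second."""
    return info_speed * TELEMETRY_SPEED_DIVISOR


def unity_to_telemetry_speed(unity_speed: float) -> float:
    """Convert Unity units per second to the normalized telemetry value."""
    return unity_speed / TELEMETRY_SPEED_DIVISOR


if __name__ == "__main__":
    print(telemetry_to_unity_speed(0.5))                      # 4.0 (LOW_SPEED_THRESHOLD)
    print([telemetry_to_unity_speed(s) for s in (0.1, 0.3)])  # [0.8, 2.4] (tight corners)
```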