docs(handoff): update SESSION_HANDOFF for exp24 readiness

Document WheelCollider root cause, Car.cs raycast fix requirement, road
generation behavior, exp24 launch steps, and updated success criteria.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Paul Huliganga 2026-05-05 17:59:32 -04:00
parent 0d1acf8cdc
commit 78d81827b7
1 changed files with 106 additions and 74 deletions


@ -1,6 +1,6 @@
# RL Donkeycar Session Handoff
Last updated: 2026-05-05 America/Toronto
Last updated: 2026-05-05 America/Toronto (updated during exp23)
## Autonomy Instruction
@ -12,12 +12,11 @@ If the user says only `continue`, interpret it using the instruction above.
## Current Goal
Run a clean, trustworthy exp23 on `generated_road` with:
- Solid BoxCollider barriers (car physically cannot escape)
- Clean reward: speed × CTE_quality + efficiency gate
- No artificial episode caps or Python-side exploit patches
Get RL training producing genuine improvement again.
Run exp24 on `generated_road` with:
- Discrete steering (7 bins) — smoother, less oscillation than continuous
- Speed-based stuck detection — catches car pressed nose-first against barrier
- Road regeneration — sim reconnects between segments so each eval is a fresh road
- Unity raycast fix — Car.cs detects nose-first barrier contact via forward raycast
## Important Paths
@ -39,82 +38,92 @@ Unity build log:
## What Was Fixed This Session
### Root cause identified and fixed
### Barrier physics (previous session, still in effect)
**The car was escaping the track because:**
1. Barriers were zero-thickness `MeshCollider` planes — no physical volume
2. Car Rigidbody had no CCD — default `Discrete` mode allows tunneling
- `RoadBuilder.cs`: BoxCollider per segment with overlap to close corner gaps
- `Car.cs`: CCD mode prevents tunneling through thin geometry
- `showBarrierMeshes=false` in scene YAML files (barriers invisible to RL camera)
Both problems created a simulator where the car could literally teleport through
barrier walls between physics frames. Every Python-side "fix" (CTE termination,
time caps, hit detection) was attempting in Python what the physics engine was
failing to enforce.
### Nose-first stuck detection root cause identified and fixed
### Unity changes (source updated, build in progress)
**Why collision detection never fires when perpendicular to barrier:**
`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/RoadBuilder.cs`
- Rewrote `CreateBarrier()`: now creates one `BoxCollider` per segment with real
3D volume (`barrierThickness` wide — default 1.0m)
- Segment boxes overlap by `barrierThickness * 0.5` to close corner gaps
- Added `CreateEndCap()`: seals the two open ends of non-looping tracks
(`generated_road` is `closeLoop=0` — without end caps the car can drive off
the ends of the track)
- Added `public float barrierThickness = 1.0f` field (inspector-editable)
- `showBarrierMeshes=true` now shows proper translucent 3D boxes, not flat planes
Unity `WheelColliders` (used for suspension/steering) don't fire `OnCollisionEnter`
or `OnCollisionStay` on the car's `MonoBehaviour`. When the car is nose-first into
a barrier, only the front WHEELS make physical contact. The car BODY collider (which
`Car.cs`'s callbacks are attached to) is slightly behind the wheels and never touches.
At an angle, the car body eventually makes contact — which is why collision detection
works for side-contact but NOT for perpendicular contact.
### Car.cs: forward raycast fix
`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/Car.cs`
- Added `rb.collisionDetectionMode = CollisionDetectionMode.Continuous;` in
`Awake()` — prevents tunneling even against any remaining thin geometry
- Added short forward raycast in `FixedUpdate()`:
- When `requestTorque > 0.05f` (throttle applied)
- Casts from car center + 0.3m up, 0.8m forward
- Calls `RegisterCollision()` if anything hit
- This fires whether wheels or body made contact
- **Requires a Unity rebuild + sync before exp24**
### Python changes (committed)
### StuckTerminationWrapper: speed-based check
`agent/reward_wrapper.py` → v7 (clean)
- REMOVED: CTE-patience termination, high-CTE negative reward, solid_hit
monitoring, low-speed/wedge detection, all exploit-closing bandaids
- KEPT: efficiency gate (zero reward when circling), no-progress termination
(active_node), lap exploit guard
- Reward: `speed_norm × CTE_quality` when efficiency passes gate
`agent/multitrack_runner.py` — new params `low_speed_threshold`, `max_low_speed_seconds`:
- If `info['speed'] < 0.5` for 2 wall-clock seconds → terminate
- Catches car pressed against barrier regardless of lateral sliding
- `info['speed']` = `rb.velocity.magnitude / 8.0` in Unity → stuck car = ~0
- Smoke-tested and committed
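
A minimal sketch of the speed-based check described above, assuming a standalone timer object. The class name `LowSpeedTimer` and the injectable `clock` argument are hypothetical and are not taken from `agent/multitrack_runner.py`; only the 0.5 threshold and the 2-second window come from this handoff.

```python
import time


class LowSpeedTimer:
    """Tracks how long the reported speed has stayed below a threshold.

    Hypothetical helper for illustration only; the real wrapper may differ.
    """

    def __init__(self, low_speed_threshold=0.5, max_low_speed_seconds=2.0,
                 clock=time.monotonic):
        self.low_speed_threshold = low_speed_threshold
        self.max_low_speed_seconds = max_low_speed_seconds
        self._clock = clock          # wall-clock source, injectable for tests
        self._slow_since = None      # time at which speed first dropped low

    def reset(self):
        """Call on episode reset so a previous stall does not carry over."""
        self._slow_since = None

    def update(self, speed):
        """Return True when speed has been below threshold for long enough."""
        now = self._clock()
        if speed >= self.low_speed_threshold:
            self._slow_since = None  # car is moving again, clear the timer
            return False
        if self._slow_since is None:
            self._slow_since = now   # first slow reading, start the timer
        return (now - self._slow_since) >= self.max_low_speed_seconds
```

In a step loop this would be called as `timer.update(info['speed'])` after each `env.step()`, with a `True` result treated as episode termination. Passing a fake `clock` makes the timing testable without sleeping.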
`agent/experiments/exp23_generated_road_clean.py`
- Single track: `generated_road` on port 9091
- No warm-start (fresh PPO weights)
- `MAX_EPISODE_SECONDS=120` (generous safety net, not a training constraint)
- LR=0.0003, 200k total steps, checkpoints every 10k
### Exp24: discrete steering + road regeneration
`tests/test_reward_wrapper.py` — 17 tests, all pass
`agent/experiments/exp24_generated_road_discrete.py`:
- Action space: `Discrete(7)` via `DiscretizedActionWrapper(n_steer=7, n_throttle=1)`
- Steering bins: -1, -0.67, -0.33, 0, 0.33, 0.67, 1 (throttle fixed at 0.2)
- **Road regeneration**: after each 10k-step segment, `env.close()` + reconnect
- Reconnecting reloads the scene → sdsandbox generates new random road
- Each eval runs on a freshly generated road (proper generalization test)
- ~5s overhead per checkpoint = ~100s total for 200k run
- Speed-based stuck: `LOW_SPEED_THRESHOLD=0.5`, `MAX_LOW_SPEED_SECONDS=2.0`
- `MAX_EPISODE_SECONDS=30` (reduced from 120s — speed check handles barrier stuck)
- LR=0.0003, 200k total steps
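
The bin values and the regeneration overhead quoted above can be reproduced with a short sketch. The function names here are hypothetical and do not describe how `DiscretizedActionWrapper` is actually implemented; the sketch only assumes evenly spaced bins over [-1, 1] and a fixed throttle of 0.2, as listed in this section.

```python
def steering_bins(n_steer=7):
    """Evenly spaced steering values over [-1, 1], rounded to 2 decimals."""
    return [round(-1.0 + 2.0 * i / (n_steer - 1), 2) for i in range(n_steer)]


def decode_action(index, n_steer=7, throttle=0.2):
    """Map a Discrete(n_steer) action index to a (steering, throttle) pair."""
    bins = steering_bins(n_steer)
    if not 0 <= index < len(bins):
        raise ValueError(f"action index {index} outside 0..{len(bins) - 1}")
    return bins[index], throttle


def regeneration_overhead_seconds(total_steps=200_000, segment_steps=10_000,
                                  reconnect_seconds=5.0):
    """Total reconnect cost if every segment pays one reconnect."""
    return (total_steps // segment_steps) * reconnect_seconds


print(steering_bins())                  # [-1.0, -0.67, -0.33, 0.0, 0.33, 0.67, 1.0]
print(decode_action(3))                 # (0.0, 0.2)
print(regeneration_overhead_seconds())  # 100.0
```

With 200k steps in 10k-step segments there are 20 reconnects, so roughly 5 s each gives the ~100 s figure.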
### Road generation clarification
The road is NOT regenerated each episode reset. `generated_road` creates a fixed
random layout when the scene LOADS (i.e., when you `gym.make()`). Within a session,
all episodes use the same road. The exp23 eval variance (107-1951 steps) was due
to Unity physics non-determinism, NOT road variety.
### Reward wrapper
`agent/reward_wrapper.py` v7 — unchanged, still in effect:
- Reward: `speed_norm × CTE_quality` gated by efficiency
- No Python-side exploit bandaids (physics enforces containment)
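
As a rough illustration of the gated reward shape, the sketch below multiplies the two terms and zeroes the result when efficiency fails the gate. This is an assumption-heavy sketch: the function name, the `efficiency_gate` value of 0.5, and the idea that efficiency is a single scalar are all hypothetical, and how `agent/reward_wrapper.py` actually computes `CTE_quality` or efficiency is not shown in this handoff.

```python
def gated_reward(speed_norm, cte_quality, efficiency, efficiency_gate=0.5):
    """Return speed_norm * cte_quality, or 0.0 when efficiency fails the gate.

    efficiency_gate is a placeholder value chosen for illustration only.
    """
    if efficiency < efficiency_gate:
        return 0.0  # e.g. circling in place: moving fast but making no progress
    return speed_norm * cte_quality
```

The multiplicative form means a fast car far from the centerline and a centered car that is barely moving both earn little, which is the behavior the v7 wrapper is described as rewarding.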
## Current State
### Exp 23 is RUNNING
- PID: 647921
- Log: `agent/models/exp23-generated-road-clean/run_2026-05-05_160348_clean.log`
- Started: 2026-05-05 16:03
- Barriers visually confirmed by Paul before launch
### Exp 23 status
- Running (PID 649531), at ~140k/200k steps as of last check
- Will finish on its own — DO NOT kill it
- Log: `agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log`
### Build and sync status
- Unity build: completed successfully 2026-05-05 15:57
- Both runtime folders synced
- Sim on port 9091 running (generated_road)
- Port 9093 / second sim NOT needed for exp23
### Unity build status
- **Needs rebuild** — Car.cs raycast fix not yet compiled
- Car.cs was modified at:
`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/Car.cs`
## Key Parameters (exp23)
### To launch exp24
| Setting | Value | Why |
|---|---|---|
| Track | generated_road | Single track — diagnose before adding second |
| LR | 0.0003 | Standard PPO starting LR |
| Total steps | 200k | More room to learn with clean signal |
| max_episode_seconds | 120s | Safety net only — physics does the work |
| MAX_CTE_TERMINATE | none | Removed — barriers are physical now |
| Warm-start | none | Previous warm-starts trained on broken reward |
| showBarrierMeshes | ON | Verify visually before committing to long run |
## Success Criteria
- Car cannot drive past the barrier walls (verify visually)
- ep_len_mean should INCREASE over checkpoints (not frozen at 118)
- eval steps should improve at 20k, 30k, 40k checkpoints
- No evidence of outside-road circling in the reward curve
1. Wait for exp23 to finish (or confirm it has)
2. Rebuild Unity (Car.cs raycast fix)
3. Stop sim on port 9091
4. Rsync build to runtime folders (both, or just the one on 9091)
5. Restart sim on port 9091
6. Launch exp24:
```bash
cd /home/paulh/projects/donkeycar-rl-autoresearch/agent
nohup python3 experiments/exp24_generated_road_discrete.py > /tmp/exp24.out 2>&1 &
tail -f /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp24-discrete/run_*_discrete.log
```
## Useful Commands
@ -124,12 +133,17 @@ tail -20 /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log
grep "Exiting batchmode\|Build failed\|error\|Error" /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log | tail -5
```
### Monitor exp23
### Monitor exp23 (while still running)
```bash
tail -f agent/models/exp23-generated-road-clean/run_*_clean.log
tail -f agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log
```
### Verify ports
### Monitor exp24
```bash
tail -f agent/models/exp24-discrete/run_*_discrete.log
```
### Verify port 9091
```bash
python3 - <<'PY'
import socket
@ -141,9 +155,27 @@ for p in (9091,):
PY
```
### Check exp23 progress
```bash
grep "Eval\|BEST" agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log | tail -20
```
## Success Criteria (exp24)
- Steering is visually smoother (7 discrete bins vs continuous Gaussian)
- Car stuck against barrier terminates within 2-3 seconds (speed check)
- Eval scores are more meaningful — each checkpoint tests a DIFFERENT road layout
- ep_len_mean should continue increasing from the baseline exp23 established
## Notes for Next Session
- If the user says `continue`, do not ask broad questions. Check build log → sync → launch → verify barriers → start exp23.
- **Barrier visual confirmation is required before starting exp23.** Paul must see the translucent 3D boxes on both sides of the road with no gaps before committing to a 200k training run.
- The second sim (port 9093) is not needed for exp23 — only launch one sim.
- Do not add generated_track back until generated_road training is verified working.
- Unity rebuild is required before exp24 — Car.cs raycast fix won't be in effect
until the build is done and synced.
- The second sim (port 9093) is not needed — only port 9091.
- Do NOT kill exp23 — let it run to completion.
- Exp24's road regeneration adds ~5s per checkpoint = ~100s extra total. This is
by design. The "Reconnecting for fresh road" log lines confirm it's working.
- `info['speed']` from telemetry = `rb.velocity.magnitude / 8.0`. The
`LOW_SPEED_THRESHOLD=0.5` corresponds to 4 Unity m/s, which is slow but not zero.
A truly stuck car reads ~0.0. Tight corners might temporarily read 0.1-0.3.
The 2-second timer provides enough grace for normal slow driving.
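
The unit conversion in the last note can be checked with a few lines. The helper name and the constant name are hypothetical; the divisor of 8.0 is the one stated above for `info['speed']`.

```python
SPEED_DIVISOR = 8.0  # info['speed'] = rb.velocity.magnitude / 8.0 (from the note above)


def telemetry_to_unity_mps(info_speed):
    """Convert the normalized telemetry speed back to Unity metres per second."""
    return info_speed * SPEED_DIVISOR


print(telemetry_to_unity_mps(0.5))  # 4.0, the LOW_SPEED_THRESHOLD in m/s
```

A tight-corner reading of 0.1 to 0.3 is about 0.8 to 2.4 m/s, which is below the 4 m/s threshold. Such corners therefore rely on the 2-second grace period rather than on the threshold itself to avoid early termination.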