docs(handoff): update SESSION_HANDOFF for exp24 readiness
Document WheelCollider root cause, Car.cs raycast fix requirement, road generation behavior, exp24 launch steps, and updated success criteria.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 0d1acf8cdc
commit 78d81827b7
|
|
@ -1,6 +1,6 @@
|
||||||
# RL Donkeycar Session Handoff
|
# RL Donkeycar Session Handoff
|
||||||
|
|
||||||
Last updated: 2026-05-05 America/Toronto
|
Last updated: 2026-05-05 America/Toronto (updated during exp23)
|
||||||
|
|
||||||
## Autonomy Instruction
|
## Autonomy Instruction
|
||||||
|
|
||||||
|
|
@ -12,12 +12,11 @@ If the user says only `continue`, interpret it using the instruction above.
|
||||||
|
|
||||||
## Current Goal
|
## Current Goal
|
||||||
|
|
||||||
Run a clean, trustworthy exp23 on `generated_road` with:
|
Run exp24 on `generated_road` with:
|
||||||
- Solid BoxCollider barriers (car physically cannot escape)
|
- Discrete steering (7 bins) — smoother, less oscillation than continuous
|
||||||
- Clean reward: speed × CTE_quality + efficiency gate
|
- Speed-based stuck detection — catches car pressed nose-first against barrier
|
||||||
- No artificial episode caps or Python-side exploit patches
|
- Road regeneration — sim reconnects between segments so each eval is a fresh road
|
||||||
|
- Unity raycast fix — Car.cs detects nose-first barrier contact via forward raycast
|
||||||
Get RL training producing genuine improvement again.
|
|
||||||
|
|
||||||
## Important Paths
|
## Important Paths
|
||||||
|
|
||||||
|
|
@ -39,82 +38,92 @@ Unity build log:
|
||||||
|
|
||||||
## What Was Fixed This Session
|
## What Was Fixed This Session
|
||||||
|
|
||||||
### Root cause identified and fixed
|
### Barrier physics (previous session, still in effect)
|
||||||
|
|
||||||
**The car was escaping the track because:**
|
- `RoadBuilder.cs`: BoxCollider per segment with overlap to close corner gaps
|
||||||
1. Barriers were zero-thickness `MeshCollider` planes — no physical volume
|
- `Car.cs`: CCD mode prevents tunneling through thin geometry
|
||||||
2. Car Rigidbody had no CCD — default `Discrete` mode allows tunneling
|
- `showBarrierMeshes=false` in scene YAML files (barriers invisible to RL camera)
|
||||||
|
|
||||||
Both problems created a simulator where the car could literally teleport through
|
### Nose-first stuck detection root cause identified and fixed
|
||||||
barrier walls between physics frames. Every Python-side "fix" (CTE termination,
|
|
||||||
time caps, hit detection) was attempting in Python what the physics engine was
|
|
||||||
failing to enforce.
|
|
||||||
|
|
||||||
### Unity changes (source updated, build in progress)
|
**Why collision detection never fires when the car is perpendicular to the barrier:**
|
||||||
|
|
||||||
`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/RoadBuilder.cs`
|
Unity `WheelCollider`s (used for suspension/steering) don't fire `OnCollisionEnter`
|
||||||
- Rewrote `CreateBarrier()`: now creates one `BoxCollider` per segment with real
|
or `OnCollisionStay` on the car's `MonoBehaviour`. When the car is nose-first into
|
||||||
3D volume (`barrierThickness` wide — default 1.0m)
|
a barrier, only the front WHEELS make physical contact. The car BODY collider (which
|
||||||
- Segment boxes overlap by `barrierThickness * 0.5` to close corner gaps
|
`Car.cs`'s callbacks are attached to) is slightly behind the wheels and never touches.
|
||||||
- Added `CreateEndCap()`: seals the two open ends of non-looping tracks
|
At an angle, the car body eventually makes contact — which is why collision detection
|
||||||
(`generated_road` is `closeLoop=0` — without end caps the car can drive off
|
works for side-contact but NOT for perpendicular contact.
|
||||||
the ends of the track)
|
|
||||||
- Added `public float barrierThickness = 1.0f` field (inspector-editable)
|
### Car.cs: forward raycast fix
|
||||||
- `showBarrierMeshes=true` now shows proper translucent 3D boxes, not flat planes
|
|
||||||
|
|
||||||
`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/Car.cs`
|
`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/Car.cs`
|
||||||
- Added `rb.collisionDetectionMode = CollisionDetectionMode.Continuous;` in
|
- Added short forward raycast in `FixedUpdate()`:
|
||||||
`Awake()` — prevents tunneling even against any remaining thin geometry
|
- When `requestTorque > 0.05f` (throttle applied)
|
||||||
|
- Casts from car center + 0.3m up, 0.8m forward
|
||||||
|
- Calls `RegisterCollision()` if anything hit
|
||||||
|
- This fires whether wheels or body made contact
|
||||||
|
- **Requires a Unity rebuild + sync before exp24**
|
||||||
|
|
||||||
### Python changes (committed)
|
### StuckTerminationWrapper: speed-based check
|
||||||
|
|
||||||
`agent/reward_wrapper.py` → v7 (clean)
|
`agent/multitrack_runner.py` — new params `low_speed_threshold`, `max_low_speed_seconds`:
|
||||||
- REMOVED: CTE-patience termination, high-CTE negative reward, solid_hit
|
- If `info['speed'] < 0.5` for 2 wall-clock seconds → terminate
|
||||||
monitoring, low-speed/wedge detection, all exploit-closing bandaids
|
- Catches car pressed against barrier regardless of lateral sliding
|
||||||
- KEPT: efficiency gate (zero reward when circling), no-progress termination
|
- `info['speed']` = `rb.velocity.magnitude / 8.0` in Unity → stuck car = ~0
|
||||||
(active_node), lap exploit guard
|
- Smoke-tested and committed
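
A minimal sketch of the speed-based check described above, assuming nothing about the real `StuckTerminationWrapper` beyond its two parameters. The class name `LowSpeedTimer` and the injectable `clock` argument are hypothetical, introduced here only so the logic can be exercised without a simulator.

```python
import time
from typing import Callable, Optional


class LowSpeedTimer:
    """Tracks how long the normalized speed has stayed below a threshold.

    Hypothetical helper: the real wrapper may be structured differently.
    """

    def __init__(
        self,
        low_speed_threshold: float = 0.5,
        max_low_speed_seconds: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.low_speed_threshold = low_speed_threshold
        self.max_low_speed_seconds = max_low_speed_seconds
        self._clock = clock
        self._slow_since: Optional[float] = None

    def reset(self) -> None:
        """Forget any low-speed streak (call on episode reset)."""
        self._slow_since = None

    def update(self, speed: float) -> bool:
        """Return True once speed has been below the threshold long enough."""
        now = self._clock()
        if speed >= self.low_speed_threshold:
            # Any reading at or above the threshold ends the streak.
            self._slow_since = None
            return False
        if self._slow_since is None:
            # First slow reading: start timing from here.
            self._slow_since = now
        return (now - self._slow_since) >= self.max_low_speed_seconds
```

Inside a wrapper's `step()`, something like `terminated = terminated or timer.update(info['speed'])` would apply it, with `timer.reset()` called on episode reset.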
|
||||||
- Reward: `speed_norm × CTE_quality` when efficiency passes gate
|
|
||||||
|
|
||||||
`agent/experiments/exp23_generated_road_clean.py`
|
### Exp24: discrete steering + road regeneration
|
||||||
- Single track: `generated_road` on port 9091
|
|
||||||
- No warm-start (fresh PPO weights)
|
|
||||||
- `MAX_EPISODE_SECONDS=120` (generous safety net, not a training constraint)
|
|
||||||
- LR=0.0003, 200k total steps, checkpoints every 10k
|
|
||||||
|
|
||||||
`tests/test_reward_wrapper.py` — 17 tests, all pass
|
`agent/experiments/exp24_generated_road_discrete.py`:
|
||||||
|
- Action space: `Discrete(7)` via `DiscretizedActionWrapper(n_steer=7, n_throttle=1)`
|
||||||
|
- Steering bins: -1, -0.67, -0.33, 0, 0.33, 0.67, 1 (throttle fixed at 0.2)
|
||||||
|
- **Road regeneration**: after each 10k-step segment, `env.close()` + reconnect
|
||||||
|
- Reconnecting reloads the scene → sdsandbox generates new random road
|
||||||
|
- Each eval runs on a freshly generated road (proper generalization test)
|
||||||
|
- ~5s overhead per checkpoint = ~100s total for 200k run
|
||||||
|
- Speed-based stuck: `LOW_SPEED_THRESHOLD=0.5`, `MAX_LOW_SPEED_SECONDS=2.0`
|
||||||
|
- `MAX_EPISODE_SECONDS=30` (reduced from 120s — speed check handles barrier stuck)
|
||||||
|
- LR=0.0003, 200k total steps
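
The bin values listed above are consistent with seven evenly spaced steering values on [-1, 1]. The sketch below shows one way such a mapping could work; `steering_bins` and `decode_action` are hypothetical helpers, not the actual `DiscretizedActionWrapper` implementation.

```python
from typing import List, Tuple


def steering_bins(n_steer: int = 7) -> List[float]:
    """Evenly spaced steering values from -1 to 1 inclusive."""
    if n_steer < 2:
        raise ValueError("n_steer must be at least 2")
    step = 2.0 / (n_steer - 1)
    return [-1.0 + i * step for i in range(n_steer)]


def decode_action(
    index: int, n_steer: int = 7, throttle: float = 0.2
) -> Tuple[float, float]:
    """Map a discrete action index to (steering, throttle).

    Throttle is held constant, matching the fixed 0.2 noted above.
    """
    bins = steering_bins(n_steer)
    if not 0 <= index < n_steer:
        raise IndexError("action index out of range")
    return bins[index], throttle


if __name__ == "__main__":
    print([round(b, 2) for b in steering_bins()])
```

With a single throttle bin, the action index and the steering bin index coincide, which is why the action space has exactly 7 entries.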
|
||||||
|
|
||||||
|
### Road generation clarification
|
||||||
|
|
||||||
|
The road is NOT regenerated each episode reset. `generated_road` creates a fixed
|
||||||
|
random layout when the scene LOADS (i.e., when you `gym.make()`). Within a session,
|
||||||
|
all episodes use the same road. The exp23 eval variance (107–1951 steps) was due
|
||||||
|
to Unity physics non-determinism, NOT road variety.
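
Because the layout is fixed at scene load, getting a new road means making a new connection. A rough sketch of that segment loop follows, with `make_env` and `train_segment` as hypothetical callables standing in for the real environment constructor and the PPO training call; the exp24 script itself may order these steps differently.

```python
from typing import Any, Callable


def run_segments(
    make_env: Callable[[], Any],
    train_segment: Callable[[Any, int], None],
    total_steps: int = 200_000,
    segment_steps: int = 10_000,
) -> int:
    """Train in fixed-size segments, reconnecting between them.

    Each call to make_env() is assumed to reload the scene, which is
    what produces a new random road. Returns the number of segments run.
    """
    if segment_steps <= 0:
        raise ValueError("segment_steps must be positive")
    n_segments = total_steps // segment_steps
    for _ in range(n_segments):
        env = make_env()  # new connection, so a new road layout
        try:
            train_segment(env, segment_steps)
        finally:
            env.close()  # always release the connection before reconnecting
    return n_segments
```

With the defaults this runs 20 segments, which matches the ~5s × 20 = ~100s reconnect overhead quoted for the 200k run.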
|
||||||
|
|
||||||
|
### Reward wrapper
|
||||||
|
|
||||||
|
`agent/reward_wrapper.py` v7 — unchanged, still in effect:
|
||||||
|
- Reward: `speed_norm × CTE_quality` gated by efficiency
|
||||||
|
- No Python-side exploit bandaids (physics enforces containment)
|
||||||
|
|
||||||
## Current State
|
## Current State
|
||||||
|
|
||||||
### Exp 23 is RUNNING
|
### Exp 23 status
|
||||||
- PID: 647921
|
- Running (PID 649531), at ~140k/200k steps as of last check
|
||||||
- Log: `agent/models/exp23-generated-road-clean/run_2026-05-05_160348_clean.log`
|
- Will finish on its own — DO NOT kill it
|
||||||
- Started: 2026-05-05 16:03
|
- Log: `agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log`
|
||||||
- Barriers visually confirmed by Paul before launch
|
|
||||||
|
|
||||||
### Build and sync status
|
### Unity build status
|
||||||
- Unity build: completed successfully 2026-05-05 15:57
|
- **Needs rebuild** — Car.cs raycast fix not yet compiled
|
||||||
- Both runtime folders synced
|
- Car.cs was modified at:
|
||||||
- Sim on port 9091 running (generated_road)
|
`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/Car.cs`
|
||||||
- Port 9093 / second sim NOT needed for exp23
|
|
||||||
|
|
||||||
## Key Parameters (exp23)
|
### To launch exp24
|
||||||
|
|
||||||
| Setting | Value | Why |
|
1. Wait for exp23 to finish (or confirm it has)
|
||||||
|---|---|---|
|
2. Rebuild Unity (Car.cs raycast fix)
|
||||||
| Track | generated_road | Single track — diagnose before adding second |
|
3. Stop sim on port 9091
|
||||||
| LR | 0.0003 | Standard PPO starting LR |
|
4. Rsync build to runtime folders (both, or just the one on 9091)
|
||||||
| Total steps | 200k | More room to learn with clean signal |
|
5. Restart sim on port 9091
|
||||||
| max_episode_seconds | 120s | Safety net only — physics does the work |
|
6. Launch exp24:
|
||||||
| MAX_CTE_TERMINATE | none | Removed — barriers are physical now |
|
```bash
|
||||||
| Warm-start | none | Previous warm-starts trained on broken reward |
|
cd /home/paulh/projects/donkeycar-rl-autoresearch/agent
|
||||||
| showBarrierMeshes | ON | Verify visually before committing to long run |
|
nohup python3 experiments/exp24_generated_road_discrete.py > /tmp/exp24.out 2>&1 &
|
||||||
|
tail -f /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp24-discrete/run_*_discrete.log
|
||||||
## Success Criteria
|
```
|
||||||
|
|
||||||
- Car cannot drive past the barrier walls (verify visually)
|
|
||||||
- ep_len_mean should INCREASE over checkpoints (not frozen at 118)
|
|
||||||
- eval steps should improve at 20k, 30k, 40k checkpoints
|
|
||||||
- No evidence of outside-road circling in the reward curve
|
|
||||||
|
|
||||||
## Useful Commands
|
## Useful Commands
|
||||||
|
|
||||||
|
|
@ -124,12 +133,17 @@ tail -20 /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log
|
||||||
grep "Exiting batchmode\|Build failed\|error\|Error" /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log | tail -5
|
grep "Exiting batchmode\|Build failed\|error\|Error" /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log | tail -5
|
||||||
```
|
```
|
||||||
|
|
||||||
### Monitor exp23
|
### Monitor exp23 (while still running)
|
||||||
```bash
|
```bash
|
||||||
tail -f agent/models/exp23-generated-road-clean/run_*_clean.log
|
tail -f agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log
|
||||||
```
|
```
|
||||||
|
|
||||||
### Verify ports
|
### Monitor exp24
|
||||||
|
```bash
|
||||||
|
tail -f agent/models/exp24-discrete/run_*_discrete.log
|
||||||
|
```
|
||||||
|
|
||||||
|
### Verify port 9091
|
||||||
```bash
|
```bash
|
||||||
python3 - <<'PY'
|
python3 - <<'PY'
|
||||||
import socket
|
import socket
|
||||||
|
|
@ -141,9 +155,27 @@ for p in (9091,):
|
||||||
PY
|
PY
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Check exp23 progress
|
||||||
|
```bash
|
||||||
|
grep "Eval\|BEST" agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log | tail -20
|
||||||
|
```
|
||||||
|
|
||||||
|
## Success Criteria (exp24)
|
||||||
|
|
||||||
|
- Steering is visually smoother (7 discrete bins vs continuous Gaussian)
|
||||||
|
- Car stuck against barrier terminates within 2-3 seconds (speed check)
|
||||||
|
- Eval scores are more meaningful — each checkpoint tests a DIFFERENT road layout
|
||||||
|
- ep_len_mean should continue increasing from the baseline exp23 established
|
||||||
|
|
||||||
## Notes for Next Session
|
## Notes for Next Session
|
||||||
|
|
||||||
- If the user says `continue`, do not ask broad questions. Check build log → sync → launch → verify barriers → start exp23.
|
- Unity rebuild is required before exp24 — Car.cs raycast fix won't be in effect
|
||||||
- **Barrier visual confirmation is required before starting exp23.** Paul must see the translucent 3D boxes on both sides of the road with no gaps before committing to a 200k training run.
|
until the build is done and synced.
|
||||||
- The second sim (port 9093) is not needed for exp23 — only launch one sim.
|
- The second sim (port 9093) is not needed — only port 9091.
|
||||||
- Do not add generated_track back until generated_road training is verified working.
|
- Do NOT kill exp23 — let it run to completion.
|
||||||
|
- Exp24's road regeneration adds ~5s per checkpoint = ~100s extra total. This is
|
||||||
|
by design. The "Reconnecting for fresh road" log lines confirm it's working.
|
||||||
|
- `info['speed']` from telemetry = `rb.velocity.magnitude / 8.0`. The
|
||||||
|
`LOW_SPEED_THRESHOLD=0.5` corresponds to 4 Unity m/s, which is slow but not zero.
|
||||||
|
A truly stuck car reads ~0.0. Tight corners might temporarily read 0.1–0.3, which is also below the threshold.
|
||||||
|
The 2-second timer provides enough grace for normal slow driving.
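
The conversion in the note above is a single multiplication. A small sketch, assuming the divisor of 8.0 stated there; `SPEED_DIVISOR` and `normalized_to_mps` are illustrative names, not identifiers from the codebase.

```python
SPEED_DIVISOR = 8.0  # assumed from the note: info['speed'] = velocity magnitude / 8.0


def normalized_to_mps(speed_norm: float) -> float:
    """Convert a normalized telemetry speed back to Unity metres per second."""
    return speed_norm * SPEED_DIVISOR


if __name__ == "__main__":
    # Threshold, then the tight-corner range mentioned in the note.
    for reading in (0.5, 0.1, 0.3):
        print(reading, "->", normalized_to_mps(reading), "m/s")
```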
|
||||||
|
|
|
||||||