diff --git a/agent/SESSION_HANDOFF.md b/agent/SESSION_HANDOFF.md
index 2d21585..48024c5 100644
--- a/agent/SESSION_HANDOFF.md
+++ b/agent/SESSION_HANDOFF.md
@@ -1,6 +1,6 @@
 # RL Donkeycar Session Handoff
 
-Last updated: 2026-05-05 America/Toronto
+Last updated: 2026-05-05 America/Toronto (updated during exp23)
 
 ## Autonomy Instruction
 
@@ -12,12 +12,11 @@ If the user says only `continue`, interpret it using the instruction above.
 
 ## Current Goal
 
-Run a clean, trustworthy exp23 on `generated_road` with:
-- Solid BoxCollider barriers (car physically cannot escape)
-- Clean reward: speed × CTE_quality + efficiency gate
-- No artificial episode caps or Python-side exploit patches
-
-Get RL training producing genuine improvement again.
+Run exp24 on `generated_road` with:
+- Discrete steering (7 bins) — smoother, less oscillation than continuous
+- Speed-based stuck detection — catches car pressed nose-first against barrier
+- Road regeneration — sim reconnects between segments so each eval is a fresh road
+- Unity raycast fix — Car.cs detects nose-first barrier contact via forward raycast
 
 ## Important Paths
 
@@ -39,82 +38,92 @@ Unity build log:
 
 ## What Was Fixed This Session
 
-### Root cause identified and fixed
+### Barrier physics (previous session, still in effect)
 
-**The car was escaping the track because:**
-1. Barriers were zero-thickness `MeshCollider` planes — no physical volume
-2. Car Rigidbody had no CCD — default `Discrete` mode allows tunneling
+- `RoadBuilder.cs`: BoxCollider per segment with overlap to close corner gaps
+- `Car.cs`: CCD mode prevents tunneling through thin geometry
+- `showBarrierMeshes=false` in scene YAML files (barriers invisible to RL camera)
 
-Both problems created a simulator where the car could literally teleport through
-barrier walls between physics frames. Every Python-side "fix" (CTE termination,
-time caps, hit detection) was attempting in Python what the physics engine was
-failing to enforce.
+### Nose-first stuck detection root cause identified and fixed
 
-### Unity changes (source updated, build in progress)
+**Why collision detection never fires when perpendicular to barrier:**
 
-`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/RoadBuilder.cs`
-- Rewrote `CreateBarrier()`: now creates one `BoxCollider` per segment with real
-  3D volume (`barrierThickness` wide — default 1.0m)
-- Segment boxes overlap by `barrierThickness * 0.5` to close corner gaps
-- Added `CreateEndCap()`: seals the two open ends of non-looping tracks
-  (`generated_road` is `closeLoop=0` — without end caps the car can drive off
-  the ends of the track)
-- Added `public float barrierThickness = 1.0f` field (inspector-editable)
-- `showBarrierMeshes=true` now shows proper translucent 3D boxes, not flat planes
+Unity `WheelColliders` (used for suspension/steering) don't fire `OnCollisionEnter`
+or `OnCollisionStay` on the car's `MonoBehaviour`. When the car is nose-first into
+a barrier, only the front WHEELS make physical contact. The car BODY collider (which
+`Car.cs`'s callbacks are attached to) is slightly behind the wheels and never touches.
+At an angle, the car body eventually makes contact — which is why collision detection
+works for side-contact but NOT for perpendicular contact.
+
+### Car.cs: forward raycast fix
 
 `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/Car.cs`
-- Added `rb.collisionDetectionMode = CollisionDetectionMode.Continuous;` in
-  `Awake()` — prevents tunneling even against any remaining thin geometry
+- Added short forward raycast in `FixedUpdate()`:
+  - When `requestTorque > 0.05f` (throttle applied)
+  - Casts from car center + 0.3m up, 0.8m forward
+  - Calls `RegisterCollision()` if anything hit
+  - This fires whether wheels or body made contact
+- **Requires a Unity rebuild + sync before exp24**
 
-### Python changes (committed)
+### StuckTerminationWrapper: speed-based check
 
-`agent/reward_wrapper.py` → v7 (clean)
-- REMOVED: CTE-patience termination, high-CTE negative reward, solid_hit
-  monitoring, low-speed/wedge detection, all exploit-closing bandaids
-- KEPT: efficiency gate (zero reward when circling), no-progress termination
-  (active_node), lap exploit guard
-- Reward: `speed_norm × CTE_quality` when efficiency passes gate
+`agent/multitrack_runner.py` — new params `low_speed_threshold`, `max_low_speed_seconds`:
+- If `info['speed'] < 0.5` for 2 wall-clock seconds → terminate
+- Catches car pressed against barrier regardless of lateral sliding
+- `info['speed']` = `rb.velocity.magnitude / 8.0` in Unity → stuck car = ~0
+- Smoke-tested and committed
 
-`agent/experiments/exp23_generated_road_clean.py`
-- Single track: `generated_road` on port 9091
-- No warm-start (fresh PPO weights)
-- `MAX_EPISODE_SECONDS=120` (generous safety net, not a training constraint)
-- LR=0.0003, 200k total steps, checkpoints every 10k
+### Exp24: discrete steering + road regeneration
 
-`tests/test_reward_wrapper.py` — 17 tests, all pass
+`agent/experiments/exp24_generated_road_discrete.py`:
+- Action space: `Discrete(7)` via `DiscretizedActionWrapper(n_steer=7, n_throttle=1)`
+- Steering bins: -1, -0.67, -0.33, 0, 0.33, 0.67, 1 (throttle fixed at 0.2)
+- **Road regeneration**: after each 10k-step segment, `env.close()` + reconnect
+  - Reconnecting reloads the scene → sdsandbox generates new random road
+  - Each eval runs on a freshly generated road (proper generalization test)
+  - ~5s overhead per checkpoint = ~100s total for 200k run
+- Speed-based stuck: `LOW_SPEED_THRESHOLD=0.5`, `MAX_LOW_SPEED_SECONDS=2.0`
+- `MAX_EPISODE_SECONDS=30` (reduced from 120s — speed check handles barrier stuck)
+- LR=0.0003, 200k total steps
+
+### Road generation clarification
+
+The road is NOT regenerated each episode reset. `generated_road` creates a fixed
+random layout when the scene LOADS (i.e., when you `gym.make()`). Within a session,
+all episodes use the same road. The exp23 eval variance (107–1951 steps) was due
+to Unity physics non-determinism, NOT road variety.
+
+### Reward wrapper
+
+`agent/reward_wrapper.py` v7 — unchanged, still in effect:
+- Reward: `speed_norm × CTE_quality` gated by efficiency
+- No Python-side exploit bandaids (physics enforces containment)
 
 ## Current State
 
-### Exp 23 is RUNNING
-- PID: 647921
-- Log: `agent/models/exp23-generated-road-clean/run_2026-05-05_160348_clean.log`
-- Started: 2026-05-05 16:03
-- Barriers visually confirmed by Paul before launch
+### Exp 23 status
+- Running (PID 649531), at ~140k/200k steps as of last check
+- Will finish on its own — DO NOT kill it
+- Log: `agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log`
 
-### Build and sync status
-- Unity build: completed successfully 2026-05-05 15:57
-- Both runtime folders synced
-- Sim on port 9091 running (generated_road)
-- Port 9093 / second sim NOT needed for exp23
+### Unity build status
+- **Needs rebuild** — Car.cs raycast fix not yet compiled
+- Car.cs was modified at:
+  `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/Car.cs`
 
-## Key Parameters (exp23)
+### To launch exp24
 
-| Setting | Value | Why |
-|---|---|---|
-| Track | generated_road | Single track — diagnose before adding second |
-| LR | 0.0003 | Standard PPO starting LR |
-| Total steps | 200k | More room to learn with clean signal |
-| max_episode_seconds | 120s | Safety net only — physics does the work |
-| MAX_CTE_TERMINATE | none | Removed — barriers are physical now |
-| Warm-start | none | Previous warm-starts trained on broken reward |
-| showBarrierMeshes | ON | Verify visually before committing to long run |
-
-## Success Criteria
-
-- Car cannot drive past the barrier walls (verify visually)
-- ep_len_mean should INCREASE over checkpoints (not frozen at 118)
-- eval steps should improve at 20k, 30k, 40k checkpoints
-- No evidence of outside-road circling in the reward curve
+1. Wait for exp23 to finish (or confirm it has)
+2. Rebuild Unity (Car.cs raycast fix)
+3. Stop sim on port 9091
+4. Rsync build to runtime folders (both, or just the one on 9091)
+5. Restart sim on port 9091
+6. Launch exp24:
+   ```bash
+   cd /home/paulh/projects/donkeycar-rl-autoresearch/agent
+   nohup python3 experiments/exp24_generated_road_discrete.py > /tmp/exp24.out 2>&1 &
+   tail -f /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp24-discrete/run_*_discrete.log
+   ```
 
 ## Useful Commands
 
@@ -124,12 +133,17 @@ tail -20 /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log
 grep "Exiting batchmode\|Build failed\|error\|Error" /mnt/c/Users/Paul/AppData/Local/Temp/unity_rebuild.log | tail -5
 ```
 
-### Monitor exp23
+### Monitor exp23 (while still running)
 ```bash
-tail -f agent/models/exp23-generated-road-clean/run_*_clean.log
+tail -f agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log
 ```
 
-### Verify ports
+### Monitor exp24
+```bash
+tail -f agent/models/exp24-discrete/run_*_discrete.log
+```
+
+### Verify port 9091
 ```bash
 python3 - <<'PY'
 import socket
@@ -141,9 +155,27 @@ for p in (9091,):
 PY
 ```
 
+### Check exp23 progress
+```bash
+grep "Eval\|BEST" agent/models/exp23-generated-road-clean/run_2026-05-05_160718_clean.log | tail -20
+```
+
+## Success Criteria (exp24)
+
+- Steering is visually smoother (7 discrete bins vs continuous Gaussian)
+- Car stuck against barrier terminates within 2-3 seconds (speed check)
+- Eval scores are more meaningful — each checkpoint tests a DIFFERENT road layout
+- ep_len_mean should continue increasing from the baseline exp23 established
+
 ## Notes for Next Session
 
-- If the user says `continue`, do not ask broad questions. Check build log → sync → launch → verify barriers → start exp23.
-- **Barrier visual confirmation is required before starting exp23.** Paul must see the translucent 3D boxes on both sides of the road with no gaps before committing to a 200k training run.
-- The second sim (port 9093) is not needed for exp23 — only launch one sim.
-- Do not add generated_track back until generated_road training is verified working.
+- Unity rebuild is required before exp24 — Car.cs raycast fix won't be in effect
+  until the build is done and synced.
+- The second sim (port 9093) is not needed — only port 9091.
+- Do NOT kill exp23 — let it run to completion.
+- Exp24's road regeneration adds ~5s per checkpoint = ~100s extra total. This is
+  by design. The "Reconnecting for fresh road" log lines confirm it's working.
+- `info['speed']` from telemetry = `rb.velocity.magnitude / 8.0`. The
+  `LOW_SPEED_THRESHOLD=0.5` corresponds to 4 Unity m/s, which is slow but not zero.
+  A truly stuck car reads ~0.0. Tight corners might temporarily be 0.1–0.3.
+  The 2-second timer provides enough grace for normal slow driving.