Root cause: the barriers were zero-thickness MeshCollider planes and the car had
no continuous collision detection (CCD), so it tunnelled through them between
frames. Every Python patch was trying to catch in code what physics should
enforce.
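To illustrate the tunnelling described above, here is a minimal 1-D sketch. It
is a toy model, not Unity's physics engine: the function names, the 20 m/s
speed and the 50 Hz physics rate are assumptions chosen for illustration. It
shows why a discrete end-of-frame check misses a zero-thickness wall, while
either a swept check or a wall with real thickness catches the same frame.

```python
def discrete_hit(x0, v, dt, wall_x, thickness):
    """Discrete check: only the end-of-frame position is tested."""
    x1 = x0 + v * dt
    return wall_x <= x1 <= wall_x + thickness


def swept_hit(x0, v, dt, wall_x, thickness):
    """Swept check: test the whole segment travelled during the frame."""
    x1 = x0 + v * dt
    lo, hi = min(x0, x1), max(x0, x1)
    return lo <= wall_x + thickness and hi >= wall_x


# Hypothetical frame: car starts 0.1 m before the wall and moves 0.4 m.
frame = dict(x0=-0.1, v=20.0, dt=0.02)

thin_discrete = discrete_hit(wall_x=0.0, thickness=0.0, **frame)   # missed
thin_swept = swept_hit(wall_x=0.0, thickness=0.0, **frame)         # caught
thick_discrete = discrete_hit(wall_x=0.0, thickness=1.0, **frame)  # caught
```

In this toy model the thin wall is only caught by the swept check, which is
the role CCD plays, and the 1.0 m wall is caught even by the discrete check.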
Unity (source only — build in progress):
- RoadBuilder.cs: CreateBarrier() now makes BoxCollider-per-segment with real 3D
volume (barrierThickness=1.0m default) + half-thickness overlap at corners to
seal gaps. CreateEndCap() seals open ends of non-looping tracks (generated_road).
- Car.cs: rb.collisionDetectionMode = CollisionDetectionMode.Continuous in
Awake(), which prevents tunnelling.
Python:
- reward_wrapper.py v7: removed CTE-patience termination, high-CTE negative
reward, solid_hit monitoring, low-speed/wedge detection. Kept: efficiency gate,
no-progress (active_node) termination, lap exploit guard. Reward = speed×CTE_quality.
- exp23_generated_road_clean.py: single track, no warm-start, 200k steps, clean
reward, MAX_EPISODE_SECONDS=120 as safety net only.
- tests: 17 tests covering clean reward properties.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- reward_wrapper: detect barrier/wall/tree solid hits, terminate on head-on impact
or 4 sustained solid-hit frames; prevents car wedging against invisible barriers
- reward_wrapper: add low-speed/wedge termination — kills episode when car is pinned
motionless (below threshold, no displacement) after grace period
- reward_wrapper: high-CTE exploit fix — return -0.25 immediately when CTE >
max_cte_terminate (not after patience), so PPO cannot collect positive speed
rewards while driving the large outside-road circle
- tests: 23 passing unit tests covering all new termination paths
- exp20/21/22: add parallel DummyVecEnv experiments on generated_road+generated_track
with warm-start from champion model; exp22 is current active run
- SESSION_HANDOFF.md: live handoff doc for next session continuity
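The high-CTE exploit fix above can be sketched as follows. This is a minimal
illustration, not the actual reward_wrapper code: the function name, the
speed_reward argument and the 5.0 m threshold in the usage lines are
hypothetical. Only the -0.25 value and the "immediately, not after patience"
behaviour come from the description above.

```python
def shaped_reward(cte, speed_reward, max_cte_terminate, penalty=-0.25):
    """Return the penalty on the first step where |CTE| exceeds the limit.

    There is no patience counter here, so no positive speed reward can be
    collected while the car is outside the limit.
    """
    if abs(cte) > max_cte_terminate:
        return penalty
    return speed_reward


# Hypothetical threshold of 5.0 m, constant speed reward of 0.8.
rewards = [shaped_reward(cte, 0.8, max_cte_terminate=5.0)
           for cte in (1.0, 4.9, 6.0, 12.0)]
```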
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
StuckTerminationWrapper's wall-clock timer could be reset by barrier-sliding: a
car drifting 0.5m along a wall repeatedly resets the 12s timer. At low sim fps
(1-2fps when both cars are stuck), the 40-step check also takes minutes.
Fix: added max_episode_seconds=30 — hard wall-clock limit per episode,
independent of position or sim fps. No episode can run longer than 30s.
Also adds monitor_training.sh: independent shell process that checks every
5 minutes and appends status to /tmp/training_monitor.log — works without
Claude being active.
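The hard wall-clock limit described above could look roughly like this. It is
a sketch under assumptions: EpisodeTimer and its injectable clock argument are
hypothetical names, not the StuckTerminationWrapper implementation. The key
property is that nothing the car does (position, sim fps) resets the timer,
only an episode reset does.

```python
import time


class EpisodeTimer:
    """Wall-clock episode limit that ignores position and step count."""

    def __init__(self, max_episode_seconds=30.0, clock=time.monotonic):
        self.max_episode_seconds = max_episode_seconds
        self._clock = clock          # injectable so the timer can be tested
        self._start = clock()

    def reset(self):
        """Call once at the start of every episode."""
        self._start = self._clock()

    def expired(self):
        """True once the episode has run for max_episode_seconds or more."""
        return self._clock() - self._start >= self.max_episode_seconds


# Usage with a fake clock so the example does not have to sleep.
fake_now = [0.0]
timer = EpisodeTimer(30.0, clock=lambda: fake_now[0])
fake_now[0] = 29.9
before_limit = timer.expired()
fake_now[0] = 30.0
at_limit = timer.expired()
```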
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Exp 17 post-mortem: the efficiency gate's window of 30 steps covers only ~40% of
a 3.5s exploit circle at 22fps, giving a partial-arc efficiency of ~0.77, far
above the 0.15 threshold below which the gate fires. The car earned positive
reward while circling, which outweighed the -10 lap penalty. Performance peaked
at 80k steps, then collapsed.
Exp 18 fixes:
- window_size 30→200: covers 2+ full exploit circles, driving efficiency→0
- min_lap_time 5s→12s: genuine laps are 13-16s (gentrack) and 27-29s (mountain);
anything under 12s is an exploit and terminates immediately
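The window arithmetic above can be checked with a short calculation. It
assumes efficiency means net displacement divided by path length over the
window, and that the car circles at constant speed; both are assumptions for
this sketch, and arc_efficiency is a hypothetical helper. Under those
assumptions the efficiency of an arc of angle theta is the chord-to-arc ratio
|2 sin(theta/2)| / theta.

```python
import math


def arc_efficiency(window_steps, fps=22.0, circle_period_s=3.5):
    """Chord-to-arc ratio for the part of a circle driven in one window."""
    # Angle swept during the window, in radians (may exceed one full turn).
    theta = 2 * math.pi * (window_steps / fps) / circle_period_s
    chord = abs(2 * math.sin(theta / 2))   # net displacement, unit radius
    return chord / theta                   # path length is theta, unit radius


eff_30 = arc_efficiency(30)     # about 0.77: well above the 0.15 gate
eff_200 = arc_efficiency(200)   # about 0.12: below the 0.15 gate
```

With these numbers the 30-step window sees a fairly straight arc and the gate
never fires, while the 200-step window spans more than two circles and drops
below the threshold.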
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- exp17_parallel_450k.py: parallel two-track training (generated_track:9091,
mountain_track:9093), 450k steps, v6 reward, HOST=localhost
- DECISIONS.md: ADR-025 (parallel strategy) and ADR-026 (mountain friction fix)
- docs/STATE.md: updated to April 2026 state with current champions and strategy
- docs/TEST_HISTORY.md: mountain friction fix notes + Exp 17 full design
- outerloop-results: exp14 finetune logs and robust mountain eval results
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v5 is required for mountain hills (v4 gives zero gradient on hills, as
documented in Exp 1).
Same simple approach as Exp 13 which worked: single track, minimal wrappers,
lap-based stopping. ThrottleClamp + V5Reward only.
Return to Wave 4 setup that produced Trial 9 (2000/2000 on generated_track).
v4 reward: base x efficiency x speed. Circles give ~0 reward naturally.
No StuckTerminationWrapper, no CTE patience, no progress terminator.
Just ThrottleClamp + V4Reward. Lap-based stopping criterion.
User's insight: a circling car stays near the same track waypoints, so
active_node (sim's track progress indicator) never advances. Track the
maximum active_node reached this episode. If it hasn't increased in
progress_patience=60 steps (~3.3s), terminate.
This catches:
- Circular driving (active_node oscillates, max never increases)
- Stuck on cone/barrier (active_node frozen)
- NOT triggered by: legitimate cornering, slow forward progress, lap resets
On lap completion, active_node wraps to 0 — reset max_node_seen and counter.
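The no-progress check described above can be sketched as a small counter
class. This is an illustration under assumptions, not the wrapper itself:
ProgressTerminator is a hypothetical name, and lap completion is passed in as
a flag because the text does not say how the wrapper detects it.

```python
class ProgressTerminator:
    """Terminate when the max active_node has not advanced for N steps."""

    def __init__(self, progress_patience=60):
        self.progress_patience = progress_patience
        self.reset()

    def reset(self):
        self.max_node_seen = -1
        self.steps_without_progress = 0

    def step(self, active_node, lap_completed=False):
        """Return True if the episode should be terminated."""
        if lap_completed:
            # active_node wraps to 0 on a new lap, so start tracking afresh.
            self.reset()
        if active_node > self.max_node_seen:
            self.max_node_seen = active_node
            self.steps_without_progress = 0
        else:
            self.steps_without_progress += 1
        return self.steps_without_progress >= self.progress_patience


# Circling car: active_node oscillates, the maximum never increases.
term = ProgressTerminator(progress_patience=3)
circling = [term.step(n) for n in (5, 6, 5, 6, 5)]

# Lap wrap: node drops from 40 to 0 without triggering termination.
lap = ProgressTerminator(progress_patience=3)
lap.step(40)
wrapped = lap.step(0, lap_completed=True)
```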
Also: Exp 12 — single track mountain training with lap-based stopping criterion.
Train until 3 consecutive laps in eval, not fixed step count.
Critical facts documented permanently:
- throttle_min=0.5 bakes into action space (too fast for corners)
- throttle_min=0.2 + v5 reward CAN learn hill (proved Exp 9, mountain only 90k)
- Mountain failure in parallel is contamination from grass exploit, not throttle
- Grass exploit root cause: sim determine_episode_over() passes when CTE>16m
- DO NOT confuse mountain rollback with stuck issue
- DO NOT change throttle_min as first response to mountain failure
v5 dropped the efficiency term to get gradient signal on hills, but this
re-enabled circular driving (observed in Exp 11). v6 adds efficiency back
as a GATE (not multiplier): if efficiency < 0.15, reward = 0. Otherwise
reward = speed × CTE_quality (same as v5).
Gate vs multiplier: v4 used efficiency as a multiplier which killed gradient
on hills (all terms → 0 simultaneously). v6's gate passes when efficiency
is above threshold (car moving forward, even slowly on hill) and only
blocks when car is truly circling.
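The gate-versus-multiplier difference can be made concrete with a small
sketch. The formulas follow the descriptions in these messages (v4: base x
efficiency x speed; v6: zero below the 0.15 gate, otherwise speed x
CTE_quality). The function names and all input values are hypothetical, and
using the same number for v4's base and v6's CTE_quality is an assumption made
only for the comparison.

```python
def v4_reward(base, efficiency, speed):
    """v4: efficiency is a multiplier, so small terms compound towards 0."""
    return base * efficiency * speed


def v6_reward(speed, cte_quality, efficiency, gate=0.15):
    """v6: efficiency only gates the reward; it does not scale it."""
    if efficiency < gate:
        return 0.0
    return speed * cte_quality


# Slow but genuine hill climb: low speed, modest efficiency.
hill_v4 = v4_reward(base=0.9, efficiency=0.3, speed=0.2)
hill_v6 = v6_reward(speed=0.2, cte_quality=0.9, efficiency=0.3)

# Fast circling: high speed, efficiency below the gate.
circle_v6 = v6_reward(speed=1.0, cte_quality=0.9, efficiency=0.05)
```

On the hill case the gated reward is larger than the multiplied one because
the low efficiency no longer scales it down, while the circling case is still
blocked entirely.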
Also reduced stuck_steps from 80 to 40 (~2.5s vs ~5s): the user reported the car
stuck against barriers for ~10s, which is too long with DummyVecEnv.
Scripts in /tmp are lost on reboot and not reproducible.
All experiment scripts now committed to git with README.
Exp5 script was already gone (lost before this fix).
All others (Exp6-Exp10, overnight, wave5, etc.) now preserved.
Rule going forward: scripts saved to agent/experiments/ and committed
BEFORE running, not after.
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A