Commit Graph

2 Commits

Author SHA1 Message Date
Paul Huliganga 2d52bb4ffc fix(core): replace exploit bandaids with solid physics barriers + clean reward
Root cause: barriers were zero-thickness MeshCollider planes and the car had no
CCD (continuous collision detection), so it tunneled through them between
frames. Every Python patch was trying to catch in code what physics should
enforce.

Unity (source only — build in progress):
- RoadBuilder.cs: CreateBarrier() now makes BoxCollider-per-segment with real 3D
  volume (barrierThickness=1.0m default) + half-thickness overlap at corners to
  seal gaps. CreateEndCap() seals open ends of non-looping tracks (generated_road).
- Car.cs: rb.collisionDetectionMode = CollisionDetectionMode.Continuous in
  Awake() to prevent tunneling.
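
The tunneling failure described above can be illustrated with a minimal 1D
sketch in Python. This is not Unity's physics code: the function names, the
20 m/s speed, and the 0.02 s timestep are assumptions chosen for illustration.
It compares a per-frame position check against a swept check over the distance
travelled in each frame.

```python
def crosses_discrete(x0, v, dt, wall_min, wall_max, steps):
    """Sampled check: only tests the position reached at the end of each frame."""
    x = x0
    for _ in range(steps):
        x += v * dt
        if wall_min <= x <= wall_max:
            return True  # the sampled position landed inside the wall
    return False


def crosses_swept(x0, v, dt, wall_min, wall_max, steps):
    """Swept check: tests the whole segment travelled during each frame."""
    x = x0
    for _ in range(steps):
        nxt = x + v * dt
        lo, hi = min(x, nxt), max(x, nxt)
        if lo <= wall_max and hi >= wall_min:
            return True  # the path of travel overlaps the wall
        x = nxt
    return False


# Hypothetical numbers: 20 m/s at a 0.02 s timestep gives a 0.4 m step per frame.
thin = (1.1, 1.1)    # zero-thickness wall at x = 1.1 m
thick = (1.1, 2.1)   # 1.0 m thick wall, matching the barrierThickness default
print(crosses_discrete(0.0, 20.0, 0.02, *thin, steps=10))
print(crosses_swept(0.0, 20.0, 0.02, *thin, steps=10))
print(crosses_discrete(0.0, 20.0, 0.02, *thick, steps=10))
```

With these numbers the sampled check misses the zero-thickness wall, the swept
check catches it, and the sampled check also catches the 1.0 m wall because the
wall is thicker than one frame of travel. That is why the fix applies both
changes: volume on the barrier and continuous detection on the car.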

Python:
- reward_wrapper.py v7: removed CTE-patience termination, high-CTE negative
  reward, solid_hit monitoring, low-speed/wedge detection. Kept: efficiency gate,
  no-progress (active_node) termination, lap exploit guard. Reward = speed×CTE_quality.
- exp23_generated_road_clean.py: single track, no warm-start, 200k steps, clean
  reward, MAX_EPISODE_SECONDS=120 as safety net only.
- tests: 17 tests covering clean reward properties.
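
A minimal sketch of the "Reward = speed×CTE_quality" shape, under stated
assumptions. The commit does not define CTE_quality, so the linear falloff
below, the max_cte and max_speed defaults, and both function names are
hypothetical, not the contents of reward_wrapper.py.

```python
def cte_quality(cte, max_cte):
    """Hypothetical quality term: 1.0 on the centerline, 0.0 at or beyond max_cte."""
    return max(0.0, 1.0 - abs(cte) / max_cte)


def clean_reward(speed, cte, max_cte=2.0, max_speed=10.0):
    """Speed normalised to [0, 1], scaled by how close the car is to the centerline."""
    speed_term = min(max(speed, 0.0), max_speed) / max_speed
    return speed_term * cte_quality(cte, max_cte)


print(clean_reward(speed=10.0, cte=0.0))  # fast and centered
print(clean_reward(speed=5.0, cte=1.0))   # half speed, halfway to the edge
print(clean_reward(speed=10.0, cte=2.0))  # fast but at the CTE limit
```

Because the two terms are multiplied, speed earns nothing once the car reaches
the CTE limit, so no separate negative high-CTE reward is needed in this form.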

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 15:56:00 -04:00
Paul Huliganga 138c65270f feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments
- reward_wrapper: detect barrier/wall/tree solid hits, terminate on head-on impact
  or 4 sustained solid-hit frames; prevents car wedging against invisible barriers
- reward_wrapper: add low-speed/wedge termination — kills episode when car is pinned
  motionless (below threshold, no displacement) after grace period
- reward_wrapper: high-CTE exploit fix: return -0.25 immediately when CTE >
  max_cte_terminate (not after patience), so PPO cannot collect positive speed
  rewards while driving in a large circle outside the road
- tests: 23 passing unit tests covering all new termination paths
- exp20/21/22: add parallel DummyVecEnv experiments on generated_road+generated_track
  with warm-start from champion model; exp22 is current active run
- SESSION_HANDOFF.md: live handoff doc for next session continuity
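
The solid-hit and high-CTE rules listed above can be sketched as follows. The
4-frame threshold and the -0.25 reward come from this commit message. The class
name, the max_cte_terminate default, treating "sustained" as consecutive
frames, and passing the base reward through on termination are assumptions.
CTE-patience termination and the low-speed/wedge check are not modelled.

```python
class ExploitGuard:
    """Hypothetical per-step guard, not the actual reward_wrapper code."""

    def __init__(self, max_cte_terminate=4.0, sustained_frames=4,
                 high_cte_reward=-0.25):
        self.max_cte_terminate = max_cte_terminate  # assumed value
        self.sustained_frames = sustained_frames    # from the commit message
        self.high_cte_reward = high_cte_reward      # from the commit message
        self._hit_streak = 0

    def step(self, base_reward, cte, solid_hit=False, head_on=False):
        """Return (reward, terminated) for one environment step."""
        # Count consecutive solid-hit frames; any clean frame resets the streak.
        self._hit_streak = self._hit_streak + 1 if solid_hit else 0
        if head_on or self._hit_streak >= self.sustained_frames:
            return base_reward, True
        # High CTE: replace the reward at once so speed earns nothing off-road.
        if abs(cte) > self.max_cte_terminate:
            return self.high_cte_reward, False
        return base_reward, False


guard = ExploitGuard()
for frame in range(4):
    reward, terminated = guard.step(1.0, cte=0.0, solid_hit=True)
    print(frame, reward, terminated)
```

In this sketch the episode ends on the fourth consecutive solid-hit frame, or
on the first frame flagged as a head-on impact.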

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 14:46:13 -04:00