- reward_wrapper: detect barrier/wall/tree solid hits and terminate on a head-on impact or after 4 sustained solid-hit frames; prevents the car from wedging against invisible barriers
- reward_wrapper: add low-speed/wedge termination, which ends the episode when the car is pinned motionless (speed below threshold, no displacement) after a grace period
- reward_wrapper: fix the high-CTE exploit by returning -0.25 immediately when CTE > max_cte_terminate (rather than after a patience window), so PPO cannot collect positive speed rewards while driving the large circle outside the road
- tests: 23 passing unit tests covering all new termination paths
- exp20/21/22: add parallel DummyVecEnv experiments on generated_road + generated_track, warm-started from the champion model; exp22 is the current active run
- SESSION_HANDOFF.md: live handoff doc for continuity into the next session

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
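The three termination paths in the commit message (solid hits, wedge, high CTE) can be sketched as a single per-step check. This is a minimal, hypothetical sketch, not the project's reward_wrapper: the class name `TerminationChecker`, the `check()` signature, the `head_on` flag (how a head-on impact is detected is not stated), and all thresholds are assumptions. Only the `max_cte_terminate` name, the -0.25 penalty and the 4-frame solid-hit streak come from the message above.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class TerminationChecker:
    """Hypothetical per-step termination check; not the project's reward_wrapper."""

    max_cte_terminate: float = 5.0  # assumed value; the name comes from the commit message
    cte_penalty: float = -0.25      # penalty stated in the commit message
    solid_hit_frames: int = 4       # "4 sustained solid-hit frames"
    grace_steps: int = 50           # assumed grace period before wedge checks start
    min_speed: float = 0.1          # assumed "motionless" speed threshold
    min_displacement: float = 0.05  # assumed per-step displacement threshold
    wedge_patience: int = 20        # assumed number of stuck steps before termination
    step: int = field(default=0, init=False)
    solid_streak: int = field(default=0, init=False)
    stuck_streak: int = field(default=0, init=False)
    last_pos: Optional[Tuple[float, float]] = field(default=None, init=False)

    def reset(self) -> None:
        """Clear per-episode state; call this from the wrapper's reset()."""
        self.step = self.solid_streak = self.stuck_streak = 0
        self.last_pos = None

    def check(self, cte: float, speed: float, pos: Tuple[float, float],
              solid_hit: bool, head_on: bool) -> Tuple[Optional[float], Optional[str]]:
        """Return (reward_override, reason); reason is None while the episode continues."""
        self.step += 1
        # High CTE: end the episode at once with a fixed penalty (no patience window),
        # so speed rewards cannot accumulate while the car is far off the road.
        if abs(cte) > self.max_cte_terminate:
            return self.cte_penalty, "high_cte"
        # Solid hits: a head-on impact ends the episode immediately; any other
        # solid contact must persist for `solid_hit_frames` consecutive steps.
        self.solid_streak = self.solid_streak + 1 if solid_hit else 0
        if solid_hit and head_on:
            return None, "head_on"
        if self.solid_streak >= self.solid_hit_frames:
            return None, "sustained_solid_hit"
        # Wedge: after the grace period, count consecutive steps where the car is
        # both slow and has barely moved since the previous step.
        moved = math.inf if self.last_pos is None else math.dist(pos, self.last_pos)
        self.last_pos = pos
        slow_and_still = speed < self.min_speed and moved < self.min_displacement
        if self.step > self.grace_steps and slow_and_still:
            self.stuck_streak += 1
        else:
            self.stuck_streak = 0
        if self.stuck_streak >= self.wedge_patience:
            return None, "wedged"
        return None, None
```

In a Gymnasium-style wrapper (an assumption about the project's setup), a non-None `reason` would map to `terminated=True` and a non-None `reward_override` would replace the step reward. Checking CTE first mirrors the "immediately, not after patience" fix: no speed reward is paid on the step that crosses the limit.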

| Name |
|---|
| ARCHIVED_reward_hacking/champion_hacked |
| champion |
| exp14-mountain-v5-finetune |
| exp20-parallel-450k-v5 |
| exp20-parallel-450k-v5_pre-fix_2026-04-28_163923 |
| exp21-generated-pair-warm-v4 |
| exp22-generated-pair-warm-v6 |
| wave3-champion |
| wave4-champion |