donkeycar-rl-autoresearch

Paul Huliganga 5d1227833d fix: close short-lap circle exploit and cap segment eval episode length	2026-04-15 09:06:25 -04:00

Two reward hacking behaviours observed during Wave 4 training:

1. Short-lap circle exploit (reported by user, echoes Toni's guardrail hack): Model circles at start/finish line completing laps in 1-2 sim-seconds, accumulating lap_count indefinitely with no genuine track progress.
   Fix: SpeedRewardWrapper detects lap_count increment; if last_lap_time < min_lap_time (5.0s), returns penalty = -10 × (min_lap_time / lap_time). A 1-second lap gives -50 penalty. Legitimate 12-second laps unaffected. Window size also increased from 30 → 60 to catch slower circles.

2. Non-terminating segment eval episodes: evaluate_policy on wide tracks (no barriers) could run indefinitely, inflating segment_reward to 200k+.
   Fix: Replaced with manual eval loop capped at MAX_EVAL_STEPS=3000 steps.

Phase 4 results cleared (trials 4-6 ran with exploitable reward).

Tests: 4 new reward wrapper tests, 100 total passing.

Agent: pi
Tests: 100 passed
Tests-Added: 4
TypeScript: N/A
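The two fixes described in the commit message can be illustrated with a minimal sketch. This is not the repository's code: only the formula, the 5.0 s threshold, and the 3000-step cap come from the message. The function names `short_lap_penalty` and `capped_eval`, the `NeverDoneEnv` stand-in, and the 4-tuple `step()` return are assumptions made for illustration.

```python
MIN_LAP_TIME = 5.0      # seconds, value quoted in the commit message
MAX_EVAL_STEPS = 3000   # step cap quoted in the commit message


def short_lap_penalty(lap_time, min_lap_time=MIN_LAP_TIME):
    """Hypothetical helper: penalty for a completed lap, 0.0 if the lap is long enough."""
    if lap_time >= min_lap_time:
        return 0.0
    # Guard against a zero lap time (an assumption; the commit does not mention this case).
    lap_time = max(lap_time, 1e-6)
    # Formula from the commit message: -10 x (min_lap_time / lap_time).
    return -10.0 * (min_lap_time / lap_time)


def capped_eval(policy, env, max_steps=MAX_EVAL_STEPS):
    """Hypothetical manual eval loop: stop when the episode ends or at max_steps."""
    obs = env.reset()
    total_reward = 0.0
    steps = 0
    done = False
    while not done and steps < max_steps:
        action = policy(obs)
        # Assumes the older Gym-style 4-tuple return from step().
        obs, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


class NeverDoneEnv:
    """Toy stand-in for a wide track with no barriers: the episode never terminates."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 1.0, False, {}


if __name__ == "__main__":
    print(short_lap_penalty(1.0))    # 1-second lap, penalized
    print(short_lap_penalty(12.0))   # 12-second lap, no penalty
    print(capped_eval(lambda obs: 0, NeverDoneEnv()))
```

With these definitions a 1-second lap yields the -50 penalty quoted in the message, and an environment that never signals `done` is cut off after 3000 steps, so its reward can no longer grow without limit.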
__init__.py	feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing	2026-04-13 10:03:15 -04:00
test_autoresearch_controller.py	fix: reward v4 — full sim bypass kills circular driving at root	2026-04-13 20:56:32 -04:00
test_behavioral_wrappers.py	feat: Phase 3 — behavioral control, enhanced evaluator, 53 tests	2026-04-14 09:28:43 -04:00
test_discretize_action.py	feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing	2026-04-13 10:03:15 -04:00
test_end_to_end.py	Wave 4: scratch training on generated_track + mountain_track, zero-shot mini_monaco	2026-04-14 22:40:38 -04:00
test_reward_wrapper.py	fix: close short-lap circle exploit and cap segment eval episode length	2026-04-15 09:06:25 -04:00
test_runner_integration.py	feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing	2026-04-13 10:03:15 -04:00
test_wave3.py	Wave 4: scratch training on generated_track + mountain_track, zero-shot mini_monaco	2026-04-14 22:40:38 -04:00