donkeycar-rl-autoresearch

History

Paul Huliganga b8a13dea81 feat: v5 reward — speed × CTE-quality, drop efficiency term Problem with v4 on mountain_track: CTE × efficiency × speed all collapse to zero simultaneously when the car slows on the hill, giving no gradient signal for 'apply more throttle'. v5: reward = (speed / 10) × (1 - \|CTE\| / max_cte) - Directly rewards going fast while staying centred - Hill: car slows → reward drops → clear gradient toward more throttle - Circling protection now entirely handled by lap-time penalty + StuckTerminationWrapper (not by the reward formula) Tests updated to reflect v5 semantics (102 passing). Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A		2026-04-17 13:25:38 -04:00
..
__init__.py	feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing	2026-04-13 10:03:15 -04:00
test_autoresearch_controller.py	fix: reward v4 — full sim bypass kills circular driving at root	2026-04-13 20:56:32 -04:00
test_behavioral_wrappers.py	feat: Phase 3 — behavioral control, enhanced evaluator, 53 tests	2026-04-14 09:28:43 -04:00
test_discretize_action.py	feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing	2026-04-13 10:03:15 -04:00
test_end_to_end.py	Wave 4: scratch training on generated_track + mountain_track, zero-shot mini_monaco	2026-04-14 22:40:38 -04:00
test_reward_wrapper.py	feat: v5 reward — speed × CTE-quality, drop efficiency term	2026-04-17 13:25:38 -04:00
test_runner_integration.py	feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing	2026-04-13 10:03:15 -04:00
test_wave3.py	fix: StuckTerminationWrapper + deque import + 102 tests	2026-04-15 09:17:27 -04:00