donkeycar-rl-autoresearch/agent/outerloop-results
Paul Huliganga 0fbd15a941 eval: multi-track generalization test — all 3 models drive new road + generated track
New generated road course (different random layout):
  Trial-20: 2441 reward, 2206 steps, osc=0.029, RIGHT lane 
  Trial-8:  2351 reward, 2922 steps, osc=0.295, RIGHT lane 
  Trial-18: 2031 reward, 2214 steps, osc=0.032, LEFT lane 

Generated track course (completely different environment/visuals):
  Trial-20: 2443 reward, 2207 steps, osc=0.030, RIGHT lane 
  Trial-8:  2317 reward, 2868 steps, osc=0.284, RIGHT lane 
  Trial-18: 2033 reward, 2216 steps, osc=0.032, LEFT lane 

KEY FINDING: All 3 models show near-identical behaviour patterns across all 3 tracks:
  - Oscillation scores match within 2%
  - Lane preferences preserved across tracks
  - Step counts and rewards nearly identical
  This points to genuine generalisation, not track memorisation.

Also: Added --env flag to evaluate_champion.py for multi-track evaluation
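A minimal sketch of how such an --env flag could be wired up with argparse. The environment IDs listed are the standard gym-donkeycar registrations, but their use here, the default, and the --episodes flag are assumptions, not the actual evaluate_champion.py interface.

```python
import argparse

# Standard gym-donkeycar environment ids; which ones the repo actually
# evaluates against is an assumption.
DONKEY_ENVS = [
    "donkey-generated-roads-v0",
    "donkey-generated-track-v0",
]

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Evaluate a champion model on a chosen track")
    parser.add_argument("--env", choices=DONKEY_ENVS, default=DONKEY_ENVS[0],
                        help="gym environment id to evaluate on")
    parser.add_argument("--episodes", type=int, default=3,
                        help="number of evaluation episodes")
    return parser.parse_args(argv)
```

Restricting `--env` via `choices` makes an unknown track name fail fast at argument parsing instead of at environment construction.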

Agent: pi/claude-sonnet
Tests: 53/53 passing
Tests-Added: 0
TypeScript: N/A
2026-04-14 09:50:28 -04:00
model-000 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
model-001 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
model-002 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
model-003 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
autoresearch_log.txt AUTORESEARCH: 300 total trials complete - best mean_reward=141.85 at n_steer=8, n_throttle=5, lr=0.00202 2026-04-13 01:56:06 -04:00
autoresearch_phase1_log.txt milestone: Phase 1 complete — genuine driving confirmed; launch Phase 2 corner learning 2026-04-13 19:33:06 -04:00
autoresearch_phase1_log_CORRUPTED_circular_driving.txt fix: path-efficiency reward (v3) defeats circular driving exploit 2026-04-13 13:36:17 -04:00
autoresearch_phase1_log_CORRUPTED_reward_hacking.txt fix: hack-proof reward shaping + reward hacking detection + research log 2026-04-13 12:27:48 -04:00
autoresearch_phase2_log.txt feat: Phase 3 — behavioral control, enhanced evaluator, 53 tests 2026-04-14 09:28:43 -04:00
autoresearch_results.jsonl AUTORESEARCH: 300 total trials complete - best mean_reward=141.85 at n_steer=8, n_throttle=5, lr=0.00202 2026-04-13 01:56:06 -04:00
autoresearch_results_phase1.jsonl autoresearch: phase1 trial 50 results 2026-04-13 19:17:56 -04:00
autoresearch_results_phase1_CORRUPTED_circular_driving.jsonl fix: path-efficiency reward (v3) defeats circular driving exploit 2026-04-13 13:36:17 -04:00
autoresearch_results_phase1_CORRUPTED_reward_hacking.jsonl fix: hack-proof reward shaping + reward hacking detection + research log 2026-04-13 12:27:48 -04:00
autoresearch_results_phase2.jsonl autoresearch: phase1 trial 20 results 2026-04-14 04:35:45 -04:00
clean_sweep_results.jsonl AUTORESEARCH: Full Karpathy-style GP+UCB meta-controller, clean base data, fixed all paths, ready to run 2026-04-13 00:52:00 -04:00
eval_summary.jsonl eval: multi-track generalization test — all 3 models drive new road + generated track 2026-04-14 09:50:28 -04:00
nohup_outerloop.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
outer_monitor.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
sweep_results.jsonl Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
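The CORRUPTED phase-1 logs above record a circular-driving exploit defeated by a "path-efficiency reward (v3)". As one generic illustration of that idea (not the repo's actual v3 formula), progress reward can be scaled by net displacement over total path length, so driving in circles earns almost nothing; all names and the exact formula here are assumptions.

```python
import math

def path_efficiency(positions):
    """Net displacement divided by total distance travelled, in [0, 1].

    `positions` is a list of (x, y) waypoints. A straight run scores ~1.0;
    a closed loop (the circular-driving exploit) scores ~0.0.
    """
    if len(positions) < 2:
        return 0.0
    total = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    net = math.dist(positions[0], positions[-1])
    return net / total if total > 0 else 0.0

def shaped_reward(progress_reward, positions):
    # Scale the base progress reward by path efficiency: high distance with
    # near-zero net displacement (circling) collapses the reward toward zero.
    return progress_reward * path_efficiency(positions)
```

Multiplicative shaping like this removes the incentive to accumulate raw distance without making forward progress, which is the exploit the v3 fix targets.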