donkeycar-rl-autoresearch

Paul Huliganga b504b89b2a feat: add exp17 parallel DummyVecEnv 450k training + strategy docs	2026-04-28 02:42:20 -04:00
- exp17_parallel_450k.py: parallel two-track training (generated_track:9091, mountain_track:9093), 450k steps, v6 reward, HOST=localhost
- DECISIONS.md: ADR-025 (parallel strategy) and ADR-026 (mountain friction fix)
- docs/STATE.md: updated to April 2026 state with current champions and strategy
- docs/TEST_HISTORY.md: mountain friction fix notes + Exp 17 full design
- outerloop-results: exp14 finetune logs and robust mountain eval results
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
model-000	Initial commit: stable RL sweep runner, legacy and new scripts, full docs included	2026-04-12 22:57:50 -04:00
model-001	Initial commit: stable RL sweep runner, legacy and new scripts, full docs included	2026-04-12 22:57:50 -04:00
model-002	Initial commit: stable RL sweep runner, legacy and new scripts, full docs included	2026-04-12 22:57:50 -04:00
model-003	Initial commit: stable RL sweep runner, legacy and new scripts, full docs included	2026-04-12 22:57:50 -04:00
autoresearch_log.txt	AUTORESEARCH: 300 total trials complete - best mean_reward=141.85 at n_steer=8, n_throttle=5, lr=0.00202	2026-04-13 01:56:06 -04:00
autoresearch_phase1_log.txt	milestone: Phase 1 complete — genuine driving confirmed; launch Phase 2 corner learning	2026-04-13 19:33:06 -04:00
autoresearch_phase1_log_CORRUPTED_circular_driving.txt	fix: path-efficiency reward (v3) defeats circular driving exploit	2026-04-13 13:36:17 -04:00
autoresearch_phase1_log_CORRUPTED_reward_hacking.txt	fix: hack-proof reward shaping + reward hacking detection + research log	2026-04-13 12:27:48 -04:00
autoresearch_phase2_log.txt	fix: multitrack_runner must use VecTransposeImage(DummyVecEnv) not plain wrap_env	2026-04-18 18:33:40 -04:00
autoresearch_phase3_log.txt	fix: multitrack_runner must use VecTransposeImage(DummyVecEnv) not plain wrap_env	2026-04-18 18:33:40 -04:00
autoresearch_phase4_log.txt	feat: v5 reward — speed × CTE-quality, drop efficiency term	2026-04-17 13:25:38 -04:00
autoresearch_results.jsonl	AUTORESEARCH: 300 total trials complete - best mean_reward=141.85 at n_steer=8, n_throttle=5, lr=0.00202	2026-04-13 01:56:06 -04:00
autoresearch_results_phase1.jsonl	autoresearch: phase1 trial 50 results	2026-04-13 19:17:56 -04:00
autoresearch_results_phase1_CORRUPTED_circular_driving.jsonl	fix: path-efficiency reward (v3) defeats circular driving exploit	2026-04-13 13:36:17 -04:00
autoresearch_results_phase1_CORRUPTED_reward_hacking.jsonl	fix: hack-proof reward shaping + reward hacking detection + research log	2026-04-13 12:27:48 -04:00
autoresearch_results_phase2.jsonl	autoresearch: phase1 trial 20 results	2026-04-14 04:35:45 -04:00
autoresearch_results_phase3.jsonl	Wave 4: scratch training on generated_track + mountain_track, zero-shot mini_monaco	2026-04-14 22:40:38 -04:00
autoresearch_results_phase3_CONTAMINATED_v2.jsonl	fix: complete LR override — must patch lr_schedule, not just param_groups	2026-04-14 21:27:43 -04:00
autoresearch_results_phase3_CONTAMINATED_wrong_lr.jsonl	fix: LR override was not reaching the optimizer — all trials ran at 0.000225	2026-04-14 20:37:48 -04:00
autoresearch_results_phase4.jsonl	wave3: autoresearch trial 25 results	2026-04-16 20:01:51 -04:00
clean_sweep_results.jsonl	AUTORESEARCH: Full Karpathy-style GP+UCB meta-controller, clean base data, fixed all paths, ready to run	2026-04-13 00:52:00 -04:00
eval_summary.jsonl	fix: track switching via unwrapped viewer.exit_scene() — automatic scene changes work	2026-04-14 10:04:15 -04:00
exp14_finetune_log.txt	feat: add exp17 parallel DummyVecEnv 450k training + strategy docs	2026-04-28 02:42:20 -04:00
exp14_finetune_results.jsonl	feat: add exp17 parallel DummyVecEnv 450k training + strategy docs	2026-04-28 02:42:20 -04:00
mountain_candidate_eval_2026-04-19.jsonl	docs: capture robust mountain finetune winner at 36k and preserve eval comparison	2026-04-20 00:43:27 -04:00
mountain_candidate_eval_2026-04-19.md	docs: capture robust mountain finetune winner at 36k and preserve eval comparison	2026-04-20 00:43:27 -04:00
multitrack_results.jsonl	results: complete multi-track generalization baseline — 1/10 tracks drivable pre-Wave3	2026-04-14 11:31:08 -04:00
nohup_outerloop.log	Initial commit: stable RL sweep runner, legacy and new scripts, full docs included	2026-04-12 22:57:50 -04:00
outer_monitor.log	Initial commit: stable RL sweep runner, legacy and new scripts, full docs included	2026-04-12 22:57:50 -04:00
robust_eval_mountain.jsonl	feat: add exp17 parallel DummyVecEnv 450k training + strategy docs	2026-04-28 02:42:20 -04:00
sweep_results.jsonl	Initial commit: stable RL sweep runner, legacy and new scripts, full docs included	2026-04-12 22:57:50 -04:00