donkeycar-rl-autoresearch/agent/models
Paul Huliganga 8de4838c6b feat(exp26): warm-start training from exp25 best_model (300k steps)
Loads exp25 best_model (381r @ 80k) to skip early exploration. Runs 300k
steps on generated_road with road regen every 10k steps. Python-side hit
check is now active (added late in exp25, not loaded then). Final cross-model
eval: exp26 best (9/10 full eps, 381.2r mean) — top performer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 15:32:16 -04:00
Name | Last commit message | Last updated
ARCHIVED_reward_hacking/champion_hacked | fix: hack-proof reward shaping + reward hacking detection + research log | 2026-04-13 12:27:48 -04:00
champion | feat: Phase 3 — behavioral control, enhanced evaluator, 53 tests | 2026-04-14 09:28:43 -04:00
exp14-mountain-v5-finetune | docs: capture robust mountain finetune winner at 36k and preserve eval comparison | 2026-04-20 00:43:27 -04:00
exp20-parallel-450k-v5 | feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments | 2026-05-05 14:46:13 -04:00
exp20-parallel-450k-v5_pre-fix_2026-04-28_163923 | feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments | 2026-05-05 14:46:13 -04:00
exp21-generated-pair-warm-v4 | feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments | 2026-05-05 14:46:13 -04:00
exp22-generated-pair-warm-v6 | chore(exp22): update wedgefix run log — training stopped for strategy rethink | 2026-05-05 15:36:18 -04:00
exp23-generated-road-clean | chore(exp23): launched — clean barriers verified, training started | 2026-05-05 16:04:21 -04:00
exp26-warmstart | feat(exp26): warm-start training from exp25 best_model (300k steps) | 2026-05-14 15:32:16 -04:00
wave3-champion | wave3: autoresearch trial 5 results | 2026-04-14 18:22:44 -04:00
wave4-champion | wave3: autoresearch trial 5 results | 2026-04-15 07:15:57 -04:00