Loads the exp25 best_model (381r @ 80k) to skip early exploration. Runs 300k steps on generated_road, regenerating the road every 10k steps. The Python-side hit check is now active (it was added late in exp25 and not loaded at that point). Final cross-model eval: exp26 best is the top performer (9/10 full eps, 381.2r mean).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

| | | |
|---|---|---|
| ARCHIVED_reward_hacking/champion_hacked | ||
| champion | ||
| exp14-mountain-v5-finetune | ||
| exp20-parallel-450k-v5 | ||
| exp20-parallel-450k-v5_pre-fix_2026-04-28_163923 | ||
| exp21-generated-pair-warm-v4 | ||
| exp22-generated-pair-warm-v6 | ||
| exp23-generated-road-clean | ||
| exp26-warmstart | ||
| wave3-champion | ||
| wave4-champion | ||