- exp17_parallel_450k.py: parallel two-track training (generated_track:9091,
mountain_track:9093), 450k steps, v6 reward, HOST=localhost
- DECISIONS.md: ADR-025 (parallel strategy) and ADR-026 (mountain friction fix)
- docs/STATE.md: updated to April 2026 state with current champions and strategy
- docs/TEST_HISTORY.md: mountain friction fix notes + Exp 17 full design
- outerloop-results: exp14 finetune logs and robust mountain eval results
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Exp 11 (v5 reward): aborted at 66k steps; circular driving returned without the efficiency term
Exp 11b (v6 reward): completed 90k steps; no circling, but episodes plateau at 170-195 steps
Eval on all 4 tracks: consistent ~194 steps (including zero-shot)
Parallel DummyVecEnv infrastructure proven stable.
Next: increase training budget (90k may be insufficient for 2 parallel envs).
Every test run now saves to agent/test-results/YYYY-MM-DD_HH-MM_<model>.log
so results are never lost. Also added 3-set Exp9 eval results to TEST_HISTORY.
Usage:
python3 agent/run_eval.py --model models/exp9-.../best_model.zip --sets 3
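
A minimal sketch of how such a log path could be built, assuming the
<model> part is taken from the checkpoint's parent directory (every
checkpoint file is named best_model.zip). The helper name
result_log_path and its parameters are hypothetical and are not taken
from agent/run_eval.py:

```python
from datetime import datetime
from pathlib import Path


def result_log_path(model_path, now=None, root="agent/test-results"):
    """Return <root>/YYYY-MM-DD_HH-MM_<model>.log for one eval run."""
    now = now or datetime.now()
    # Assumption: the run is identified by the directory holding the
    # checkpoint, not by the checkpoint file name itself.
    model_name = Path(model_path).parent.name
    stamp = now.strftime("%Y-%m-%d_%H-%M")
    return Path(root) / f"{stamp}_{model_name}.log"
```

Because the timestamp has minute resolution, two runs of the same model
within one minute would map to the same file; appending seconds would
avoid that if it matters.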
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
Exp8 results: reward peaked at 567 at step 60k; the policy diverged afterwards.
best_model was saved correctly. mini_monaco crashed at 91 steps (mean),
at the same corner every time: throttle min=0.5 is baked into the action space.
Exp9 plan: throttle_min=0.2, v5 reward unchanged. Tests the hypothesis
that the v5 gradient is sufficient for the hill without the forced 0.5 minimum.
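
To illustrate why a throttle floor in the action space can prevent
slowing for a corner, here is a sketch that assumes the policy emits a
raw action in [-1, 1] which is rescaled linearly to
[throttle_min, 1.0]. The function scale_throttle is hypothetical; the
actual wrapper in this repo may map actions differently:

```python
def scale_throttle(raw, throttle_min, throttle_max=1.0):
    """Map a raw policy output in [-1, 1] to [throttle_min, throttle_max]."""
    # Clamp first so out-of-range policy outputs cannot escape the bounds.
    raw = max(-1.0, min(1.0, raw))
    # Linear rescale: -1 -> throttle_min, +1 -> throttle_max.
    return throttle_min + (raw + 1.0) / 2.0 * (throttle_max - throttle_min)


# Lowest throttle the agent can command under each setting.
floor_exp8 = scale_throttle(-1.0, throttle_min=0.5)
floor_exp9 = scale_throttle(-1.0, throttle_min=0.2)
```

Under this assumption, no policy output can command less than the
floor, so with throttle_min=0.5 the car always enters the corner on at
least half throttle, while 0.2 leaves the agent room to slow down.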
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
Every mountain track experiment (Exp1-8) and Wave 4 trials documented:
- What was changed from previous test
- Key observation from simulator
- Root cause of failure
- What was learned
Also documents: what we keep, open problems, next steps.
Exp 8 currently running (PID 2941877).
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A