Commit Graph

4 Commits

Author SHA1 Message Date
Paul Huliganga 0da04327ef docs: capture robust mountain finetune winner at 36k and preserve eval comparison 2026-04-20 00:43:27 -04:00
Paul Huliganga f730a2e0ba docs: ADR-020/021 + session log — throttle/hill history and grass exploit root cause
Critical facts documented permanently:
- throttle_min=0.5 is baked into the action space (too fast for corners)
- throttle_min=0.2 + v5 reward CAN learn the hill (proven in Exp 9: mountain only, 90k)
- Mountain failure in parallel is contamination from grass exploit, not throttle
- Grass exploit root cause: the sim's determine_episode_over() passes (does not end the episode) when CTE > 16 m
- DO NOT confuse mountain rollback with stuck issue
- DO NOT change throttle_min as first response to mountain failure
2026-04-19 16:14:28 -04:00
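
Note on the grass exploit root cause recorded in f730a2e0ba: a minimal Python sketch of the behaviour, not the simulator's actual code. It assumes the check compares absolute cross-track error (CTE) against a limit and skips the check entirely above 16 m. The function names, the max_cte value of 8.0 and the ignore_above parameter are hypothetical and used only for illustration.

```python
def sim_style_episode_over(cte: float, max_cte: float = 8.0,
                           ignore_above: float = 16.0) -> bool:
    """Stand-in for the behaviour described in the commit message.

    max_cte=8.0 is an assumed limit; ignore_above=16.0 is the 16 m
    threshold named in the commit message.
    """
    if abs(cte) > ignore_above:
        # The check "passes": a very large CTE does NOT end the episode,
        # so a car far off the track keeps running.
        return False
    return abs(cte) > max_cte


def guarded_episode_over(cte: float, max_cte: float = 8.0) -> bool:
    """Hypothetical stricter check: any CTE beyond max_cte ends the episode."""
    return abs(cte) > max_cte


if __name__ == "__main__":
    for cte in (1.0, 10.0, 20.0):
        print(cte, sim_style_episode_over(cte), guarded_episode_over(cte))
```

With these assumed values, a CTE of 20 m is treated as "not over" by the first function but ends the episode in the second, which is the gap the grass exploit relies on.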
Paul Huliganga 0993d4f1e7 docs: Exp 11 + 11b results — parallel envs work, v6 prevents circles, but plateaus at ~194 steps
Exp 11 (v5 reward): aborted at 66k — circular driving returned without efficiency term
Exp 11b (v6 reward): completed 90k — no circles but plateaus at 170-195 steps
All 4 tracks eval: remarkably consistent ~194 steps (including zero-shot)
Parallel DummyVecEnv infrastructure proven stable.
Next: increase training budget (90k may be insufficient for 2 parallel envs).
2026-04-19 13:26:29 -04:00
Paul Huliganga 86357622e3 docs: session log + ADR-019 — parallel DummyVecEnv for multi-track training 2026-04-19 10:50:11 -04:00