diff --git a/DECISIONS.md b/DECISIONS.md
index 2e37b68..4fd0567 100644
--- a/DECISIONS.md
+++ b/DECISIONS.md
@@ -315,3 +315,38 @@ GP tunes learning_rate — a higher LR will more aggressively overwrite speciali
 **Why:** Multiple fixes (90k step cap, checkpointing, rescue eval) were
 committed but the controller was never restarted. The fixes never ran.
 This caused several more timeouts that the fixes were meant to prevent.
+
+## ADR-017: Always Save the BEST Model During Training, Never Just the Latest
+
+**Date:** 2026-04-17
+**Status:** Active (enforced)
+
+**Decision:** Every training script must save the best model found during
+training, not just the final weights. Two mechanisms are approved:
+
+1. `train_multitrack()` in multitrack_runner.py: tracks `best_segment_reward`,
+   saves `best_model.zip` on every new high score, and reloads it at the end.
+2. SB3 `EvalCallback(best_model_save_path=..., deterministic=True)` for
+   standalone scripts.
+
+No training script may be written or run without one of these two mechanisms.
+
+**Why this matters:**
+PPO policy weights can and do drift during long training runs. A model that
+could drive at step 30k may be broken at step 90k. Saving only the final
+weights discards any better policy the run reached earlier.
+
+**What was lost because this wasn't in place:**
+- Wave 4 mountain_track Exp3/4/5: the model was doing 20-second laps at step 30k.
+  The final model at step 90k crashed within 13 steps. The peak is unrecoverable.
+- An unknown number of other mid-training peaks across Wave 3 and Wave 4 that were never captured.
+
+**Root cause of the oversight:**
+Phase 2 autoresearch used 13k-step trials on a simple single track, where the
+final model happened to be the best model (there was no time to drift). We
+assumed this always holds and carried that false assumption forward into longer
+multi-track training, where it was wrong. The word "checkpoint" was misleading:
+we were saving the latest model, not the best.
+
+**Implementation:** See `train_multitrack()` in multitrack_runner.py for the
+`best_segment_reward` tracking and `best_model.zip` save logic added 2026-04-17.
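The best-model pattern that the ADR requires (track the best segment reward, save on every new high score, reload at the end) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the code in multitrack_runner.py: `train_with_best_tracking`, `train_segment`, and the dict standing in for PPO weights are all hypothetical, and `pickle` replaces SB3's `model.save()` / `PPO.load()` so the example runs without Stable-Baselines3 installed.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Dict, Tuple

Policy = Dict[str, float]  # hypothetical stand-in for PPO policy weights


def train_with_best_tracking(
    policy: Policy,
    train_segment: Callable[[Policy, int], float],
    n_segments: int,
    save_dir: Path,
) -> Tuple[Policy, float]:
    """Train in segments; snapshot the weights whenever a segment sets a new best reward.

    Returns the reloaded best snapshot and its reward, not the final weights.
    """
    best_path = save_dir / "best_model.pkl"  # hypothetical name; SB3 would write best_model.zip
    best_segment_reward = float("-inf")
    saved = False
    for segment in range(n_segments):
        reward = train_segment(policy, segment)  # updates `policy` in place
        if reward > best_segment_reward:
            best_segment_reward = reward
            # Persist immediately so the peak survives later drift or a killed run.
            with best_path.open("wb") as f:
                pickle.dump(policy, f)
            saved = True
    if not saved:
        return policy, best_segment_reward  # no segments ran; nothing was saved
    # Reload the best snapshot at the end instead of returning drifted final weights.
    with best_path.open("rb") as f:
        return pickle.load(f), best_segment_reward


if __name__ == "__main__":
    # Scripted rewards: peak at index 2, then collapse (simulated drift).
    scripted_rewards = [5.0, 20.0, 35.0, 12.0, -3.0]

    def fake_segment(policy: Policy, segment: int) -> float:
        policy["w"] += 1.0  # stand-in for a PPO update
        return scripted_rewards[segment]

    final_policy = {"w": 0.0}
    with tempfile.TemporaryDirectory() as tmp:
        best, best_reward = train_with_best_tracking(
            final_policy, fake_segment, len(scripted_rewards), Path(tmp)
        )
    print("best:", best, best_reward)  # best: {'w': 3.0} 35.0
    print("final:", final_policy)      # final: {'w': 5.0}
```

In the scripted run the reward peaks in the third segment and then collapses, so the function returns the third-segment weights while the in-memory policy has drifted on. Writing each new best to disk, rather than holding a copy in memory, means the snapshot survives if the process is killed before training finishes. For standalone scripts, SB3's `EvalCallback(best_model_save_path=...)` performs the equivalent save as `best_model.zip` after each periodic evaluation; this sketch does not model that evaluation step.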