docs: ADR-017 — always save best model, never just latest

Documents the root cause of losing the mountain_track model that was
doing 20-second laps at step 30k but crashed at the step-90k final eval.

Phase 2 (13k steps, simple track): final = best. Assumption carried
forward incorrectly into Wave 4 (90k steps, policy can drift).

Mandatory rule: every training script uses train_multitrack() best_model
tracking OR SB3 EvalCallback. No exceptions.

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
Paul Huliganga 2026-04-17 16:03:59 -04:00
parent 4f77b8a468
commit fc01057c14
1 changed file with 35 additions and 0 deletions


@ -315,3 +315,38 @@ GP tunes learning_rate — a higher LR will more aggressively overwrite speciali
**Why:** Multiple fixes (90k step cap, checkpointing, rescue eval) were committed but the controller was never restarted. The fixes never ran. This caused several more timeouts that the fixes were meant to prevent.
## ADR-017: Always Save the BEST Model During Training, Never Just the Latest
**Date:** 2026-04-17
**Status:** Active — enforced
**Decision:** Every training script must save the best model found during
training, not just the final weights. Two mechanisms are approved:
1. `train_multitrack()` in multitrack_runner.py — tracks `best_segment_reward`,
saves `best_model.zip` on every new high score, reloads it at the end.
2. SB3 `EvalCallback(best_model_save_path=..., deterministic=True)` for
standalone scripts.
No training script may be written or run without one of these two mechanisms.
**Why this matters:**
PPO policy weights can and do drift during long training runs. A model that
could drive at step 30k may be broken at step 90k. Saving only the final
weights throws away the best model found during training.
**What was lost because this wasn't in place:**
- Wave 4 mountain_track Exp3/4/5: model was doing 20-second laps at step 30k.
Final model at step 90k crashed in 13 steps. Irrecoverable.
- An unknown number of mid-training peaks across Wave 3 and Wave 4 that were never captured.
**Root cause of the oversight:**
Phase 2 autoresearch used 13k-step trials on a simple single track, where the
final model happened to also be the best model (too few steps for the policy
to drift). That assumption was carried forward into longer multi-track
training, where it no longer held. The word "checkpoint" was also misleading:
we were saving the latest model, not the best one.
**Implementation:** See `train_multitrack()` in multitrack_runner.py — the
`best_segment_reward` tracking and `best_model.zip` save logic added 2026-04-17.