diff --git a/docs/TEST_HISTORY.md b/docs/TEST_HISTORY.md
index cffbbac..07f0dc2 100644
--- a/docs/TEST_HISTORY.md
+++ b/docs/TEST_HISTORY.md
@@ -154,3 +154,44 @@ All experiments: mountain_track only, lr=0.000725, throttle_min varies, 90k step
 3. **Test combined model** on all 4 tracks — can it generalise to mini_monaco like Trial 9 did?
 4. **If yes:** We have reproduced Trial 9 reliably with a better reward function
+
+### Exp 8 — Mountain track, v5 reward, throttle_min=0.5, CORRECT checkpointing ✅ COMPLETED
+- **Reward:** v5 (speed × CTE-quality)
+- **throttle_min:** 0.5
+- **Method:** Direct model.learn() loop, single TCP connection, NO close_and_switch
+- **Steps:** 90,000 total | 6,000 per segment | 15 checkpoints
+- **Circle exploit fix:** Short-lap terminates episode immediately
+- **Peak segment:** Seg 10 (step 60,000) — 567 reward / 2000 steps (FULL EVAL on mountain_track!)
+- **Policy diverged:** Seg 11-15 (31, 20 reward) — best_model.zip captured the peak correctly
+- **Checkpoints saved:** checkpoint_0006000.zip through checkpoint_0090000.zip + best_model.zip
+- **Final eval results using best_model.zip (step 60k weights):**
+
+| Track | Ep1 | Ep2 | Ep3 | Mean steps | Result |
+|---|---|---|---|---|---|
+| mountain_track (training) | 382 | 529 | 182 | 364 | ❌ crashes |
+| generated_track (zero-shot) | 63 | 61 | 61 | 62 | ❌ crashes |
+| mini_monaco (zero-shot) | 154 | 155 | 104 | 138 | ❌ crashes at one corner |
+| generated_road (zero-shot) | 41 | 42 | 41 | 41 | ❌ crashes |
+
+- **Throttle test:** mini_monaco at throttle_min=0.5 over 5 episodes: 93/94/79/95/94 steps (mean=91, very consistent = same corner every time). throttle_min=0.2 test impossible — action space baked in at training time.
+- **Key findings:**
+  1. ✅ Circle exploit fully eliminated — no short laps observed
+  2. ✅ Best model saving worked — captured step 60k peak, not step 90k drift
+  3. ✅ Genuine 20-22 second laps during training from step ~18k onward
+  4. ❌ Model crashes at exactly the same corner on mini_monaco every time (too fast)
+  5. ❌ throttle_min=0.5 baked into action space — model cannot output throttle < 0.5, cannot slow for corners
+  6. 🔑 INSIGHT: v4 + 0.2 failed because v4 gradient = 0 on hill. v5 gradient is non-zero — model CAN learn to apply high throttle when needed even with 0.2 floor
+
+### Exp 9 — Mountain track, v5 reward, throttle_min=0.2 (RUNNING)
+- **Change from Exp8:** throttle_min: 0.5 → **0.2** (only change)
+- **Reward:** v5 (speed × CTE-quality) — UNCHANGED
+- **Hypothesis:** v5 reward provides non-zero gradient signal on hill (∂reward/∂speed is non-zero).
+  Model CAN learn to output high throttle on hill. With 0.2 floor, model has full range [0.2, 1.0]
+  and can apply lower throttle on corners — potentially solving mini_monaco corner crash.
+- **What we never tested:** (0.2, v4) failed. (0.5, v5) worked. (0.2, v5) was never tried.
+- **Risk:** Model may still stall on hill if gradient convergence is slow in early training.
+  StuckTermination (-1.0) + v5 speed gradient together should push toward higher throttle.
+- **Next test (Exp10):** Add track_progress bonus to reward (v6) — one variable at a time.
+- **Save dir:** models/exp9-mountain-v5-throttle02/
+- **Watch:** tail -f /tmp/exp9.log
+
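
Note on the Exp 8 method: the segment and checkpoint scheme (90,000 steps in 6,000-step segments, one checkpoint per segment, plus `best_model.zip` at the best evaluation) can be sketched roughly as below. This is a minimal sketch, not the project's actual training script. `train_segment`, `evaluate`, and `save` are hypothetical callables introduced for illustration; only the step counts and file-name pattern come from the notes.

```python
from typing import Callable, List, Tuple

TOTAL_STEPS = 90_000    # from the Exp 8 notes
SEGMENT_STEPS = 6_000   # from the Exp 8 notes


def checkpoint_name(step: int) -> str:
    """Zero-padded name following the checkpoint_0006000.zip pattern."""
    return f"checkpoint_{step:07d}.zip"


def run_segments(
    train_segment: Callable[[int], None],  # hypothetical: train n more steps
    evaluate: Callable[[], float],         # hypothetical: return eval reward
    save: Callable[[str], None],           # hypothetical: write current weights
    total_steps: int = TOTAL_STEPS,
    segment_steps: int = SEGMENT_STEPS,
) -> Tuple[int, float, List[str]]:
    """Train in fixed segments, checkpoint each one, and keep the best weights."""
    best_step, best_reward = 0, float("-inf")
    saved: List[str] = []
    for step in range(segment_steps, total_steps + 1, segment_steps):
        # Assumption: if the model is a Stable-Baselines3 one, this would be
        # model.learn(total_timesteps=n, reset_num_timesteps=False).
        train_segment(segment_steps)
        name = checkpoint_name(step)
        save(name)
        saved.append(name)
        reward = evaluate()
        if reward > best_reward:
            # Overwrite best_model.zip only on a strict improvement, so later
            # policy drift cannot replace the peak weights.
            best_step, best_reward = step, reward
            save("best_model.zip")
    return best_step, best_reward, saved
```

Saving the best weights inside the loop, rather than keeping only the final ones, is what lets a run that peaks at segment 10 and then diverges still yield the peak policy.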
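
Note on key finding 5: one way to see why a throttle floor is "baked in" is to look at how a bounded policy output might be mapped onto the throttle range. The mapping below is an assumption for illustration (a linear rescale from [-1, 1]). The notes do not say how the environment actually applies `throttle_min`, and `scale_throttle` is a hypothetical helper.

```python
def scale_throttle(raw: float, throttle_min: float, throttle_max: float = 1.0) -> float:
    """Map a policy output in [-1, 1] linearly onto [throttle_min, throttle_max].

    Assumed mapping for illustration only; the real environment may differ.
    """
    raw = max(-1.0, min(1.0, raw))  # clip to the policy's nominal output range
    return throttle_min + (raw + 1.0) / 2.0 * (throttle_max - throttle_min)


# With a 0.5 floor, even the lowest possible output still commands 0.5 throttle.
lowest_exp8 = scale_throttle(-1.0, throttle_min=0.5)
# With a 0.2 floor (the Exp 9 setting), the same output commands 0.2 throttle.
lowest_exp9 = scale_throttle(-1.0, throttle_min=0.2)
```

Under this assumption, a policy trained with one floor has learned outputs that mean something different under another floor, which is consistent with the note that the 0.2 test was not possible on the Exp 8 weights.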
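
Note on the Exp 9 hypothesis: the claim that ∂reward/∂speed is non-zero follows directly from a multiplicative reward. The sketch below assumes one plausible form of "speed × CTE-quality". The exact v5 formula is not given in the notes, so `cte_quality`, `max_speed`, and `max_cte` are hypothetical names and values chosen for illustration.

```python
def cte_quality(cte: float, max_cte: float = 2.0) -> float:
    """Assumed quality term: 1.0 on the centre line, 0.0 at max_cte or beyond."""
    return max(0.0, 1.0 - abs(cte) / max_cte)


def reward_v5(speed: float, cte: float,
              max_speed: float = 10.0, max_cte: float = 2.0) -> float:
    """Assumed v5-style reward: normalised speed times cross-track quality."""
    return (speed / max_speed) * cte_quality(cte, max_cte)


def d_reward_d_speed(cte: float,
                     max_speed: float = 10.0, max_cte: float = 2.0) -> float:
    """Analytic partial derivative of reward_v5 with respect to speed."""
    return cte_quality(cte, max_cte) / max_speed
```

Under this assumed form, the derivative does not depend on the current speed, so a car that is nearly stalled on the hill still gets a positive signal for going faster, provided it is inside `max_cte`. The v4 reward is not reproduced here because its form is not stated in the notes.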