diff --git a/docs/TEST_HISTORY.md b/docs/TEST_HISTORY.md
new file mode 100644
index 0000000..cffbbac
--- /dev/null
+++ b/docs/TEST_HISTORY.md
@@ -0,0 +1,156 @@
+# Test History — DonkeyCar RL Autoresearch
+
+Last updated: 2026-04-18
+
+This document records every significant training experiment, what was
+changed, what was observed, and what was learned. Use this to make
+methodical decisions rather than random changes.
+
+---
+
+## Baseline Models (Phase 1 & 2)
+
+### Phase 2 Champion
+- **Model:** `models/champion/model.zip`
+- **Track trained on:** generated_road only
+- **Steps:** 13,328
+- **Hyperparams:** lr=0.000225, PPO continuous actions, ThrottleClamp(0.2), v4 reward
+- **Result:** ✅ Drives generated_road perfectly, stays in right lane
+- **Zero-shot:** ❌ Fails on generated_track (confirmed), ❌ Fails on mini_monaco
+- **Notes:** Single track, simple road, model converged cleanly. Final model = best model (no divergence in 13k steps)
+
+---
+
+## Mountain Track Experiments
+
+All experiments: mountain_track only, lr=0.000725, throttle_min varies, 90k steps
+
+### Exp 1 — Mountain track, old v4 reward, throttle_min=0.2
+- **Reward:** v4 (CTE × efficiency × speed)
+- **throttle_min:** 0.2
+- **Key observation:** Car gets partway up hill, slows, stops, rolls back. Always crashes at same step (~153-166).
+  Steps logged: 0.200 throttle at hill = not enough power
+- **Root cause:** v4 reward gives zero gradient signal on hill (efficiency→0, speed→0, reward→0 simultaneously, no direction for "apply more throttle")
+- **Learned:** v4 reward is broken for inclined terrain
+
+### Exp 2 — Mountain track, old v4 reward, throttle_min=0.2, continued to 200k
+- **Reward:** v4
+- **throttle_min:** 0.2
+- **Key observation:** Only 2 behaviors: turn left and hit barrier, or go straight and hit barrier at turn
+- **Result:** ❌ Killed early — no improvement
+- **Learned:** More steps alone cannot fix a broken reward signal
+
+### Exp 3 — Mountain track, old v4 reward, throttle_min=0.5
+- **Reward:** v4
+- **throttle_min:** 0.5 (increased to overcome hill)
+- **Key observation:** Circle exploit dominated entire run — 0.5-1.75 second laps throughout
+- **Lap times logged:** All short (exploit)
+- **Result:** ❌ Model useless (reward=4.99 after 90k steps)
+- **Learned:** Higher throttle got car over hill but circle exploit took over because v4 has no efficiency penalty when throttle is high
+
+### Exp 4 — Continued from Exp 3 (200k total), old v4 reward, throttle_min=0.5
+- **Reward:** v4
+- **throttle_min:** 0.5
+- **Key observation:** Killed early — same 2 behaviors (left into barrier, straight into barrier)
+- **Result:** ❌ Killed
+- **Learned:** Continuing bad training does not help
+
+### Exp 5 — Mountain track, v5 reward, throttle_min=0.5 ⭐ KEY EXPERIMENT
+- **Reward:** v5 (speed × CTE-quality) — NEW reward that directly incentivises throttle on hills
+- **throttle_min:** 0.5
+- **Method:** Direct model.learn() — NO train_multitrack(), ONE connection throughout
+- **Key observation:** Genuine 20-22 second laps appearing from step ~30,000 onward
+- **Lap times:** 19-22 seconds (genuine), consistently for 60k steps
+- **Result:** ❌ Final model poor — best model was at step ~30k but we only saved final (step 90k) model
+- **Root cause of failure:** No best-model saving.
+  Policy peaked at 30k, diverged by 90k
+- **Learned:**
+  1. v5 reward WORKS for mountain track
+  2. throttle_min=0.5 WORKS for hill
+  3. Direct model.learn() (no track switching) avoids phantom car issues
+  4. MUST save best model during training, not just final
+
+### Exp 6 — Mountain track, v5 reward, throttle_min=0.5, train_multitrack (1 segment)
+- **Reward:** v5
+- **throttle_min:** 0.5 (first segment only — close_and_switch used 0.2 for subsequent segments)
+- **Method:** train_multitrack() with steps_per_switch=90000 (one giant segment = one checkpoint)
+- **Key observation:** Circle exploit dominated — only 0.5-1.75 second laps throughout
+- **Result:** ❌ Only 1 checkpoint saved (at step 90k). Best reward=4.99
+- **Root cause:** Using steps_per_switch=TOTAL_STEPS defeated checkpointing (one segment = one save). Circle exploit reappeared (different from Exp5 — random seed variation)
+- **Learned:** steps_per_switch=TOTAL_STEPS is WRONG for single-track training with checkpointing
+
+### Exp 7 — Mountain track, v5 reward + episode termination on short lap, throttle_min mixed
+- **Reward:** v5 + short-lap now TERMINATES episode (not just penalty)
+- **throttle_min:** 0.5 initial, 0.2 after segment 1 (bug: close_and_switch used module default)
+- **Method:** train_multitrack() with steps_per_switch=6000 (15 segments)
+- **Key observation:** Car in LEFT lane, sitting doing nothing. Not normal spawn position.
+- **Hypothesis:** Phantom car from Exp6's ghost car still in sim. Two TCP connections spawned two cars. User watched phantom (left lane, no commands). Training went to different car.
+- **Result:** ❌ Killed — phantom car issue
+- **Learned:**
+  1. close_and_switch() between segments creates phantom car risk for single-track training
+  2. throttle_min MUST be passed consistently — module default is 0.2, not 0.5
+  3.
+     For single-track training: do NOT use close_and_switch() at all
+
+### Exp 8 — Mountain track, v5 reward + episode termination, throttle_min=0.5 consistently (RUNNING NOW)
+- **Reward:** v5 + short-lap terminates episode
+- **throttle_min:** 0.5 throughout (no close_and_switch = no module default override)
+- **Method:** Direct model.learn() in loop — ONE connection throughout entire run
+- **Checkpoints:** 15 numbered saves (every 6,000 steps) + best_model.zip
+- **PID:** 2941877, log: /tmp/exp8.log
+- **Status:** Running since 11:17, ~1h45m total
+- **Watch:** `tail -f /tmp/exp8.log`
+- **Success criteria:** Genuine 19-22 second laps appearing during training AND best_model.zip drives cleanly in deterministic eval
+
+---
+
+## Wave 4 Multi-Track Experiments (generated_track + mountain_track)
+
+### Trial 9 ⭐ BEST OVERALL MODEL
+- **Model:** `models/wave4-trial-0009/model.zip`
+- **Tracks:** generated_track + mountain_track (round-robin, switch every 6,851 steps)
+- **Steps:** 89,893 total (~45k per track)
+- **Hyperparams:** lr=0.000725, switch=6,851
+- **Reward:** v4 (old — before exploit patches)
+- **Result:**
+  - ✅ Drives generated_track (3/3 episodes, 13-16 second genuine laps)
+  - ✅ Drives mini_monaco zero-shot (2000 steps, 40-second genuine laps — never seen in training)
+  - ❌ Crashes on mountain_track (~200 steps — hill + corner)
+  - ❌ Crashes on generated_road (~46 steps — turns right immediately)
+- **Notes:** Only 1 of 25 Wave 4 trials succeeded. Suspected random seed luck. Same hyperparameters repeated in Exp2 (overnight) produced useless model.
+
+### Wave 4 Other Trials (1-25 except Trial 9)
+- **Result:** All crashed on mini_monaco within 20-265 steps
+- **Median mini_monaco score:** ~112 (crashes at ~130 steps)
+- **Trials 14, 25:** Scored 1573, 1543 — suspected shuttle exploit (car going back and forth on straight)
+- **Learned:** Multi-track training is highly sensitive to random seed. GP+UCB did not converge reliably.
+
+---
+
+## Key Decisions Made (What We Keep)
+
+| Decision | Reason |
+|---|---|
+| v5 reward: `speed × CTE-quality` | Directly incentivises throttle on hills. v4 gave zero gradient on inclines. |
+| throttle_min=0.5 for mountain_track | Overcomes hill. Car can now reach first corner. |
+| Short-lap penalty + EPISODE TERMINATION | Penalty alone insufficient — model stayed alive and accumulated rewards between laps. Termination makes circling strictly unprofitable. |
+| Numbered checkpoints every segment | Never lose a good mid-training model again (ADR-017) |
+| best_model.zip updated on new best segment score | Final model ≠ best model. Peak can be at 30k even if final is at 90k. |
+| Single TCP connection for single-track training | Avoids phantom car problem from close_and_switch() |
+| lr=0.000725 | From Trial 9 (best model). Consistent with good results. |
+
+## Key Problems Still Open
+
+| Problem | Status |
+|---|---|
+| Mountain track circle exploit | Partially fixed — episode termination added. Exp8 will show if it holds. |
+| Mountain track — car can't navigate first corner reliably | Still being investigated. Exp5 showed genuine laps so it IS solvable. |
+| Multi-track generalization is random-seed dependent | No reliable solution yet. Trial 9 was lucky. |
+| Mountain track model doesn't generalise to other tracks | Expected — single track training generalises poorly. Next step after Exp8 succeeds. |
+
+---
+
+## Next Steps (Proposed, Not Yet Run)
+
+1. **Exp 8 result:** If best_model.zip drives mountain_track reliably → proceed to Step 2
+2. **Combine mountain_track + generated_track** using v5 reward, throttle_min=0.5, proper checkpointing
+3. **Test combined model** on all 4 tracks — can it generalise to mini_monaco like Trial 9 did?
+4. **If yes:** We have reproduced Trial 9 reliably with a better reward function
+
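Reviewer note: the checkpointing decisions recorded in this file (numbered saves every segment, `best_model.zip` updated on a new best segment score, one connection throughout) can be sketched roughly as follows. This is a minimal illustration, not the project's training script. The function name `train_with_checkpoints`, the `evaluate` callback, and the `checkpoint_NNNNNN` file naming are assumptions made for the example. The only model methods it relies on are `learn(total_timesteps=..., reset_num_timesteps=False)` and `save(path)`, which Stable-Baselines3 algorithms such as PPO provide.

```python
from pathlib import Path
from typing import Callable, Tuple


def train_with_checkpoints(
    model,
    evaluate: Callable[[object], float],
    total_steps: int,
    segment_steps: int,
    out_dir: str,
) -> Tuple[int, float]:
    """Train in fixed-size segments on one model/env, saving after each segment.

    `evaluate` is a hypothetical callback that scores the current policy
    (for example, mean reward over a few deterministic episodes).
    Returns (step_count_at_best_segment, best_score).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    best_score = float("-inf")
    best_step = 0
    done = 0
    while done < total_steps:
        chunk = min(segment_steps, total_steps - done)
        # reset_num_timesteps=False keeps the step counter running across
        # calls, so the loop behaves like one continuous training run.
        model.learn(total_timesteps=chunk, reset_num_timesteps=False)
        done += chunk

        # Numbered checkpoint: a mid-run peak is never lost.
        model.save(str(out / f"checkpoint_{done:06d}"))

        # Overwrite best_model only when this segment beats every earlier one.
        score = evaluate(model)
        if score > best_score:
            best_score, best_step = score, done
            model.save(str(out / "best_model"))
    return best_step, best_score
```

With `total_steps=90_000` and `segment_steps=6_000` this yields the 15 numbered saves described for Exp 8. If the policy peaks near step 30k, as it did in Exp 5, `best_model` keeps that peak even when the final checkpoint has diverged. Because the environment is never closed between segments, no second TCP connection is opened.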