Strategy change driven by Trial 1 data analysis:
- generated_road removed: too similar to generated_track, and Phase-2
  warm-start caused catastrophic forgetting (reward 2388→37 in one rotation)
- mountain_track mean reward was only 17 — model never converged there
- mini_monaco score 24.9 (37 steps) — model was outputting degenerate actions
Wave 4 approach:
- NO warm-start: fresh random weights every trial
- Train: generated_track + mountain_track (visually distinct backgrounds,
  both have road markings; this forces the model to learn general mark-following)
- Test (zero-shot): mini_monaco only (never seen during training)
- Wider LR search: [1e-4, 2e-3] (a from-scratch model needs a different range than a warm-started one)
- Larger step budgets: 60k-250k total (fresh model needs more time)
- Seed params: lr=0.0003 and lr=0.001 (diverse from the start)
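For reference, the Wave 4 settings listed above could be captured roughly as follows. This is a minimal sketch, not the code in wave4_controller.py: the constant names other than TRAINING_TRACKS, the sample_params helper, the log-uniform LR sampling, and the uniform step-budget sampling are all assumptions made for illustration.

```python
import math
import random

# Values copied from the Wave 4 notes above.
TRAINING_TRACKS = ["generated_track", "mountain_track"]
TEST_TRACKS = ["mini_monaco"]    # zero-shot: never used for training
LR_BOUNDS = (1e-4, 2e-3)         # widened learning-rate search range
STEP_BOUNDS = (60_000, 250_000)  # total training steps per trial
SEED_LRS = [0.0003, 0.001]       # learning rates of the two seed trials


def sample_params(rng):
    """Hypothetical helper: draw one trial configuration.

    Log-uniform sampling of the learning rate is an assumption; the
    commit only states the bounds, not the sampling distribution.
    """
    lo, hi = LR_BOUNDS
    lr = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    lr = min(max(lr, lo), hi)  # guard against floating-point overshoot
    return {
        "lr": lr,
        "total_steps": rng.randint(*STEP_BOUNDS),  # inclusive bounds
        "warm_start": False,  # fresh random weights every trial
    }


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_params(rng))
```

Sampling the learning rate in log space spreads trials evenly across the range [1e-4, 2e-3] in orders of magnitude; a plain uniform draw would put most trials near the upper bound.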
Files:
- multitrack_runner.py: 2 training tracks, no warm-start auto-detection
- wave4_controller.py: new Wave 4 GP+UCB controller
- tests updated: TRAINING_TRACKS assertion, seed param tests → wave4
- 96 tests passing
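For readers unfamiliar with GP+UCB, the selection step can be sketched as below. This is an illustration only, under stated assumptions: a 1-D search over a single input (for example log10 of the learning rate), a zero-mean GP with an RBF kernel, and hypothetical names (rbf, solve, gp_posterior, ucb_pick) and defaults (length_scale, noise, kappa). The actual kernel, inputs, and exploration weight used by wave4_controller.py are not stated in this commit.

```python
import math


def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel on scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and std of a zero-mean GP at x_new."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x, x_new) for x in xs]
    alpha = solve(K, ys)    # K^-1 y
    v = solve(K, k_star)    # K^-1 k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))


def ucb_pick(xs, ys, candidates, kappa=2.0):
    """Return the candidate maximizing mean + kappa * std."""
    def score(x):
        mean, std = gp_posterior(xs, ys, x)
        return mean + kappa * std
    return max(candidates, key=score)
```

A larger kappa favors candidates far from previously evaluated points (exploration); kappa=0 reduces the rule to picking the highest predicted mean (exploitation). In practice the observed rewards would typically be normalized before fitting, since the GP here assumes a zero mean.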
ADR-013 to follow.
Agent: pi
Tests: 96 passed
Tests-Added: 0
TypeScript: N/A