RESULTS:
T20 (champion): ✅ Generated Road only (1/10 tracks)
T08: ✅ Generated Road only (1/10 tracks)
T18: ❌ Crashes on all tracks (0/10), including the new Generated Road layout!
Robo Racing League: best unseen-track result (116 steps); possibly due to visual similarity to generated_road?
Thunderhill: not available in this simulator version
KEY FINDING: Models are overfit to the visual appearance of generated_road (the learned CNN features do not transfer).
On every unseen track the models crash within 40-116 steps (vs 2200+ on the trained track).
This is the expected Phase 2→3 transition point.
WAVE 3 STRATEGY (documented in RESEARCH_LOG.md):
Stage 1: generated_road ↔ generated_track (same geometry, different visuals)
Stage 2: + mountain_track (different geometry)
Stage 3: all tracks rotation (true generalization)
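
The three stages above can be sketched as a per-episode track schedule. This is a minimal illustration, not the training code: `STAGE_TRACKS`, `tracks_for_stage`, and `track_for_episode` are hypothetical names, round-robin rotation is an assumed selection policy, and the Stage 3 track list is passed in by the caller because the full list is not given here.

```python
from typing import List, Sequence

# Stages 1 and 2 as listed in the strategy above (scene names from this log).
STAGE_TRACKS = {
    1: ["generated_road", "generated_track"],                    # same geometry, different visuals
    2: ["generated_road", "generated_track", "mountain_track"],  # adds different geometry
}


def tracks_for_stage(stage: int, all_tracks: Sequence[str]) -> List[str]:
    """Return the track pool for a stage; Stage 3 uses the caller-supplied list."""
    if stage in STAGE_TRACKS:
        return list(STAGE_TRACKS[stage])
    if stage == 3:
        return list(all_tracks)
    raise ValueError(f"unknown stage: {stage}")


def track_for_episode(stage: int, episode: int, all_tracks: Sequence[str] = ()) -> str:
    """Pick a track by round-robin rotation (assumed policy, not from the log)."""
    tracks = tracks_for_stage(stage, all_tracks)
    if not tracks:
        raise ValueError("no tracks available for this stage")
    return tracks[episode % len(tracks)]
```

With this sketch, Stage 1 alternates generated_road and generated_track on successive episodes, and mountain_track enters the rotation only from Stage 2 onward.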
Also fixed: multitrack_eval.py now lists only valid scene names
(thunderhill removed; not available in this simulator version)
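
One way to guard against invalid scene names is to filter the requested list against a known-valid set before evaluation. This is a hedged sketch only: `split_valid_scenes` is a hypothetical helper, not the actual contents of multitrack_eval.py, and the valid set must be supplied by the caller for the simulator version in use.

```python
from typing import Iterable, List, Tuple


def split_valid_scenes(requested: Iterable[str], valid: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split requested scene names into (accepted, rejected), preserving order."""
    valid_set = set(valid)
    accepted: List[str] = []
    rejected: List[str] = []
    for name in requested:
        # Route each name by membership in the caller-supplied valid set.
        (accepted if name in valid_set else rejected).append(name)
    return accepted, rejected


# Illustrative call using scene names that appear in this log.
accepted, rejected = split_valid_scenes(
    ["generated_road", "thunderhill", "mountain_track"],
    valid={"generated_road", "generated_track", "mountain_track"},
)
```

Reporting the rejected names, rather than silently dropping them, makes an unavailable scene such as thunderhill visible before any evaluation run starts.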
Agent: pi/claude-sonnet
Tests: 53/53 passing
TypeScript: N/A