wave3: add multi-track autoresearch system (83 tests passing)

New files:
- agent/multitrack_runner.py: trains PPO round-robin across generated_road,
  generated_track, mountain_track; zero-shot evaluates on mini_monaco + warren
- agent/wave3_controller.py: GP+UCB outer loop optimising the combined test score
- tests/test_wave3.py: 30 new tests (83 total)
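A minimal sketch of the two ideas named above, not the code in this commit: a round-robin training schedule over the three training tracks, and a UCB pick over candidate trials given a GP's posterior mean and standard deviation. The function names, the `kappa` parameter and its default are assumptions for illustration only.

```python
from itertools import cycle, islice

# Training tracks as listed in this commit message.
TRAIN_TRACKS = ["generated_road", "generated_track", "mountain_track"]


def round_robin_schedule(tracks, n_segments):
    """Return the track for each of n_segments training segments, cycling in order."""
    return list(islice(cycle(tracks), n_segments))


def ucb_pick(means, stds, kappa=2.0):
    """Return the index of the candidate maximising mean + kappa * std.

    means/stds stand in for a GP posterior over the combined test score;
    kappa (hypothetical default) trades exploration against exploitation.
    """
    scores = [m + kappa * s for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `kappa=0` the pick reduces to the candidate with the highest predicted score; larger values favour uncertain candidates.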

Track classification (from visual analysis of all 10 screenshots):
  Training:   generated_road, generated_track, mountain_track
  Test (ZSL): mini_monaco, warren (pseudo-outdoor, proper road markings)
  Skip:       warehouse, robo_racing_league, waveshare, circuit_launch (indoor floor);
              avc_sparkfun (orange markings, different visual domain)

Key design decisions:
  ADR-010: Warren = pseudo-outdoor track (proper road lines, not floor marks)
  ADR-011: Test tracks are NEVER used in training; the GP optimises the test score only
  ADR-012: All trials warm-start from the Phase 2 champion model
  Track switching: env.close() + send_exit_scene_raw() + 4s wait + gym.make()
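The switching sequence above can be sketched as follows. This is an illustration under assumptions, not the committed implementation: `switch_track`, `make_env` and `exit_scene` are hypothetical names. In practice `make_env` would presumably be `gym.make` and `exit_scene` a wrapper around `send_exit_scene_raw()`, whose arguments this message does not show.

```python
import time


def switch_track(env, new_env_id, make_env, exit_scene, wait_s=4.0, sleep=time.sleep):
    """Tear down the current env and create one for the next track.

    Order follows the commit message: close, exit scene, wait, make.
    make_env and exit_scene are injected (hypothetical) callables so the
    sequence can be exercised without a running simulator.
    """
    env.close()                   # release the current environment
    exit_scene()                  # ask the simulator to leave the loaded scene
    sleep(wait_s)                 # give the simulator time to return to its menu
    return make_env(new_env_id)   # build the environment for the next track
```

Injecting `sleep` keeps the 4 s wait out of unit tests while leaving the default behaviour unchanged.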

Pre-Wave-3 baseline: 1/10 tracks drivable (0/2 held-out test tracks)
Wave 3 goal: 2/2 test tracks drivable (mini_monaco + warren)

Agent: pi
Tests: 83 passed
Tests-Added: 30
TypeScript: N/A