Commit Graph

1 Commit

Author SHA1 Message Date
Paul Huliganga 5a626c87be feat: comprehensive multi-track evaluation script + research log updates
- multitrack_eval.py: tests all 3 top models against all 11 DonkeyCar tracks
  - Automatic track switching via exit_scene → reconnect
  - 11 tracks: generated_road, generated_track, mountain, warehouse, AVC,
    mini_monaco, warren, robo_racing, waveshare, thunderhill, circuit_launch
  - Records: reward, steps, oscillation, CTE distribution, drove_far flag
  - Saves to outerloop-results/multitrack_results.jsonl
  - Prints comparison table at the end
- RESEARCH_LOG.md: exit_scene fix documented, Phase 3 begun
- IMPLEMENTATION_PLAN.md: Wave 3 streams defined
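
For readers without the script to hand, the record-and-compare flow described above (one result per model/track run appended to a JSONL file, then a comparison table at the end) might look roughly like the sketch below. This is not the code from multitrack_eval.py: the helper names (`append_result`, `load_results`, `format_table`) and the record keys (`model`, `track`, `reward`) are assumptions made for illustration, and the simulator connection and track switching are deliberately left out. Only the track names and the output path come from the commit message.

```python
import json
from pathlib import Path

# Track names as listed in the commit message.
TRACKS = [
    "generated_road", "generated_track", "mountain", "warehouse", "AVC",
    "mini_monaco", "warren", "robo_racing", "waveshare", "thunderhill",
    "circuit_launch",
]


def append_result(path, record):
    """Append one evaluation record as a single JSON line (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_results(path):
    """Read every non-empty line of a JSONL file back into a list of dicts."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def format_table(records):
    """Build a track-by-model table of mean reward as a plain-text string."""
    models = sorted({r["model"] for r in records})
    # Keep the commit's track order, skipping tracks with no records.
    tracks = [t for t in TRACKS if any(r["track"] == t for r in records)]
    lines = [f"{'track':<16}" + "".join(f"{m:>12}" for m in models)]
    for track in tracks:
        cells = []
        for model in models:
            rewards = [r["reward"] for r in records
                       if r["track"] == track and r["model"] == model]
            if rewards:
                cells.append(f"{sum(rewards) / len(rewards):>12.1f}")
            else:
                cells.append(f"{'-':>12}")  # no run recorded for this pair
        lines.append(f"{track:<16}" + "".join(cells))
    return "\n".join(lines)
```

Appending one line per run means a crash partway through the 11 tracks still leaves the earlier results on disk, which is the usual reason for choosing JSONL over a single JSON document. A caller would pass a path such as outerloop-results/multitrack_results.jsonl to `append_result` after each run and print `format_table(load_results(path))` once all runs finish.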

Agent: pi/claude-sonnet
Tests: 53/53 passing
Tests-Added: 0
TypeScript: N/A
2026-04-14 10:11:47 -04:00