The goal is a model that generalises to ANY road-surface track, not
just to mini_monaco. mini_monaco (tight barriers, hairpins) was
a bad proxy for that goal. Generated_road is a much better zero-shot test:
same visual category, never seen during Wave 4 training.
eval_on_track.py lets us run the Wave 4 champion on any track with
the same wrappers used during training, plus shuttle-exploit detection.
Run after Trial 25 finishes:
    python3 agent/eval_on_track.py \
        --model agent/models/wave4-champion/model.zip \
        --track donkey-generated-roads-v0 \
        --episodes 3 --max-steps 3000
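For readers unfamiliar with the shuttle-exploit check mentioned above, here is
a minimal sketch of one way such a detector could work. It is not the code in
eval_on_track.py. It assumes "shuttle exploit" means the car driving back and
forth over a short stretch to collect reward without net progress, and that
per-step (x, z) positions are available. The function name is_shuttling and
the thresholds min_path and max_ratio are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # assumed (x, z) position of the car at one step


def is_shuttling(
    positions: Sequence[Point],
    min_path: float = 5.0,   # hypothetical: ignore windows with little movement
    max_ratio: float = 0.2,  # hypothetical: net/path ratio below this is suspicious
) -> bool:
    """Flag a window of positions where the car moved a lot but got nowhere.

    Compares the net displacement (first point to last point) with the total
    path length travelled inside the window.
    """
    if len(positions) < 2:
        return False
    # Total distance actually driven, summed over consecutive steps.
    path = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    if path < min_path:
        # A car that is stationary or barely moving is not shuttling.
        return False
    # Straight-line distance between the start and end of the window.
    net = math.dist(positions[0], positions[-1])
    return net / path < max_ratio


if __name__ == "__main__":
    forward = [(float(i), 0.0) for i in range(20)]
    back_and_forth = [(0.0, 0.0), (1.0, 0.0)] * 10
    print(is_shuttling(forward), is_shuttling(back_and_forth))
```

The window passed in should be short relative to a lap. On a closed circuit, a
full lap also ends near its start and would be flagged by this ratio test.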
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A