multitrack_runner.py: adds a Python-side hit check as a zero-latency backstop.
gym_donkeycar can delay hit != none termination by one frame; this check fires
on the same step and records stuck_reason for diagnostics.
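Rough sketch of the backstop, not the actual multitrack_runner.py code.
Assumes the step info dict carries a "hit" key whose value is the string
"none" when nothing was hit. check_hit and the "hit:<name>" format for
stuck_reason are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Optional, Tuple


def check_hit(info: Dict[str, Any], terminated: bool) -> Tuple[bool, Optional[str]]:
    """Return (terminated, stuck_reason) for a single env step.

    Hypothetical helper: forces termination on the same step that the
    info dict first reports a collision, instead of waiting for the env
    to set its own termination flag a frame later.
    """
    # Assumption: a missing "hit" key is treated the same as "none".
    hit = info.get("hit", "none")
    if hit != "none":
        # Record what was hit so it can be logged at episode end.
        return True, f"hit:{hit}"
    # No collision reported: pass the env's own flag through unchanged.
    return terminated, None
```

In a wrapper's step(), the returned flag would replace the env's own
termination flag and stuck_reason would be stored in info.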
eval_on_track.py: logs hit value and stuck_reason at episode end; calls
exit_scene after eval so the sim returns to main menu (next gym.make() can
switch scenes); removes unused SPEED_SCALE constant.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem with v4 on mountain_track: CTE × efficiency × speed all collapse
to zero simultaneously when the car slows on the hill, giving no gradient
signal for 'apply more throttle'.
v5: reward = (speed / 10) × (1 - |CTE| / max_cte)
- Directly rewards going fast while staying centred
- Hill: car slows → reward drops → clear gradient toward more throttle
- Circling protection now entirely handled by lap-time penalty +
StuckTerminationWrapper (not by the reward formula)
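The v5 formula as a minimal sketch. reward_v5 and its argument names are
illustrative, not the project's actual function. No clamping is applied,
so the value goes negative when |CTE| exceeds max_cte at positive speed;
whether the real code clamps is not stated here.

```python
def reward_v5(speed: float, cte: float, max_cte: float) -> float:
    """reward = (speed / 10) * (1 - |CTE| / max_cte)."""
    # Speed term: grows linearly with speed, so slowing on a hill
    # lowers the reward even when the car stays perfectly centred.
    speed_term = speed / 10.0
    # Centring term: 1.0 on the centre line, 0.0 at |CTE| == max_cte.
    centring_term = 1.0 - abs(cte) / max_cte
    return speed_term * centring_term


# Same centring, lower speed -> lower reward (the throttle gradient).
fast = reward_v5(speed=5.0, cte=0.0, max_cte=2.0)
slow = reward_v5(speed=2.0, cte=0.0, max_cte=2.0)
```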
Tests updated to reflect v5 semantics (102 passing).
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
The goal is a model that generalises to ANY road-surface track, not
specifically mini_monaco. mini_monaco (tight barriers, hairpins) was
a bad proxy for this. Generated_road is a much better zero-shot test:
same visual category, never seen during Wave 4 training.
eval_on_track.py lets us run the Wave 4 champion on any track with
the same wrappers used during training, plus shuttle-exploit detection.
Run after Trial 25 finishes:
python3 agent/eval_on_track.py --model agent/models/wave4-champion/model.zip --track donkey-generated-roads-v0 --episodes 3 --max-steps 3000
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A