donkeycar-rl-autoresearch

Commit Graph


Author

SHA1

Message

Date

Paul Huliganga

c62fba40b2

fix(agent): explicit hit backstop in StuckTermination + eval diagnostics

multitrack_runner.py: adds Python-side hit check as a zero-latency backstop
— gym_donkeycar can delay hit!=none termination by one frame; this fires
on the same step and records stuck_reason for diagnostics.

eval_on_track.py: logs hit value and stuck_reason at episode end; calls
exit_scene after eval so the sim returns to main menu (next gym.make() can
switch scenes); removes unused SPEED_SCALE constant.
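
The backstop described above can be sketched as a thin wrapper around the environment's step call. This is a minimal illustration, not the code in multitrack_runner.py: the class names HitBackstop and FakeEnv are hypothetical, the 4-tuple step return and the "hit" key in info (with the string "none" meaning no collision) are assumptions about the environment, and the "hit:<name>" format for stuck_reason is invented for the example.

```python
from typing import Any, Dict, Tuple


class HitBackstop:
    """Hypothetical wrapper: end the episode on the same step a hit is reported."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def step(self, action: Any) -> Tuple[Any, float, bool, Dict[str, Any]]:
        obs, reward, done, info = self.env.step(action)
        hit = info.get("hit", "none")  # assumed key; "none" means no collision
        if hit != "none":
            # Terminate now rather than waiting for the env to do it a frame later,
            # and record why for later diagnostics.
            done = True
            info["stuck_reason"] = f"hit:{hit}"  # illustrative format only
        return obs, reward, done, info


class FakeEnv:
    """Stand-in env that reports a hit on a chosen step but never sets done itself."""

    def __init__(self, hit_on_step: int) -> None:
        self.hit_on_step = hit_on_step
        self.t = 0

    def step(self, action: Any) -> Tuple[Any, float, bool, Dict[str, Any]]:
        self.t += 1
        hit = "barrier" if self.t == self.hit_on_step else "none"
        return None, 1.0, False, {"hit": hit}


env = HitBackstop(FakeEnv(hit_on_step=3))
results = [env.step(0) for _ in range(3)]
```

In this sketch the wrapped env never terminates on its own, so the third step ending the episode comes entirely from the Python-side check.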

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-14 15:32:10 -04:00

Paul Huliganga

b8a13dea81

feat: v5 reward — speed × CTE-quality, drop efficiency term

Problem with v4 on mountain_track: CTE × efficiency × speed all collapse
to zero simultaneously when the car slows on the hill, giving no gradient
signal for 'apply more throttle'.

v5: reward = (speed / 10) × (1 - |CTE| / max_cte)
- Directly rewards going fast while staying centred
- Hill: car slows → reward drops → clear gradient toward more throttle
- Circling protection now entirely handled by lap-time penalty +
  StuckTerminationWrapper (not by the reward formula)
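
As a rough sketch, the v5 formula above could be written as a small function. The function name v5_reward, the example value max_cte = 8.0, and the guard against a non-positive max_cte are assumptions made for illustration; the commit message gives only the formula itself, so no clamping is applied here.

```python
def v5_reward(speed: float, cte: float, max_cte: float) -> float:
    """reward = (speed / 10) * (1 - |CTE| / max_cte), as stated in the commit message."""
    if max_cte <= 0:
        raise ValueError("max_cte must be positive")  # guard added for the sketch
    speed_term = speed / 10.0
    cte_quality = 1.0 - abs(cte) / max_cte  # 1.0 when centred, 0.0 at |CTE| == max_cte
    return speed_term * cte_quality


MAX_CTE = 8.0  # hypothetical value, not taken from the repository

fast_centred = v5_reward(speed=5.0, cte=0.0, max_cte=MAX_CTE)
slow_on_hill = v5_reward(speed=2.0, cte=0.0, max_cte=MAX_CTE)
fast_off_centre = v5_reward(speed=10.0, cte=4.0, max_cte=MAX_CTE)
```

With the car centred, the reward scales linearly with speed, so slowing on the hill lowers the reward without driving it to zero, which is the gradient the commit message describes.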

Tests updated to reflect v5 semantics (102 passing).

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A

2026-04-17 13:25:38 -04:00

Paul Huliganga

a3a49fbcaf

feat: eval_on_track.py — proper zero-shot eval on any track

The goal is a model that generalises to ANY road-surface track, not
specifically mini_monaco.  mini_monaco (tight barriers, hairpins) was
a bad proxy for this.  Generated_road is a much better zero-shot test:
same visual category, never seen during Wave 4 training.

eval_on_track.py lets us run the Wave 4 champion on any track with
the same wrappers used during training, plus shuttle-exploit detection.

Run after Trial 25 finishes:
  python3 agent/eval_on_track.py --model agent/models/wave4-champion/model.zip --track donkey-generated-roads-v0 --episodes 3 --max-steps 3000
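
For reference, the four flags in the command above could be parsed with argparse along these lines. This is only a sketch: the flag names come from the command, but the function name build_parser, the help strings, and the default values are assumptions, and the real eval_on_track.py may define them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(description="Zero-shot evaluation on a chosen track")
    parser.add_argument("--model", required=True, help="path to the saved model .zip")
    parser.add_argument("--track", required=True, help="gym environment id of the track")
    parser.add_argument("--episodes", type=int, default=3, help="episodes to run (assumed default)")
    parser.add_argument("--max-steps", type=int, default=3000, help="step cap per episode (assumed default)")
    return parser


# Parse the same arguments as the command line shown in the commit message.
args = build_parser().parse_args([
    "--model", "agent/models/wave4-champion/model.zip",
    "--track", "donkey-generated-roads-v0",
    "--episodes", "3",
    "--max-steps", "3000",
])
```

argparse converts the dash in --max-steps to an underscore, so the value is read as args.max_steps.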

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A

2026-04-16 19:47:56 -04:00

3 Commits