donkeycar-rl-autoresearch/agent/experiments
Paul Huliganga 2d52bb4ffc fix(core): replace exploit bandaids with solid physics barriers + clean reward
Root cause: barriers were zero-thickness MeshCollider planes with no CCD on the
car. The car tunnelled through between frames. Every Python patch was trying to
catch in code what physics should enforce.

Unity (source only — build in progress):
- RoadBuilder.cs: CreateBarrier() now makes BoxCollider-per-segment with real 3D
  volume (barrierThickness=1.0m default) + half-thickness overlap at corners to
  seal gaps. CreateEndCap() seals open ends of non-looping tracks (generated_road).
- Car.cs: rb.collisionDetectionMode = Continuous in Awake() — prevents tunneling.

Python:
- reward_wrapper.py v7: removed CTE-patience termination, high-CTE negative
  reward, solid_hit monitoring, low-speed/wedge detection. Kept: efficiency gate,
  no-progress (active_node) termination, lap exploit guard. Reward = speed×CTE_quality.
- exp23_generated_road_clean.py: single track, no warm-start, 200k steps, clean
  reward, MAX_EPISODE_SECONDS=120 as safety net only.
- tests: 17 tests covering clean reward properties.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 15:56:00 -04:00
README.md fix: exp18 — fix circular exploit in parallel training (window=200, min_lap=12s) 2026-04-28 09:00:42 -04:00
exp6_mountain_v5_proper.py save: all experiment scripts moved from /tmp to agent/experiments/ 2026-04-18 21:30:08 -04:00
exp7_mountain_proper.py save: all experiment scripts moved from /tmp to agent/experiments/ 2026-04-18 21:30:08 -04:00
exp8_mountain_clean.py save: all experiment scripts moved from /tmp to agent/experiments/ 2026-04-18 21:30:08 -04:00
exp9_mountain_v5_throttle02.py save: all experiment scripts moved from /tmp to agent/experiments/ 2026-04-18 21:30:08 -04:00
exp10_two_tracks.py save: all experiment scripts moved from /tmp to agent/experiments/ 2026-04-18 21:30:08 -04:00
exp11_parallel_envs.py fix: reward v6 — efficiency gate prevents circular driving, stuck_steps 80→40 2026-04-19 12:02:55 -04:00
exp11b_parallel_v6.py feat: Exp 11b — parallel DummyVecEnv + v6 reward (anti-circle gate) + built-in eval 2026-04-19 12:03:46 -04:00
exp11c_parallel_v6_250k.py feat: Exp 11c — parallel DummyVecEnv + v6 reward, extended to 250k steps 2026-04-19 13:27:38 -04:00
exp11d_parallel_v61.py fix: exp11d remove progress_patience — grass fix only per ADR-020 2026-04-19 16:18:17 -04:00
exp12_mountain_single.py fix: reward v6.1 — active_node progress terminator kills circle/stuck exploits 2026-04-19 17:01:41 -04:00
exp13_gentrack_v4.py feat: Exp 13 — generated_track, v4 reward, back to basics (no extra heuristics) 2026-04-19 17:33:17 -04:00
exp14_finetune_v5.py fix: import json, use make_env_base in phase switch, and run eval sequentially to avoid second concurrent sim car 2026-04-19 20:37:25 -04:00
exp14_mountain_v5.py fix: exp14 — proper track switch via exit_scene before connecting to mountain_track 2026-04-19 19:18:33 -04:00
exp15_gentrack_from_mountain.py fix: force scene reset before exp15 generated-track warm-start so sim actually loads generated_track 2026-04-20 16:36:00 -04:00
exp16_mountain_from_gentrack.py feat: add cross-track warm-start experiments for mountain->generated and generated->mountain 2026-04-20 16:34:24 -04:00
exp17_parallel_450k.py feat: add exp17 parallel DummyVecEnv 450k training + strategy docs 2026-04-28 02:42:20 -04:00
exp18_parallel_450k_v2.py fix: exp18 — fix circular exploit in parallel training (window=200, min_lap=12s) 2026-04-28 09:00:42 -04:00
exp19_parallel_450k_v3.py fix: exp19 — hard episode time limit to stop minutes-long stuck cars 2026-04-28 09:18:04 -04:00
exp20_parallel_450k_v5.py feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments 2026-05-05 14:46:13 -04:00
exp21_generated_pair_warm_v4.py feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments 2026-05-05 14:46:13 -04:00
exp22_generated_pair_warm_v6.py feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments 2026-05-05 14:46:13 -04:00
exp23_generated_road_clean.py fix(core): replace exploit bandaids with solid physics barriers + clean reward 2026-05-05 15:56:00 -04:00
mountain_continue.py save: all experiment scripts moved from /tmp to agent/experiments/ 2026-04-18 21:30:08 -04:00
mountain_high_throttle.py save: all experiment scripts moved from /tmp to agent/experiments/ 2026-04-18 21:30:08 -04:00
mountain_v5.py save: all experiment scripts moved from /tmp to agent/experiments/ 2026-04-18 21:30:08 -04:00
overnight.py save: all experiment scripts moved from /tmp to agent/experiments/ 2026-04-18 21:30:08 -04:00
wave5_train.py save: all experiment scripts moved from /tmp to agent/experiments/ 2026-04-18 21:30:08 -04:00

README.md

Experiment Scripts

These scripts were used to run individual training experiments. Each corresponds to an entry in docs/TEST_HISTORY.md.

| Script | Experiment | Key change |
|---|---|---|
| exp18_parallel_450k_v2.py | Exp 18 | Exp 17 + exploit fix: window=200, min_lap_time=12s |
| exp17_parallel_450k.py | Exp 17 | Parallel DummyVecEnv, 450k steps, v6 reward; circular exploit dominated |
| mountain_v5.py | Exp 5 | v5 reward + throttle_min=0.5, direct model.learn() |
| mountain_continue.py | Exp 4 | Continued Exp3 training |
| mountain_high_throttle.py | Exp 3 | throttle_min=0.5, old v4 reward |
| exp6_mountain_v5_proper.py | Exp 6 | v5 + termination, wrong steps_per_switch (=total) |
| exp7_mountain_proper.py | Exp 7 | v5 + termination, correct steps_per_switch=6000, had phantom car issue |
| exp8_mountain_clean.py | Exp 8 | v5 + throttle_min=0.5, single connection, correct checkpointing |
| exp9_mountain_v5_throttle02.py | Exp 9 | v5 + throttle_min=0.2, OUR BEST MODEL |
| exp10_two_tracks.py | Exp 10 | Two tracks via custom script (abandoned; used multitrack_runner.py instead) |
| overnight.py | Overnight | runs mountain-only and Trial9-repeat experiments |
| wave5_train.py | Wave 5 | generated_track only with throttle_min=0.2 |

Rule going forward

ALL experiment scripts must be saved here and committed to git BEFORE running. Scripts in /tmp are lost on reboot.
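
One way to enforce the rule above is a small pre-flight check at the top of each experiment script. The following is only a sketch: the helper names `porcelain_is_clean`, `script_is_committed`, and `require_committed` are hypothetical and not part of this repository. It assumes `git` is on the PATH and that the script lives inside the repository's working tree.

```python
import subprocess
from pathlib import Path


def porcelain_is_clean(porcelain_output: str) -> bool:
    """True when `git status --porcelain` printed nothing for the path."""
    return porcelain_output.strip() == ""


def script_is_committed(script: Path) -> bool:
    """True when the script is tracked by git and has no uncommitted changes."""
    script = script.resolve()
    repo_dir = script.parent
    # ls-files --error-unmatch exits non-zero for untracked or missing files.
    tracked = subprocess.run(
        ["git", "ls-files", "--error-unmatch", script.name],
        cwd=repo_dir, capture_output=True, text=True,
    )
    if tracked.returncode != 0:
        return False
    # Empty porcelain output means no staged or unstaged modifications.
    status = subprocess.run(
        ["git", "status", "--porcelain", "--", script.name],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return porcelain_is_clean(status.stdout)


def require_committed(script: Path) -> None:
    """Abort the run if the script has not been committed yet."""
    if not script_is_committed(script):
        raise SystemExit(f"{script.name}: commit this script to git before running it")
```

An experiment script could then call `require_committed(Path(__file__))` before connecting to the simulator, so an uncommitted script fails fast instead of training for hours.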

Running experiments

Use multitrack_runner.py directly for two-track training:

  python3 multitrack_runner.py --total-timesteps 90000 --steps-per-switch 6000 ...

For single-track experiments, use the pattern from exp8/exp9:

  • VecTransposeImage(DummyVecEnv([make_env])) for env creation
  • Direct model.learn() loop with manual checkpointing
  • No close_and_switch() for single track
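
The single-track pattern above can be sketched roughly as follows. This is not the code from exp8/exp9: the function names `checkpoint_chunks` and `train_single_track`, the choice of PPO with `CnnPolicy`, and the checkpoint file naming are all assumptions made for illustration. `make_env` stands for whatever zero-argument environment factory the experiment script defines.

```python
from pathlib import Path
from typing import Callable, List


def checkpoint_chunks(total_timesteps: int, checkpoint_every: int) -> List[int]:
    """Split a training budget into learn() chunks, one checkpoint per chunk."""
    if total_timesteps <= 0 or checkpoint_every <= 0:
        raise ValueError("total_timesteps and checkpoint_every must be positive")
    full, remainder = divmod(total_timesteps, checkpoint_every)
    return [checkpoint_every] * full + ([remainder] if remainder else [])


def train_single_track(make_env: Callable, total_timesteps: int,
                       checkpoint_every: int, save_dir: str) -> None:
    """Train on one track with a direct learn() loop and manual checkpoints."""
    # Imported lazily so checkpoint_chunks() works without stable-baselines3.
    # Assumption: PPO is used here only as an example algorithm.
    from stable_baselines3 import PPO
    from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage

    # One env and one sim connection; no close_and_switch() for a single track.
    env = VecTransposeImage(DummyVecEnv([make_env]))
    model = PPO("CnnPolicy", env, verbose=1)

    out = Path(save_dir)
    out.mkdir(parents=True, exist_ok=True)
    try:
        for chunk in checkpoint_chunks(total_timesteps, checkpoint_every):
            # reset_num_timesteps=False keeps the step counter running across chunks.
            model.learn(total_timesteps=chunk, reset_num_timesteps=False)
            # Manual checkpoint, named after the actual number of steps taken.
            model.save(str(out / f"model_{model.num_timesteps}_steps"))
    finally:
        env.close()
```

Saving after every chunk means an interrupted run loses at most `checkpoint_every` steps of training. Note that PPO collects whole rollouts, so the step count in each file name may slightly exceed the nominal chunk boundary.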