feat: add exp17 parallel DummyVecEnv 450k training + strategy docs

- exp17_parallel_450k.py: parallel two-track training (generated_track:9091,
  mountain_track:9093), 450k steps, v6 reward, HOST=localhost
- DECISIONS.md: ADR-025 (parallel strategy) and ADR-026 (mountain friction fix)
- docs/STATE.md: updated to April 2026 state with current champions and strategy
- docs/TEST_HISTORY.md: mountain friction fix notes + Exp 17 full design
- outerloop-results: exp14 finetune logs and robust mountain eval results

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Paul Huliganga 2026-04-28 02:42:20 -04:00
parent 6e2427571a
commit b504b89b2a
8 changed files with 538 additions and 83 deletions


@@ -576,3 +576,79 @@ experts, not as obviously reusable initializations for the other track.
- If transfer is revisited, it likely needs a more careful method than naive direct
  warm-starting on the other track
- Mountain physics issues should be addressed before revisiting transfer conclusions
---
## ADR-025: Parallel DummyVecEnv with 400k+ Steps is the Primary Multi-Track Strategy
**Date:** 2026-04-27
**Status:** Active
**Context:** After Wave 4 (25 trials, 80% failure rate), Exp 10 (catastrophic forgetting),
Exp 11b (infrastructure works but 90k steps insufficient), and Exp 15/16 (cross-track
warm starts failed both directions), the only multi-track approach that did not have a
fundamental flaw was parallel DummyVecEnv — Exp 11b failed only because the training
budget was halved relative to what single-track training needs.
**Decision:** The primary next strategy is:
1. Two sim instances (one per training track, separate ports)
2. SB3 `DummyVecEnv([env_generated, env_mountain])` — PPO sees both tracks in every batch
3. 400,000–500,000 total timesteps (~200k–250k effective per track)
4. v6 reward (efficiency gate + CTE patience terminator) on both envs
5. No warm start — train from random weights
6. Checkpoint every 20k steps, track mini_monaco zero-shot score throughout
**Why parallel DummyVecEnv:**
- PPO is an on-policy algorithm that depends on a stable rollout buffer.
Swapping environments mid-training disrupts value estimates and causes catastrophic forgetting.
DummyVecEnv feeds both tracks into every PPO rollout batch — no forgetting, no disruption.
- This is how SB3 was designed to be used with multiple environments.
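The batching argument can be illustrated with a toy model. This is a minimal sketch, not SB3's internals: `collect_rollout` is a hypothetical stand-in that only mimics the lockstep stepping of a vectorised environment, and the value 2048 is assumed to match SB3's default PPO `n_steps`.

```python
from collections import Counter

def collect_rollout(track_names, n_steps):
    """Toy stand-in for a vectorised rollout: every env is stepped once per tick."""
    buffer = []
    for tick in range(n_steps):
        for name in track_names:  # lockstep: one transition per env per tick
            buffer.append((tick, name))
    return buffer

# Two tracks stepped in lockstep for one rollout of 2048 ticks each.
buffer = collect_rollout(["generated_track", "mountain_track"], n_steps=2048)
share = Counter(name for _, name in buffer)
```

Under this model every rollout holds the same number of transitions from each track, which is the property the ADR relies on: no update is ever computed from one track alone.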
**Why 400k+ steps:**
- Single-track training converges in ~60–90k steps.
- Two parallel tracks need at least 2× the budget because each track supplies only half of the samples in every rollout.
  Interference between the two tasks adds further overhead.
- Exp 11b at 90k steps (effectively 45k per track) produced only 194-step drives on both tracks.
400k should provide adequate budget for both.
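The budget arithmetic can be sketched as below. The helper names are hypothetical. The sketch assumes that SB3 counts `total_timesteps` summed across all envs, and that PPO collects whole rollouts of `n_steps * n_envs` transitions (with `n_steps=2048` as the assumed default), so a `learn()` segment may slightly overshoot the requested size.

```python
import math

def per_track_steps(total_steps: int, n_envs: int = 2) -> int:
    """Timesteps each env contributes when all envs are stepped in lockstep."""
    return total_steps // n_envs

def collected_steps(requested: int, n_steps: int = 2048, n_envs: int = 2) -> int:
    """Timesteps gathered for one segment if only whole rollouts are collected."""
    rollout = n_steps * n_envs
    return math.ceil(requested / rollout) * rollout

exp11b_per_track = per_track_steps(90_000)   # Exp 11b budget
exp17_per_track = per_track_steps(450_000)   # Exp 17 budget
segment = collected_steps(20_000)            # one 20k checkpoint segment
```

If the whole-rollout assumption holds, a nominal 20k segment gathers 20,480 timesteps, so the reported step counts are approximate rather than exact.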
**Rejected alternatives:**
- Round-robin close-and-switch: disrupts PPO, 80% failure rate across 25 trials
- Cross-track warm starts: failed both directions (ADR-024)
- More autoresearch trials on round-robin: the method is fundamentally unreliable
**Fallback if 400k parallel fails:** Curriculum — train generated_track alone for 150k steps,
then add mountain to the DummyVecEnv pool for 250k more steps.
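The fallback curriculum amounts to a two-phase schedule. A minimal sketch, assuming the phase lengths stated above; `active_tracks` is a hypothetical helper and not part of the repository:

```python
def active_tracks(step: int, solo_steps: int = 150_000, joint_steps: int = 250_000):
    """Return the tracks that would be in the env pool at a given training step."""
    if step < 0 or step >= solo_steps + joint_steps:
        raise ValueError("step is outside the curriculum")
    if step < solo_steps:
        return ["generated_track"]                    # phase 1: single track
    return ["generated_track", "mountain_track"]      # phase 2: both tracks

phase1 = active_tracks(0)
phase2 = active_tracks(150_000)
```

In practice the switch would mean building a new two-env pool at step 150k and attaching it to the existing model; how cleanly PPO tolerates that change is exactly what the fallback experiment would test.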
---
## ADR-026: Mountain Track Friction Fix — Use Road Material on Hill Colliders
**Date:** 2026-04-27
**Status:** Accepted — fix applied
**Context:** `WheelPhys.cs` multiplies wheel grip stiffness by the static friction of the
surface the wheel is hitting. The mountain_track scene assigned Slippery physics material
(staticFriction=0.1) to 4 track surface colliders from the long_road prefab, giving the
car 1/5 the normal traction on the hill. This caused visible wheelspin at full throttle and
made hill climbing genuinely difficult for learned policies.
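The 1/5 figure follows directly from the two friction values. The sketch below assumes grip scales linearly with static friction, as the context describes; the actual formula in `WheelPhys.cs` is not reproduced here and `effective_grip` is a hypothetical name.

```python
def effective_grip(stiffness: float, static_friction: float) -> float:
    """Assumed model: grip stiffness scaled linearly by the surface's static friction."""
    return stiffness * static_friction

SLIPPERY_FRICTION = 0.1  # Slippery material, from the ADR
ROAD_FRICTION = 0.5      # Road material, from the ADR

# Ratio of grip on the Slippery surface to grip on the Road surface.
ratio = effective_grip(1.0, SLIPPERY_FRICTION) / effective_grip(1.0, ROAD_FRICTION)
```

With these values the ratio works out to 0.2, which is the "1/5 the normal traction" quoted above.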
**Decision:** Replace the 4 Slippery material assignments in `mountain_track.unity` with the
Road material (staticFriction=0.5). This is a targeted scene-level override; the Slippery
material asset itself is unchanged and remains available for intentionally slippery surfaces.
**Fix location:** `sdsim/Assets/Scenes/mountain_track.unity` — all 4 PrefabModification
entries that set `propertyPath: m_Material` on long_road colliders now reference Road
(GUID 7884193b0ead347a38a13a67f294dfb5) instead of Slippery (GUID c0e12c099c364af4e9e311a43d0f12c4).
**To activate:** Rebuild the Unity simulator binary after pulling the updated scene file.
No Python code changes needed.
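Before rebuilding, the scene file can be checked for leftover references. This is a rough sketch that simply counts GUID substrings in the scene text; it assumes the `.unity` file is stored as text, and `material_refs` is a hypothetical helper. The sample string is synthetic, not real scene content.

```python
ROAD_GUID = "7884193b0ead347a38a13a67f294dfb5"      # Road material, from the ADR
SLIPPERY_GUID = "c0e12c099c364af4e9e311a43d0f12c4"  # Slippery material, from the ADR

def material_refs(scene_text: str) -> dict:
    """Count how often each material GUID appears in the scene text."""
    return {
        "road": scene_text.count(ROAD_GUID),
        "slippery": scene_text.count(SLIPPERY_GUID),
    }

# Synthetic stand-in for four patched PrefabModification entries.
sample = "\n".join(f"guid: {ROAD_GUID}" for _ in range(4))
counts = material_refs(sample)
```

To run it on the real scene, read `sdsim/Assets/Scenes/mountain_track.unity` into a string and pass it in. A `slippery` count of zero would be expected only if the four patched entries were the scene's sole Slippery references.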
**What this does NOT change:**
- `Slippery.physicMaterial` asset — unchanged (still used by thunderhill, circuit_launch)
- `Donkey_new_phys.prefab` strut colliders — also reference Slippery, but these are car body
parts that the wheels don't touch. WheelPhys.cs only reads friction from ground hits.
- mini_monaco.unity — also has one Slippery reference; left intentional for now
**Expected effect:** Hill wheelspin should stop. The policy should find it easier to climb
the hill at throttle_min=0.2, and Exp 17 multi-track results should be more interpretable
since we are no longer fighting a physics artifact.


@@ -5,6 +5,7 @@ Each corresponds to an entry in docs/TEST_HISTORY.md.
| Script | Experiment | Key change |
|---|---|---|
| exp17_parallel_450k.py | Exp 17 | Parallel DummyVecEnv, 450k steps, v6 reward, HOST=localhost |
| mountain_v5.py | Exp 5 | v5 reward + throttle_min=0.5, direct model.learn() |
| mountain_continue.py | Exp 4 | Continued Exp3 training |
| mountain_high_throttle.py | Exp 3 | throttle_min=0.5, old v4 reward |


@@ -0,0 +1,199 @@
"""
Exp 17: Parallel DummyVecEnv on generated_track + mountain_track, 450k steps.
Strategy: Exp 11b proved the parallel DummyVecEnv infrastructure is stable.
The only failure mode was insufficient training budget (~45k effective steps
per track). This experiment triples the budget to ~225k per track.
Changes from Exp 11b:
- HOST: 10.0.0.55 → localhost (WSL/Windows share ports)
- TOTAL_STEPS: 90k → 450k
- CHECKPOINT_EVERY: 6k → 20k
- SAVE_DIR: exp17-parallel-450k
Everything else identical to Exp 11b (same reward, wrappers, lr, throttle_min).
Setup (TWO sim instances required):
Sim 1: launch donkey_sim.exe, select generated_track, port 9091 (default)
Sim 2: launch a second donkey_sim.exe with --port 9093, select mountain_track
Command: donkey_sim.exe --port 9093
Both sims must be running and on the correct tracks before starting this script.
Evaluation:
- Mid-training: both training tracks evaluated at each 20k checkpoint
- End-of-training: all 4 tracks evaluated sequentially (port 9091)
"""
import sys, os, time
sys.path.insert(0, '/home/paulh/projects/donkeycar-rl-autoresearch/agent')
from multitrack_runner import log, StuckTerminationWrapper
from donkeycar_sb3_runner import ThrottleClampWrapper
from reward_wrapper import SpeedRewardWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
import gymnasium as gym
import numpy as np
HOST = 'localhost'
THROTTLE_MIN = 0.2
LR = 0.000725
TOTAL_STEPS = 450_000
CHECKPOINT_EVERY = 20_000
SAVE_DIR = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp17-parallel-450k'
os.makedirs(SAVE_DIR, exist_ok=True)
def make_env(track_id, port):
    def _init():
        raw = gym.make(track_id, conf={'host': HOST, 'port': port})
        env = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
        env = StuckTerminationWrapper(env, stuck_steps=40, min_displacement=0.5)
        env = SpeedRewardWrapper(env)
        return env
    return _init
log('=' * 60)
log('Exp 17: Parallel DummyVecEnv — 450k steps')
log(f' Sim 1: {HOST}:9091 → generated_track')
log(f' Sim 2: {HOST}:9093 → mountain_track')
log(f' throttle_min={THROTTLE_MIN}, lr={LR}, total={TOTAL_STEPS:,}')
log(f' Reward: v6 (speed × CTE_quality, efficiency gate >= 0.15)')
log(f' Stuck termination: 40 steps (~2.5s)')
log(f' Checkpoints: every {CHECKPOINT_EVERY:,} steps')
log('=' * 60)
log('Creating DummyVecEnv with two tracks...')
env = DummyVecEnv([
    make_env('donkey-generated-track-v0', 9091),
    make_env('donkey-mountain-track-v0', 9093),
])
env = VecTransposeImage(env)
log(f' VecEnv num_envs={env.num_envs}, obs={env.observation_space.shape}')
model = PPO('CnnPolicy', env, learning_rate=LR, verbose=1, device='cpu')
log('PPO created. Starting training...')
best_reward = float('-inf')
steps_done = 0
while steps_done < TOTAL_STEPS:
    seg_steps = min(CHECKPOINT_EVERY, TOTAL_STEPS - steps_done)
    model.learn(total_timesteps=seg_steps, reset_num_timesteps=False)
    steps_done += seg_steps
    ckpt = os.path.join(SAVE_DIR, f'checkpoint_{steps_done:07d}')
    model.save(ckpt)
    model.save(os.path.join(SAVE_DIR, 'model'))
    log(f'[{steps_done:,}/{TOTAL_STEPS:,}] Checkpoint saved: {ckpt}.zip')
    # Eval on both training tracks using the existing DummyVecEnv connections
    try:
        obs = env.reset()
        ep_rewards = np.zeros(env.num_envs)
        ep_steps = np.zeros(env.num_envs)
        done_mask = np.zeros(env.num_envs, dtype=bool)
        for _ in range(2000):
            action, _ = model.predict(obs, deterministic=True)
            obs, rewards, dones, infos = env.step(action)
            for i in range(env.num_envs):
                if not done_mask[i]:
                    ep_rewards[i] += rewards[i]
                    ep_steps[i] += 1
                    if dones[i]:
                        done_mask[i] = True
            if done_mask.all():
                break
        status0 = '' if ep_steps[0] >= 2000 else f'❌@{int(ep_steps[0])}'
        status1 = '' if ep_steps[1] >= 2000 else f'❌@{int(ep_steps[1])}'
        log(f' Eval: gen_track={ep_rewards[0]:.1f}r/{int(ep_steps[0])}s {status0} '
            f'mountain={ep_rewards[1]:.1f}r/{int(ep_steps[1])}s {status1}')
        total_reward = ep_rewards.sum()
        if total_reward > best_reward:
            best_reward = total_reward
            model.save(os.path.join(SAVE_DIR, 'best_model'))
            log(f' ⭐ NEW BEST: {best_reward:.1f} combined reward')
    except Exception as e:
        log(f' Eval error: {e}')
        import traceback; traceback.print_exc()
model.save(os.path.join(SAVE_DIR, 'model'))
log(f'\nTraining complete. Best combined reward: {best_reward:.1f}')
env.close()
time.sleep(5)
# --- Final eval on all 4 tracks (sequential, port 9091) ---
log('\n' + '=' * 60)
log('FINAL EVALUATION: best_model on 4 tracks (3 sets each)')
log('=' * 60)
EVAL_TRACKS = [
    ('donkey-generated-track-v0', 'generated_track'),
    ('donkey-mountain-track-v0', 'mountain_track'),
    ('donkey-minimonaco-track-v0', 'mini_monaco'),
    ('donkey-generated-roads-v0', 'generated_road'),
]
EVAL_PORT = 9091
EVAL_SETS = 3
EVAL_MAX_STEPS = 2000
best_model_path = os.path.join(SAVE_DIR, 'best_model.zip')
results_by_track = {}
for track_id, track_name in EVAL_TRACKS:
    log(f'\n--- {track_name} ---')
    steps_list = []
    for s in range(1, EVAL_SETS + 1):
        try:
            raw = gym.make(track_id, conf={'host': HOST, 'port': EVAL_PORT})
            inner = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
            inner = StuckTerminationWrapper(inner, stuck_steps=40, min_displacement=0.5)
            inner = SpeedRewardWrapper(inner)
            eval_env = VecTransposeImage(DummyVecEnv([lambda e=inner: e]))
            eval_model = PPO.load(best_model_path, env=eval_env, device='cpu')
            obs = eval_env.reset()
            total_r, total_s, done = 0.0, 0, False
            while not done and total_s < EVAL_MAX_STEPS:
                action, _ = eval_model.predict(obs, deterministic=True)
                result = eval_env.step(action)
                if len(result) == 4:
                    obs, r, d, info = result
                    done = bool(d[0])
                else:
                    obs, r, t, tr, info = result
                    done = bool(t[0] or tr[0])
                total_r += float(r[0])
                total_s += 1
            status = '' if total_s >= EVAL_MAX_STEPS else f'❌@{total_s}'
            log(f' Set {s}: {total_r:.1f}r / {total_s}s {status}')
            steps_list.append(total_s)
            eval_env.close()
            time.sleep(3)
        except Exception as e:
            log(f' Set {s}: ERROR — {e}')
            steps_list.append(0)
            time.sleep(3)
    mean_steps = np.mean(steps_list) if steps_list else 0
    results_by_track[track_name] = steps_list
    log(f' Mean: {mean_steps:.0f} steps')
log('\n' + '=' * 60)
log('SUMMARY')
log('=' * 60)
for track_name, steps_list in results_by_track.items():
    steps_str = '/'.join(str(s) for s in steps_list)
    mean = np.mean(steps_list)
    verdict = '' if mean >= 1500 else '⚠️' if mean >= 500 else ''
    log(f' {verdict} {track_name:20s}: {steps_str} mean={mean:.0f}')
log(f'\n=== Exp 17 COMPLETE ===')


@@ -0,0 +1,61 @@
2026-04-20T00:08:21.090963 Loading warm-start model from models/exp14-mountain-v5/best_model.zip
2026-04-20T00:09:16.674927 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using throttle_min=0.2 env
2026-04-20T00:09:19.055092 Switching model to env with throttle_min=0.4
2026-04-20T00:10:27.385278 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
2026-04-20T00:11:08.699368 ERROR during fine-tune: 'NoneType' object is not callable
2026-04-20T00:11:08.901669 Fine-tune complete. steps_done=0
2026-04-20T00:14:43.472139 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
2026-04-20T00:17:44.473941 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
2026-04-20T00:21:10.924456 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
2026-04-20T00:25:31.932947 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env
2026-04-20T00:28:59.848890 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip
2026-04-20T00:28:59.848966 ERROR during fine-tune: name 'make_env' is not defined
2026-04-20T00:29:00.509181 Fine-tune complete. steps_done=6000
2026-04-20T00:31:09.594830 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env
2026-04-20T00:34:50.056288 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip
2026-04-20T00:35:04.415348 ERROR during fine-tune: name 'json' is not defined
2026-04-20T00:35:04.546033 Fine-tune complete. steps_done=6000
2026-04-20T00:37:47.831240 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env
2026-04-20T00:41:21.675776 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip
2026-04-20T00:41:43.554021 Eval @ 6000: mean_steps=384.7 mean_lap=21.59375
2026-04-20T00:41:43.694831 ⭐ NEW BEST (mean lap 21.59s) saved
2026-04-20T00:45:26.980198 [12000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0012000.zip
2026-04-20T00:45:42.741989 Eval @ 12000: mean_steps=187.7 mean_lap=None
2026-04-20T00:49:24.586893 [18000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0018000.zip
2026-04-20T00:49:42.795830 Eval @ 18000: mean_steps=287.3 mean_lap=None
2026-04-20T00:53:15.614884 [24000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0024000.zip
2026-04-20T00:53:37.070339 Eval @ 24000: mean_steps=374.7 mean_lap=21.765625
2026-04-20T00:57:09.352148 [30000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0030000.zip
2026-04-20T00:57:36.938090 Eval @ 30000: mean_steps=537.7 mean_lap=22.046875
2026-04-20T00:57:36.938120 Switching env to throttle_min=0.2
2026-04-20T01:00:55.914640 [36000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0036000.zip
2026-04-20T01:01:56.665949 Eval @ 36000: mean_steps=1451.7 mean_lap=28.434895833333332
2026-04-20T01:05:10.807288 [42000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0042000.zip
2026-04-20T01:05:57.449632 Eval @ 42000: mean_steps=1067.7 mean_lap=27.44140625
2026-04-20T01:08:54.843851 [48000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0048000.zip
2026-04-20T01:10:00.878424 Eval @ 48000: mean_steps=1626.7 mean_lap=29.776785714285715
2026-04-20T01:13:16.089861 [54000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0054000.zip
2026-04-20T01:14:18.435622 Eval @ 54000: mean_steps=1528.3 mean_lap=30.234375
2026-04-20T01:17:25.682859 [60000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0060000.zip
2026-04-20T01:18:28.243356 Eval @ 60000: mean_steps=1533.0 mean_lap=34.33125
2026-04-20T01:21:38.247436 [66000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0066000.zip
2026-04-20T01:21:54.995379 Eval @ 66000: mean_steps=163.7 mean_lap=None
2026-04-20T01:25:14.752223 [72000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0072000.zip
2026-04-20T01:26:11.926001 Eval @ 72000: mean_steps=1389.7 mean_lap=43.21484375
2026-04-20T01:29:24.138321 [78000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0078000.zip
2026-04-20T01:29:59.928582 Eval @ 78000: mean_steps=757.0 mean_lap=43.453125
2026-04-20T01:33:15.187091 [84000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0084000.zip
2026-04-20T01:33:49.188449 Eval @ 84000: mean_steps=704.7 mean_lap=41.046875
2026-04-20T01:36:57.554346 [90000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0090000.zip
2026-04-20T01:38:12.054640 Eval @ 90000: mean_steps=1819.0 mean_lap=None
2026-04-20T01:41:29.620560 [96000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0096000.zip
2026-04-20T01:42:07.583154 Eval @ 96000: mean_steps=813.0 mean_lap=None
2026-04-20T01:45:23.503967 [102000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0102000.zip
2026-04-20T01:45:59.052782 Eval @ 102000: mean_steps=747.3 mean_lap=None
2026-04-20T01:49:02.510514 [108000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0108000.zip
2026-04-20T01:49:27.462705 Eval @ 108000: mean_steps=466.0 mean_lap=None
2026-04-20T01:52:40.338223 [114000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0114000.zip
2026-04-20T01:53:31.593848 Eval @ 114000: mean_steps=1169.0 mean_lap=None
2026-04-20T01:56:39.035861 [120000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0120000.zip
2026-04-20T01:57:28.658996 Eval @ 120000: mean_steps=1125.0 mean_lap=None
2026-04-20T01:57:28.795051 Fine-tune complete. steps_done=120000


@@ -0,0 +1,20 @@
{"steps_done": 6000, "throttle_min": 0.4, "mean_steps": 384.6666666666667, "mean_lap_time": 21.59375, "per_set": [{"steps": 205, "laps": 0, "lap_times": []}, {"steps": 177, "laps": 0, "lap_times": []}, {"steps": 772, "laps": 1, "lap_times": [21.59375]}]}
{"steps_done": 12000, "throttle_min": 0.4, "mean_steps": 187.66666666666666, "mean_lap_time": null, "per_set": [{"steps": 145, "laps": 0, "lap_times": []}, {"steps": 345, "laps": 0, "lap_times": []}, {"steps": 73, "laps": 0, "lap_times": []}]}
{"steps_done": 18000, "throttle_min": 0.4, "mean_steps": 287.3333333333333, "mean_lap_time": null, "per_set": [{"steps": 233, "laps": 0, "lap_times": []}, {"steps": 244, "laps": 0, "lap_times": []}, {"steps": 385, "laps": 0, "lap_times": []}]}
{"steps_done": 24000, "throttle_min": 0.4, "mean_steps": 374.6666666666667, "mean_lap_time": 21.765625, "per_set": [{"steps": 178, "laps": 0, "lap_times": []}, {"steps": 359, "laps": 0, "lap_times": []}, {"steps": 587, "laps": 1, "lap_times": [21.765625]}]}
{"steps_done": 30000, "throttle_min": 0.4, "mean_steps": 537.6666666666666, "mean_lap_time": 22.046875, "per_set": [{"steps": 854, "laps": 1, "lap_times": [22.046875]}, {"steps": 365, "laps": 0, "lap_times": []}, {"steps": 394, "laps": 0, "lap_times": []}]}
{"steps_done": 36000, "throttle_min": 0.2, "mean_steps": 1451.6666666666667, "mean_lap_time": 28.434895833333332, "per_set": [{"steps": 1540, "laps": 2, "lap_times": [29.34375, 26.84375]}, {"steps": 2000, "laps": 3, "lap_times": [29.4375, 28.4375, 27.015625]}, {"steps": 815, "laps": 1, "lap_times": [29.53125]}]}
{"steps_done": 42000, "throttle_min": 0.2, "mean_steps": 1067.6666666666667, "mean_lap_time": 27.44140625, "per_set": [{"steps": 467, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [27.046875, 27.703125, 27.125]}, {"steps": 736, "laps": 1, "lap_times": [27.890625]}]}
{"steps_done": 48000, "throttle_min": 0.2, "mean_steps": 1626.6666666666667, "mean_lap_time": 29.776785714285715, "per_set": [{"steps": 2000, "laps": 3, "lap_times": [30.796875, 29.828125, 28.734375]}, {"steps": 880, "laps": 1, "lap_times": [30.65625]}, {"steps": 2000, "laps": 3, "lap_times": [29.703125, 29.203125, 29.515625]}]}
{"steps_done": 54000, "throttle_min": 0.2, "mean_steps": 1528.3333333333333, "mean_lap_time": 30.234375, "per_set": [{"steps": 585, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [32.734375, 29.8125, 30.8125]}, {"steps": 2000, "laps": 3, "lap_times": [31.171875, 29.71875, 27.15625]}]}
{"steps_done": 60000, "throttle_min": 0.2, "mean_steps": 1533.0, "mean_lap_time": 34.33125, "per_set": [{"steps": 2000, "laps": 2, "lap_times": [39.140625, 33.140625]}, {"steps": 599, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [34.21875, 31.953125, 33.203125]}]}
{"steps_done": 66000, "throttle_min": 0.2, "mean_steps": 163.66666666666666, "mean_lap_time": null, "per_set": [{"steps": 154, "laps": 0, "lap_times": []}, {"steps": 146, "laps": 0, "lap_times": []}, {"steps": 191, "laps": 0, "lap_times": []}]}
{"steps_done": 72000, "throttle_min": 0.2, "mean_steps": 1389.6666666666667, "mean_lap_time": 43.21484375, "per_set": [{"steps": 2000, "laps": 2, "lap_times": [50.140625, 35.6875]}, {"steps": 169, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 2, "lap_times": [39.890625, 47.140625]}]}
{"steps_done": 78000, "throttle_min": 0.2, "mean_steps": 757.0, "mean_lap_time": 43.453125, "per_set": [{"steps": 174, "laps": 0, "lap_times": []}, {"steps": 1074, "laps": 1, "lap_times": [46.03125]}, {"steps": 1023, "laps": 1, "lap_times": [40.875]}]}
{"steps_done": 84000, "throttle_min": 0.2, "mean_steps": 704.6666666666666, "mean_lap_time": 41.046875, "per_set": [{"steps": 953, "laps": 1, "lap_times": [40.21875]}, {"steps": 181, "laps": 0, "lap_times": []}, {"steps": 980, "laps": 1, "lap_times": [41.875]}]}
{"steps_done": 90000, "throttle_min": 0.2, "mean_steps": 1819.0, "mean_lap_time": null, "per_set": [{"steps": 2000, "laps": 0, "lap_times": []}, {"steps": 1963, "laps": 0, "lap_times": []}, {"steps": 1494, "laps": 0, "lap_times": []}]}
{"steps_done": 96000, "throttle_min": 0.2, "mean_steps": 813.0, "mean_lap_time": null, "per_set": [{"steps": 1671, "laps": 0, "lap_times": []}, {"steps": 385, "laps": 0, "lap_times": []}, {"steps": 383, "laps": 0, "lap_times": []}]}
{"steps_done": 102000, "throttle_min": 0.2, "mean_steps": 747.3333333333334, "mean_lap_time": null, "per_set": [{"steps": 715, "laps": 0, "lap_times": []}, {"steps": 932, "laps": 0, "lap_times": []}, {"steps": 595, "laps": 0, "lap_times": []}]}
{"steps_done": 108000, "throttle_min": 0.2, "mean_steps": 466.0, "mean_lap_time": null, "per_set": [{"steps": 468, "laps": 0, "lap_times": []}, {"steps": 476, "laps": 0, "lap_times": []}, {"steps": 454, "laps": 0, "lap_times": []}]}
{"steps_done": 114000, "throttle_min": 0.2, "mean_steps": 1169.0, "mean_lap_time": null, "per_set": [{"steps": 1318, "laps": 0, "lap_times": []}, {"steps": 1278, "laps": 0, "lap_times": []}, {"steps": 911, "laps": 0, "lap_times": []}]}
{"steps_done": 120000, "throttle_min": 0.2, "mean_steps": 1125.0, "mean_lap_time": null, "per_set": [{"steps": 941, "laps": 0, "lap_times": []}, {"steps": 1492, "laps": 0, "lap_times": []}, {"steps": 942, "laps": 0, "lap_times": []}]}


@@ -0,0 +1,13 @@
{"set": 1, "episode": 1, "steps": 195, "reward": 313.8098858782323, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:01.852800"}
{"set": 1, "episode": 2, "steps": 907, "reward": 821.3252189619088, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:15.580688"}
{"set": 1, "episode": 3, "steps": 187, "reward": 312.3699834933941, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:20.305057"}
{"set_summary": {"set": 1, "mean_steps": 429.6666666666667, "mean_reward": 482.50169611117843}}
{"set": 2, "episode": 1, "steps": 1684, "reward": 2886.7210297683996, "laps": 2, "lap_times": [30.796875, 27.3125], "timestamp": "2026-04-19T23:55:43.831212"}
{"set": 2, "episode": 2, "steps": 1791, "reward": 2724.1041878786637, "laps": 2, "lap_times": [29.234375, 31.578125], "timestamp": "2026-04-19T23:56:08.736059"}
{"set": 2, "episode": 3, "steps": 2000, "reward": 3338.140802157104, "laps": 3, "lap_times": [29.828125, 27.828125, 29.171875], "timestamp": "2026-04-19T23:56:34.963968"}
{"set_summary": {"set": 2, "mean_steps": 1825.0, "mean_reward": 2982.9886732680557}}
{"set": 3, "episode": 1, "steps": 189, "reward": 304.40264326371107, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:56:39.723007"}
{"set": 3, "episode": 2, "steps": 2000, "reward": 3396.2255747133167, "laps": 3, "lap_times": [29.875, 28.75, 27.765625], "timestamp": "2026-04-19T23:57:05.989723"}
{"set": 3, "episode": 3, "steps": 773, "reward": 1300.720640436186, "laps": 1, "lap_times": [31.265625], "timestamp": "2026-04-19T23:57:18.198014"}
{"set_summary": {"set": 3, "mean_steps": 987.3333333333334, "mean_reward": 1667.116286137738}}
{"overall": {"mean_steps_across_sets": 1080.6666666666667, "mean_reward_across_sets": 1710.8688851723239}}


@@ -1,100 +1,83 @@
# Project State — April 16, 2026 (post-testing) # Project State — April 27, 2026
## The Goal ## The Goal
Train a DonkeyCar model that generalises to any road-surface track Train a DonkeyCar model that generalises to any road-surface track
(outdoor, asphalt, lane markings) — demonstrated by driving a (outdoor, asphalt, lane markings) — demonstrated by driving a
never-seen track without crashing. never-seen track without crashing.
--- ---
## Confirmed Working Models (tested today, observed by user) ## Current Champion Models
### ✅ Phase 2 Champion — generated_road ### ✅ exp13-gentrack-v4 — generated_track specialist
- **Path:** `models/champion/model.zip` - **Path:** `models/exp13-gentrack-v4/best_model.zip`
- **Trained on:** generated_road only, ~13k steps, lr=0.000225 - **Trained on:** generated_track only, ~30k steps (stopped early), lr=0.000725, throttle_min=0.2
- **Test result:** Drove full 2000 steps, 2013 reward. User: "driving very well, stayed in right-hand lane, very very good" - **Reward:** v4 (base × efficiency × speed_bonus)
- **Other tracks:** Confirmed fails on generated_track (old multitrack_eval) - **Performance:** Drives generated_track reliably, clean laps
- **Zero-shot:** Fails on mountain_track (expected — single-track specialist)
### ✅ Wave 4 Trial 9 — generated_track AND mini_monaco ### ✅ exp14-mountain-v5-finetune ft_036k — mountain specialist
- **Path:** `models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
- **Trained on:** mountain_track, fine-tuned from exp14 base, checkpoint at 36k steps
- **Reward:** v5 (speed × CTE-quality), throttle_floor=0.2 (switched from 0.4 at 30k)
- **Performance:** 9/9 successful episodes, 25 total laps, mean lap 27.93s, best lap 26.16s
- **Zero-shot:** Fails on generated_track (expected — single-track specialist)
### ⭐ Wave 4 Trial 9 — best generalising model (but not reproducible)
- **Path:** `models/wave4-trial-0009/model.zip` - **Path:** `models/wave4-trial-0009/model.zip`
- **Trained on:** generated_track + mountain_track from scratch, ~90k steps, lr=0.000725, switch=6,851 - **Trained on:** generated_track + mountain_track, ~90k steps, lr=0.000725, switch=6,851
- **Test on generated_track:** 3/3 episodes drove full 2000 steps, 13–16 second genuine laps - **Performance:** generated_track 2000/2000, mini_monaco 2000/2000 (zero-shot)
- **Test on mini_monaco:** Full 2000 steps, 40-second genuine laps (zero-shot — never seen during training) - **Problem:** Same hyperparameters repeated multiple times → all failed. This was a lucky random seed.
- **This is our best model**
### ✅ Wave 4 Trial 19 — generated_track (mostly)
- **Path:** `models/wave4-trial-0019/model.zip`
- **Trained on:** generated_track + mountain_track from scratch, ~74k steps, lr=0.000629, switch=8,211
- **Test on generated_track:** 2/3 episodes drove full 2000 steps, 14–17 second genuine laps. 1 crash.
- **mini_monaco score during training:** 231 (best "honest" result from Wave 4)
--- ---
## Key Finding: Generated Track Lighting Variation ## What We Know (cumulative)
The generated_track changes lighting conditions (sun angle, shadows) on every
env.reset() due to procedural generation. This means during training, every ### Reward functions
episode showed a different visual appearance of the same track. The model was - **v4** (base × efficiency × speed_bonus): works for generated_track; gives zero gradient on mountain hills
forced to learn track-geometry features (road edges, markings) rather than - **v5** (speed × CTE-quality): works for mountain; circular driving exploit possible on flat track
lighting-specific patterns. This visual robustness is almost certainly why - **v6** (v5 + efficiency gate ≥ 0.15): prevents circular exploit; may suppress early exploration
Trial 9 can zero-shot generalise to mini_monaco.
### Training approaches tried and their outcomes
| Approach | Result |
|---|---|
| Single-track PPO (Exp 9, 13) | ✅ Reliable. Best per-track performance. |
| Round-robin close-and-switch (Wave 4, Exp 10) | ❌ 80% failure rate. Disrupts PPO rollout buffer. |
| Parallel DummyVecEnv 90k steps (Exp 11b) | ⚠️ Infrastructure works; 90k too few steps (194 steps on all tracks). |
| Cross-track warm start both directions (Exp 15, 16) | ❌ Both failed. Single-track policies too specialised for naive transfer. |
### Mountain track physics (fixed 2026-04-27)
The mountain_track.unity scene assigned Slippery physics material (staticFriction=0.1)
to 4 track surface colliders. WheelPhys.cs scales wheel grip by surface staticFriction,
so the car had 1/5 normal grip on the hill. This caused visible wheelspin.
Fixed by assigning Road material (staticFriction=0.5) to those 4 colliders in
`sdsim/Assets/Scenes/mountain_track.unity`. The project uses a pre-built Windows
executable (DonkeySimWin/donkey_sim.exe), so this fix is deferred until the sim
is rebuilt from source in Unity Editor. Proceed with Exp 17 using the existing binary.
### Key parameter knowledge
- **lr:** 0.000725 (from Trial 9 and Exp 9 — consistent with good results)
- **throttle_min:** 0.2 (v5/v6 reward gives non-zero gradient on hills even at 0.2)
- **n_steer/n_throttle:** Relevant for discrete action space only (PPO uses continuous)
- **Per-env throttle_min in DummyVecEnv:** Feasible — each env wrapped independently
--- ---
## Full Test Results — April 16 ## Open Strategy (as of April 27)
| Test | Model | Track | Laps | Steps | Verdict | The goal is reliable multi-track generalisation. The validated path forward:
|---|---|---|---|---|---|
| 1 | Phase 2 champion | generated_road | n/a (not a loop) | 2000/2000 | ✅ DRIVES |
| 2 | Wave 4 Trial 3 | generated_track | — | — | ❌ MODEL CORRUPTED |
| 3 | Wave 4 Trial 9 | generated_track | 6 laps × 3 eps | 2000/2000 | ✅ DRIVES |
| 4 | Wave 4 Trial 9 | mini_monaco | 2 laps per ep | 2000/2000 | ✅ DRIVES (zero-shot) |
| 5 | Wave 4 Trial 14 | mini_monaco | 1 lap ep2 only | 257/901/253 | ⚠️ INCONSISTENT |
| 6 | Wave 4 Trial 25 | mini_monaco | 0 | ~147/eps | ❌ CRASHES |
| + | Wave 4 Trial 19 | generated_track | 5-6 laps × 2 eps | crash/2000/2000 | ✅ MOSTLY |
| + | Wave 4 Trial 22 | generated_track | 0 | ~110/eps | ❌ SAME SPOT |
| + | Wave 4 Trial 2 | generated_track | 0 | ~76/eps | ❌ CRASHES |
| + | Trial 3 (recovered) | generated_track | 0 | ~104/eps | ❌ CRASHES |
--- 1. **Exp 17:** Parallel DummyVecEnv with 400k–500k steps
- Two sim instances: generated_track:9091, mountain_track:9093
- v6 reward on both (efficiency gate + CTE patience terminator)
- throttle_min=0.2 both envs (or optionally 0.5 on mountain, 0.2 on generated)
- lr=0.000725, checkpoint every 20k, best_model tracked throughout
- Eval mini_monaco zero-shot at every checkpoint
2. **If Exp 17 plateaus:** Try curriculum (generated_track only for 150k, then add mountain)
3. **If still stuck:** Tune v6 efficiency gate threshold (check % steps gated in early training)
## What We Know Now See `docs/TEST_HISTORY.md` for full Exp 17 design.
1. **Trial 9 is a genuine multi-track model.** It drives generated_track
consistently (3/3) with clean laps, AND generalises zero-shot to
mini_monaco (never seen in training). This is real progress.
2. **The "amazing" overnight model (Trial 3) is lost.** The model.zip has
a corrupted optimizer file. Policy weights were recovered but the model
crashes at ~104 steps — the "amazing" driving was at an intermediate
training checkpoint, not the final saved model.
3. **Most Wave 4 high scores were not exploits — they were real.**
Trials 5, 6, and 14 showed inconsistent results (crash some episodes,
complete lap on others). The model was genuinely learning but unreliably.
Only Trial 14 and 25's original very high scores (1573, 1543) appear
to have been exploits in the original training eval.
4. **Lighting variation on generated_track is a feature, not a bug.**
Procedural generation changes sun angle / shadows each episode, forcing
the model to learn geometry rather than appearance. This may be the key
to Trial 9's generalisation ability.
5. **Mountain_track training — unknown contribution.** We don't know if
mountain_track training helped or hurt. Trial 9 drives generated_track
and mini_monaco; whether it can drive mountain_track is untested.
---
## Open Questions for Strategy Discussion
1. Can Trial 9 also drive mountain_track? (untested)
2. Can Trial 9 drive generated_road? (untested — zero-shot to Phase 2 training track)
3. Why does Trial 9 drive mini_monaco but other models with similar
mini_monaco scores (Trial 14: 193, Trial 22: 193) don't reliably?
4. Would more training steps from Trial 9's hyperparameters produce
an even better model?
5. Is mountain_track necessary, or could we get Trial 9's results
training on generated_track alone?
--- ---
@ -102,9 +85,9 @@ Trial 9 can zero-shot generalise to mini_monaco.
| Model | Path | Status | | Model | Path | Status |
|---|---|---| |---|---|---|
| Phase 2 champion | models/champion/model.zip | ✅ Good | | exp13-gentrack-v4 | models/exp13-gentrack-v4/best_model.zip | ✅ Generated_track specialist |
| Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | ✅ Best model | | exp14-mountain-v5-finetune ft_036k | models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip | ✅ Mountain specialist (best overall mountain model) |
| Wave 4 Trial 19 | models/wave4-trial-0019/model.zip | ✅ Good | | exp14-mountain-v5 | models/exp14-mountain-v5/best_model.zip | ✅ Mountain base (good, slightly worse than ft_036k) |
| Wave 4 Trial 3 | models/wave4-trial-0003/model.zip | ❌ Corrupted | | Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | ✅ Best generalising model; unreproducible |
| Wave 4 Trials 1,2,5-8,10-25 | models/wave4-trial-XXXX/ | Available, mostly crash on generated_track | | Phase 2 champion | models/champion/model.zip | ✅ generated_road specialist only |
| Wave 4 other trials | models/wave4-trial-XXXX/ | Mostly crash on all tracks |


@ -508,3 +508,105 @@ For now:
- keep the single-track champions as separate specialists - keep the single-track champions as separate specialists
- do **not** assume direct cross-track warm starts are beneficial - do **not** assume direct cross-track warm starts are beneficial
---
## Mountain Track Friction Fix (2026-04-27)
### Root cause
`WheelPhys.cs` scales wheel grip by the static friction of whatever surface the
wheel is touching: `fFriction.stiffness = hit.collider.material.staticFriction * originalForwardStiffness`.
`mountain_track.unity` assigned the Slippery physics material (staticFriction=0.1)
to 4 track surface colliders from the `long_road` prefab. This gave the car 1/5
the normal grip on the hill, causing visible wheelspin even at full throttle.
The Slippery material is intentional on genuinely icy surfaces (thunderhill) but
was incorrect on mountain_track's asphalt hill.
### Fix applied
Replaced all 4 Slippery material assignments with Road material (staticFriction=0.5)
in `sdsim/Assets/Scenes/mountain_track.unity`.
| Material | staticFriction | GUID |
|---|---|---|
| Slippery (removed) | 0.1 | c0e12c099c364af4e9e311a43d0f12c4 |
| Road (applied) | 0.5 | 7884193b0ead347a38a13a67f294dfb5 |
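
The grip ratio quoted under "Root cause" can be checked with a few lines of arithmetic. This is a minimal sketch that mirrors the `WheelPhys.cs` expression in Python; `ORIGINAL_FORWARD_STIFFNESS` is an assumed placeholder (the real value lives in the Unity wheel collider settings), and it cancels out of the ratio anyway.

```python
# Mirrors the WheelPhys.cs expression quoted above:
#   stiffness = staticFriction * originalForwardStiffness
def forward_stiffness(static_friction: float, original_forward_stiffness: float) -> float:
    """Return the scaled forward stiffness for a given surface material."""
    return static_friction * original_forward_stiffness


SLIPPERY_STATIC_FRICTION = 0.1  # from the material table above
ROAD_STATIC_FRICTION = 0.5      # from the material table above
ORIGINAL_FORWARD_STIFFNESS = 1.0  # assumption: placeholder, cancels in the ratio

slippery_grip = forward_stiffness(SLIPPERY_STATIC_FRICTION, ORIGINAL_FORWARD_STIFFNESS)
road_grip = forward_stiffness(ROAD_STATIC_FRICTION, ORIGINAL_FORWARD_STIFFNESS)

# Fraction of normal grip the car had on the mislabelled hill colliders.
grip_ratio = slippery_grip / road_grip
print(f"grip on Slippery relative to Road: {grip_ratio:.2f}")
```

The ratio depends only on the two `staticFriction` values, which is why swapping the material is enough and no Python-side change is needed.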
### To activate
The training setup uses the pre-built Windows executable (`DonkeySimWin/donkey_sim.exe`),
not a locally-compiled build. The scene file edit in sdsandbox/ has no effect on the
running binary — it only matters if the sim is ever rebuilt from source in Unity Editor.
**This fix is deferred.** Proceed with Exp 17 using the existing executable.
If mountain hill training in Exp 17 specifically struggles (short episodes that plateau
and never improve), that is the signal to pursue a Unity Editor rebuild.
The scene file change is committed in sdsandbox/ and will apply automatically if the
sim is rebuilt for any other reason. No Python code changes needed.
### Expected effect
- Once the sim is rebuilt with the fix, hill wheelspin should stop or greatly reduce
- throttle_min=0.2 + v5 reward should be even more effective on the hill
- All future mountain experiments run on the rebuilt sim benefit; no code changes needed
---
## Strategy Review and Exp 17 Plan (2026-04-27)
### Where the project stands
After 16 experiments and 4 autoresearch phases, the core problem is clear:
multi-track training is needed for generalisation, but the training method has
been unreliable. Here is the summary of what each approach found:
| Approach | Outcome |
|---|---|
| Round-robin close-and-switch (Wave 4, Exp 10) | 80% failure. PPO rollout buffer disrupted on env swap. Lucky seed (Trial 9) worked once but cannot be reproduced. |
| Parallel DummyVecEnv 90k steps (Exp 11b) | Infrastructure valid, no catastrophic forgetting, but 90k steps / 2 tracks = ~45k effective per track. Not enough. |
| Cross-track warm starts (Exp 15, 16) | Both directions failed. Single-track specialists do not transfer cleanly. |
| Single-track PPO (Exp 9, 13, 14) | Reliable but no generalisation. |
The conclusion: **parallel DummyVecEnv is the right architecture; the only known
failure mode is training budget**. Exp 11b was mechanically sound but starved of steps.
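
The budget argument is simple division, sketched below. It assumes the vectorised env steps both tracks in lockstep so the total timestep count is split evenly between them; `effective_steps_per_track` is a hypothetical helper, not a function from the training scripts.

```python
def effective_steps_per_track(total_timesteps: int, n_tracks: int) -> int:
    """Timesteps each track sees when total_timesteps is shared evenly."""
    return total_timesteps // n_tracks


# Budgets taken from the table above and from the Exp 17 plan.
budgets = {
    "Exp 11b": 90_000,
    "Exp 17 (low end)": 400_000,
    "Exp 17 (high end)": 500_000,
}

for label, total in budgets.items():
    per_track = effective_steps_per_track(total, n_tracks=2)
    print(f"{label}: {total} total -> {per_track} per track")
```

On these numbers Exp 11b gave each track 45k steps, while the Exp 17 range gives each track 200k to 250k.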
### Exp 17: Parallel DummyVecEnv, 400k-500k steps
**This is the primary next experiment.**
| Parameter | Value | Reason |
|---|---|---|
| Architecture | DummyVecEnv([generated_track:9091, mountain_track:9093]) | Validated in Exp 11b; no PPO disruption |
| Total timesteps | 400,000-500,000 | ~200k-250k effective per track; Exp 11b showed 90k is insufficient |
| Reward | v6 on both envs (efficiency gate + CTE patience terminator) | Blocks circular exploit on generated_track; gate threshold may be tuned |
| throttle_min | 0.2 both envs (or 0.5 mountain, 0.2 generated — see ADR-020) | v5/v6 gradient non-zero on hills at 0.2 |
| learning_rate | 0.000725 | From Trial 9 and Exp 9 — consistent with best results |
| Checkpoint | every 20,000 steps + best_model.zip tracked throughout | ADR-017: best model ≠ final model |
| Eval | mini_monaco zero-shot at every checkpoint | Detect the peak before policy drifts |
| Warm start | None — train from random weights | ADR-024: cross-track warm starts failed |
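
A rough sketch of how the parameters in the table might be wired together with Stable-Baselines3. It is not the contents of `exp17_parallel_450k.py`: `make_env` is a hypothetical factory that returns one wrapped env per track and port, `"CnnPolicy"` and the `models/exp17` save path are assumptions, and the v6 reward and `throttle_min` wrappers are left to that factory. The SB3 imports sit inside `train` so the config helpers can be used without SB3 installed.

```python
from typing import Callable

EXP17 = {
    "total_timesteps": 450_000,   # commit message uses 450k; plan range is 400k-500k
    "learning_rate": 0.000725,
    "checkpoint_every": 20_000,   # in total timesteps across both envs
    "host": "localhost",
    "tracks": {"generated_track": 9091, "mountain_track": 9093},
}


def checkpoint_save_freq(checkpoint_every: int, n_envs: int) -> int:
    """SB3 callbacks count vectorised step calls; each call advances n_envs timesteps."""
    return max(checkpoint_every // n_envs, 1)


def train(make_env: Callable[[str, int], object]) -> None:
    """make_env(track, port) is a hypothetical factory returning a wrapped Gym env."""
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import CheckpointCallback
    from stable_baselines3.common.vec_env import DummyVecEnv

    # Bind track/port as default args so each lambda keeps its own values.
    env_fns = [lambda t=t, p=p: make_env(t, p) for t, p in EXP17["tracks"].items()]
    vec_env = DummyVecEnv(env_fns)

    # No warm start: the model is created fresh, per the table above.
    model = PPO("CnnPolicy", vec_env, learning_rate=EXP17["learning_rate"], verbose=1)
    callback = CheckpointCallback(
        save_freq=checkpoint_save_freq(EXP17["checkpoint_every"], vec_env.num_envs),
        save_path="models/exp17",  # assumption: illustrative path
        name_prefix="exp17",
    )
    model.learn(total_timesteps=EXP17["total_timesteps"], callback=callback)
```

With two envs, a 20,000-timestep checkpoint interval corresponds to `save_freq=10_000` callback calls. Tracking `best_model.zip` and the mini_monaco zero-shot eval would need an additional callback that this sketch omits.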
**Setup checklist before running:**
1. Two sim instances running: one on port 9091, one on port 9093
2. Each instance has its configured track loaded (generated_track on 9091, mountain_track on 9093)
3. Optional: rebuild the simulator with the mountain friction fix active (deferred; Exp 17 proceeds on the existing executable)
4. Verify throughput: run 2-minute timing benchmark, set step cap accordingly (ADR-014)
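
Checklist item 1 can be partly automated with a plain TCP probe. The sketch below only confirms that something accepts connections on each port; it cannot tell which track is loaded, and it assumes the sim tolerates a client that connects and immediately disconnects. `sim_is_listening` is a hypothetical helper name.

```python
import socket


def sim_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    expected = {"generated_track": 9091, "mountain_track": 9093}
    for track, port in expected.items():
        status = "up" if sim_is_listening("localhost", port) else "DOWN"
        print(f"{track} on port {port}: {status}")
```

Running this before launching training avoids burning the start of a long run on a connection error.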
**Success criterion:** mini_monaco zero-shot score > 500 (at least 25% of a full
2000-step episode) reliably across 3 evaluation sets, reproducible across 2+ runs.
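
One possible reading of the success criterion, written as a check. The interpretation is an assumption: "reliably" is taken to mean every one of at least 3 evaluation sets exceeds the threshold, and "reproducible" to mean at least 2 independent training runs pass. Function names are hypothetical.

```python
from typing import Sequence

SCORE_THRESHOLD = 500.0  # from the success criterion above
MIN_EVAL_SETS = 3
MIN_PASSING_RUNS = 2


def run_passes(eval_set_scores: Sequence[float]) -> bool:
    """A run passes if it has enough eval sets and every one beats the threshold."""
    return (
        len(eval_set_scores) >= MIN_EVAL_SETS
        and all(score > SCORE_THRESHOLD for score in eval_set_scores)
    )


def exp17_success(runs: Sequence[Sequence[float]]) -> bool:
    """True if at least MIN_PASSING_RUNS independent runs pass."""
    passing_runs = sum(1 for scores in runs if run_passes(scores))
    return passing_runs >= MIN_PASSING_RUNS
```

For example, with made-up scores, `exp17_success([[510, 600, 700], [499, 800, 650]])` is `False` because the second run has one eval set at or below 500.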
### Fallback: Curriculum training (if Exp 17 plateaus below 200)
If Exp 17 cannot get past ~200 steps on mini_monaco:
- Phase A: generated_track only, 150k steps (establish road-following)
- Phase B: add mountain_track to DummyVecEnv, continue 250k more steps
- Rationale: gives the policy a foundation before the harder mountain physics
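
The curriculum budget can be tallied per track with a short sketch. It assumes Phase B splits its timesteps evenly across the two envs; the phase dictionaries and `per_track_steps` are illustrative, not part of the training code.

```python
from typing import Dict, List

# Phase definitions taken from the bullets above.
CURRICULUM: List[dict] = [
    {"name": "A", "tracks": ["generated_track"], "timesteps": 150_000},
    {"name": "B", "tracks": ["generated_track", "mountain_track"], "timesteps": 250_000},
]


def per_track_steps(phases: List[dict]) -> Dict[str, int]:
    """Sum the timesteps each track receives, splitting each phase evenly."""
    totals: Dict[str, int] = {}
    for phase in phases:
        share = phase["timesteps"] // len(phase["tracks"])
        for track in phase["tracks"]:
            totals[track] = totals.get(track, 0) + share
    return totals


print(per_track_steps(CURRICULUM))
```

Under that even-split assumption, generated_track receives 275k steps and mountain_track 125k, so mountain gets less than the ~200k per track planned for Exp 17. That may be worth weighing when sizing Phase B.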
### Fallback: v6 efficiency gate tuning (if gate is too aggressive)
Log what fraction of steps are gated (reward zeroed) in the first 100k steps.
If >40%, lower the gate threshold from 0.15 to 0.10 for the first 150k steps,
then raise it back to 0.15. Prevents the gate from suppressing early exploration.
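
The gate-tuning rule above can be expressed as two small functions. This is a sketch under assumptions: the per-step "gated" flags are assumed to be logged as booleans by the reward wrapper, and the function names are hypothetical.

```python
from typing import Sequence

DEFAULT_GATE = 0.15
RELAXED_GATE = 0.10
RELAX_UNTIL_STEP = 150_000
GATED_FRACTION_LIMIT = 0.40


def gated_fraction(gated_flags: Sequence[bool]) -> float:
    """Fraction of logged steps whose reward was zeroed by the gate."""
    if not gated_flags:
        return 0.0
    return sum(1 for flag in gated_flags if flag) / len(gated_flags)


def should_relax_gate(gated_flags: Sequence[bool]) -> bool:
    """True if more than 40% of the logged early steps were gated."""
    return gated_fraction(gated_flags) > GATED_FRACTION_LIMIT


def gate_threshold_at(step: int, relaxed: bool) -> float:
    """Threshold schedule: 0.10 for the first 150k steps if relaxed, else 0.15."""
    if relaxed and step < RELAX_UNTIL_STEP:
        return RELAXED_GATE
    return DEFAULT_GATE
```

In use, `should_relax_gate` would be evaluated on flags logged during the first 100k steps of a run, and its result passed as `relaxed` when configuring a subsequent run.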