feat: add exp17 parallel DummyVecEnv 450k training + strategy docs

- exp17_parallel_450k.py: parallel two-track training (generated_track:9091, mountain_track:9093), 450k steps, v6 reward, HOST=localhost
- DECISIONS.md: ADR-025 (parallel strategy) and ADR-026 (mountain friction fix)
- docs/STATE.md: updated to April 2026 state with current champions and strategy
- docs/TEST_HISTORY.md: mountain friction fix notes + Exp 17 full design
- outerloop-results: exp14 finetune logs and robust mountain eval results

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 6e2427571a
commit b504b89b2a
76
DECISIONS.md
@@ -576,3 +576,79 @@ experts, not as obviously reusable initializations for the other track.
- If transfer is revisited, it likely needs a more careful method than naive direct
  warm-starting on the other track
- Mountain physics issues should be addressed before revisiting transfer conclusions

---

## ADR-025: Parallel DummyVecEnv with 400k+ Steps is the Primary Multi-Track Strategy

**Date:** 2026-04-27
**Status:** Active

**Context:** After Wave 4 (25 trials, 80% failure rate), Exp 10 (catastrophic forgetting),
Exp 11b (infrastructure works but 90k steps insufficient), and Exp 15/16 (cross-track
warm starts failed both directions), the only multi-track approach that did not have a
fundamental flaw was parallel DummyVecEnv. Exp 11b failed only because its per-track
training budget was about half of what single-track training needs.

**Decision:** The primary next strategy is:
1. Two sim instances (one per training track, separate ports)
2. SB3 `DummyVecEnv([env_generated, env_mountain])`, so PPO sees both tracks in every batch
3. 400,000–500,000 total timesteps (~200k–250k effective per track)
4. v6 reward (efficiency gate + CTE patience terminator) on both envs
5. No warm start: train from random weights
6. Checkpoint every 20k steps, track mini_monaco zero-shot score throughout
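
One practical detail for step 6 can be sketched with plain arithmetic. The sketch below assumes SB3's PPO default of `n_steps=2048`, two envs (so each rollout collects 4,096 timesteps), and training driven by repeated `learn()` calls with `reset_num_timesteps=False`, as the Exp 17 script does. Under those assumptions every nominal 20k segment is rounded up to whole rollouts. `simulate_segments` is a hypothetical helper for illustration, not part of the repo.

```python
def simulate_segments(total_steps, checkpoint_every, n_envs=2, n_steps=2048):
    """Return (nominal, actual) timestep counts after each checkpoint segment.

    Assumes each learn() call adds the segment size to the current timestep
    counter and keeps collecting whole rollouts (n_envs * n_steps timesteps)
    until that target is reached.
    """
    rollout = n_envs * n_steps
    nominal = actual = 0
    history = []
    while nominal < total_steps:
        segment = min(checkpoint_every, total_steps - nominal)
        target = actual + segment
        while actual < target:
            actual += rollout  # a rollout is never cut short
        nominal += segment
        history.append((nominal, actual))
    return history


if __name__ == "__main__":
    for nominal, actual in simulate_segments(450_000, 20_000):
        print(f"nominal={nominal:>7,}  actual={actual:>7,}")
```

If these assumptions hold, a nominal 450k run would collect roughly 463k timesteps, so checkpoint names based on the nominal counter slightly understate the real one. That is harmless for this ADR, but worth remembering when comparing budgets across experiments.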

**Why parallel DummyVecEnv:**
- PPO is an on-policy algorithm that depends on a stable rollout buffer.
  Swapping environments mid-training disrupts value estimates and causes catastrophic forgetting.
  DummyVecEnv feeds both tracks into every PPO rollout batch, so neither track drops out of the training distribution.
- This is how SB3 was designed to be used with multiple environments.
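
The difference between the two approaches can be illustrated by counting where each rollout's samples come from. This is a toy model, not SB3 code: `parallel_rollouts`, `switching_rollouts`, and `rollouts_per_track` are hypothetical names, and `n_steps=2048` is assumed to be the PPO default in use.

```python
from collections import Counter

TRACKS = ("generated_track", "mountain_track")


def parallel_rollouts(n_rollouts, n_steps=2048):
    """Both envs are stepped in lockstep, so each rollout holds n_steps samples per track."""
    return [Counter({track: n_steps for track in TRACKS}) for _ in range(n_rollouts)]


def switching_rollouts(n_rollouts, n_steps=2048, rollouts_per_track=2):
    """One env at a time: every rollout comes entirely from the currently active track."""
    mixes = []
    for k in range(n_rollouts):
        active = TRACKS[(k // rollouts_per_track) % len(TRACKS)]
        mixes.append(Counter({active: n_steps}))
    return mixes


def track_share(mix, track):
    """Fraction of one rollout's samples that came from `track`."""
    return mix[track] / sum(mix.values())


if __name__ == "__main__":
    for name, mixes in (("parallel", parallel_rollouts(4)),
                        ("switching", switching_rollouts(4))):
        shares = [track_share(m, "mountain_track") for m in mixes]
        print(name, shares)
```

In the parallel case every update sees a constant half-and-half mix, whereas the switching case swings between all of one track and none of it, which is the disruption this ADR attributes to round-robin training.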

**Why 400k+ steps:**
- Single-track training converges in ~60–90k steps.
- Two parallel tracks need at least 2× the budget because each track supplies only half of the samples in every update.
  Interference between the two tasks adds further overhead.
- Exp 11b at 90k steps (effectively 45k per track) produced only 194-step drives on both tracks.
400k should provide adequate budget for both.
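
The budget reasoning above reduces to a short calculation. The sketch below uses only figures quoted in this ADR; the helper names and the `multiple` parameter are illustrative assumptions.

```python
SINGLE_TRACK_CONVERGENCE = (60_000, 90_000)  # range quoted in this ADR


def per_track_steps(total_steps, n_tracks=2):
    """Timesteps each track contributes when the envs are stepped in lockstep."""
    return total_steps // n_tracks


def covers_single_track_budget(total_steps, n_tracks=2, multiple=1.0):
    """True if the per-track share reaches `multiple` times the upper convergence bound."""
    upper = SINGLE_TRACK_CONVERGENCE[1]
    return per_track_steps(total_steps, n_tracks) >= multiple * upper


if __name__ == "__main__":
    for total in (90_000, 400_000, 450_000, 500_000):
        print(total, per_track_steps(total), covers_single_track_budget(total))
```

With these numbers, 90k total falls short of even the single-track range per track, while 400k leaves each track with more than twice the upper bound, which is the margin this ADR budgets for interference.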

**Rejected alternatives:**
- Round-robin close-and-switch: disrupts PPO, 80% failure rate across 25 trials
- Cross-track warm starts: failed both directions (ADR-024)
- More autoresearch trials on round-robin: the method is fundamentally unreliable

**Fallback if 400k parallel fails:** Curriculum: train generated_track alone for 150k steps,
then add mountain to the DummyVecEnv pool for 250k more steps.
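
The fallback curriculum can be written down as a simple schedule. This is a minimal sketch of the plan as stated, assuming the phase boundary is a hard switch at 150k steps; `active_tracks` is a hypothetical helper and says nothing about how the env pool would actually be rebuilt in SB3.

```python
def active_tracks(step, solo_steps=150_000, joint_steps=250_000):
    """Return the tracks in the training pool at a given global timestep."""
    total = solo_steps + joint_steps
    if not 0 <= step < total:
        raise ValueError(f"step must be in [0, {total})")
    if step < solo_steps:
        return ("generated_track",)  # phase 1: single-track warm-up
    return ("generated_track", "mountain_track")  # phase 2: both tracks in the pool


if __name__ == "__main__":
    for step in (0, 149_999, 150_000, 399_999):
        print(step, active_tracks(step))
```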

---

## ADR-026: Mountain Track Friction Fix — Use Road Material on Hill Colliders

**Date:** 2026-04-27
**Status:** Accepted — fix applied

**Context:** `WheelPhys.cs` multiplies wheel grip stiffness by the static friction of the
surface the wheel is hitting. The mountain_track scene assigned Slippery physics material
(staticFriction=0.1) to 4 track surface colliders from the long_road prefab, giving the
car 1/5 the normal traction on the hill. This caused visible wheelspin at full throttle and
made hill climbing genuinely difficult for learned policies.
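
The 1/5 figure follows directly from the two friction values. The sketch below mirrors the multiplication described above in Python rather than the project's C#; `base_stiffness` is a hypothetical placeholder, since the actual value in `WheelPhys.cs` is not quoted here.

```python
SLIPPERY_STATIC_FRICTION = 0.1  # from this ADR
ROAD_STATIC_FRICTION = 0.5      # from this ADR


def effective_grip(base_stiffness, static_friction):
    """Grip stiffness scaled by the static friction of the surface under the wheel."""
    return base_stiffness * static_friction


if __name__ == "__main__":
    base_stiffness = 1.0  # hypothetical; only the ratio matters here
    slippery = effective_grip(base_stiffness, SLIPPERY_STATIC_FRICTION)
    road = effective_grip(base_stiffness, ROAD_STATIC_FRICTION)
    print(f"grip on Slippery relative to Road: {slippery / road:.2f}")
```

Because the base stiffness cancels out, the ratio depends only on the two material settings.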

**Decision:** Replace the 4 Slippery material assignments in `mountain_track.unity` with the
Road material (staticFriction=0.5). This is a targeted scene-level override; the Slippery
material asset itself is unchanged and remains available for intentionally slippery surfaces.

**Fix location:** `sdsim/Assets/Scenes/mountain_track.unity` — all 4 PrefabModification
entries that set `propertyPath: m_Material` on long_road colliders now reference Road
(GUID 7884193b0ead347a38a13a67f294dfb5) instead of Slippery (GUID c0e12c099c364af4e9e311a43d0f12c4).
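
A coarse way to confirm the swap is to count GUID occurrences in the scene file. The sketch below assumes the scene is serialized as text and simply counts raw matches, so it would also count any Slippery reference that is not on a long_road collider. `count_material_refs` and the `SAMPLE` fragment are illustrative; the real file layout may differ.

```python
SLIPPERY_GUID = "c0e12c099c364af4e9e311a43d0f12c4"  # quoted in this ADR
ROAD_GUID = "7884193b0ead347a38a13a67f294dfb5"      # quoted in this ADR


def count_material_refs(scene_text):
    """Count raw occurrences of each material GUID in a text-serialized scene."""
    return {
        "slippery": scene_text.count(SLIPPERY_GUID),
        "road": scene_text.count(ROAD_GUID),
    }


# Illustrative fragment only; field values other than the GUID are made up.
SAMPLE = f"""
    - target: {{fileID: 1, guid: 0000, type: 3}}
      propertyPath: m_Material
      objectReference: {{fileID: 2, guid: {ROAD_GUID}, type: 2}}
"""

if __name__ == "__main__":
    print(count_material_refs(SAMPLE))
```

To run it against the real scene, read `sdsim/Assets/Scenes/mountain_track.unity` as text and pass the contents in; after the fix, a Slippery count of zero would be the expected outcome if the scene had no other Slippery references.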

**To activate:** Rebuild the Unity simulator binary after pulling the updated scene file.
No Python code changes needed.

**What this does NOT change:**
- `Slippery.physicMaterial` asset: unchanged (still used by thunderhill, circuit_launch)
- `Donkey_new_phys.prefab` strut colliders: these also reference Slippery, but they are car body
  parts that the wheels don't touch. `WheelPhys.cs` only reads friction from ground hits.
- `mini_monaco.unity`: also has one Slippery reference, intentionally left in place for now

**Expected effect:** Hill wheelspin should stop. The policy should find it easier to climb
the hill at throttle_min=0.2, and Exp 17 multi-track results should be more interpretable
since we are no longer fighting a physics artifact.

@@ -5,6 +5,7 @@ Each corresponds to an entry in docs/TEST_HISTORY.md.

| Script | Experiment | Key change |
|---|---|---|
| exp17_parallel_450k.py | Exp 17 | Parallel DummyVecEnv, 450k steps, v6 reward, HOST=localhost |
| mountain_v5.py | Exp 5 | v5 reward + throttle_min=0.5, direct model.learn() |
| mountain_continue.py | Exp 4 | Continued Exp3 training |
| mountain_high_throttle.py | Exp 3 | throttle_min=0.5, old v4 reward |

@@ -0,0 +1,199 @@
"""
Exp 17: Parallel DummyVecEnv — generated_track + mountain_track, 450k steps.

Strategy: Exp 11b proved the parallel DummyVecEnv infrastructure is stable.
The only failure mode was insufficient training budget (~45k effective steps
per track). This experiment increases the budget fivefold, to ~225k per track.

Changes from Exp 11b:
  - HOST: 10.0.0.55 → localhost (WSL/Windows share ports)
  - TOTAL_STEPS: 90k → 450k
  - CHECKPOINT_EVERY: 6k → 20k
  - SAVE_DIR: exp17-parallel-450k

Everything else identical to Exp 11b (same reward, wrappers, lr, throttle_min).

Setup — TWO sim instances required:
  Sim 1: launch donkey_sim.exe, select generated_track, port 9091 (default)
  Sim 2: launch a second donkey_sim.exe with --port 9093, select mountain_track
         Command: donkey_sim.exe --port 9093

Both sims must be running and on the correct tracks before starting this script.

Evaluation:
  - Mid-training: both training tracks evaluated at each 20k checkpoint
  - End-of-training: all 4 tracks evaluated sequentially (port 9091)
"""
import sys, os, time, traceback
sys.path.insert(0, '/home/paulh/projects/donkeycar-rl-autoresearch/agent')

from multitrack_runner import log, StuckTerminationWrapper
from donkeycar_sb3_runner import ThrottleClampWrapper
from reward_wrapper import SpeedRewardWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
import gymnasium as gym
import numpy as np

HOST = 'localhost'
THROTTLE_MIN = 0.2
LR = 0.000725
TOTAL_STEPS = 450_000
CHECKPOINT_EVERY = 20_000
SAVE_DIR = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp17-parallel-450k'
os.makedirs(SAVE_DIR, exist_ok=True)


def make_env(track_id, port):
    # Return a thunk so DummyVecEnv can construct each env itself.
    def _init():
        raw = gym.make(track_id, conf={'host': HOST, 'port': port})
        env = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
        env = StuckTerminationWrapper(env, stuck_steps=40, min_displacement=0.5)
        env = SpeedRewardWrapper(env)
        return env
    return _init


log('=' * 60)
log('Exp 17: Parallel DummyVecEnv — 450k steps')
log(f' Sim 1: {HOST}:9091 → generated_track')
log(f' Sim 2: {HOST}:9093 → mountain_track')
log(f' throttle_min={THROTTLE_MIN}, lr={LR}, total={TOTAL_STEPS:,}')
log(f' Reward: v6 (speed × CTE_quality, efficiency gate >= 0.15)')
log(f' Stuck termination: 40 steps (~2.5s)')
log(f' Checkpoints: every {CHECKPOINT_EVERY:,} steps')
log('=' * 60)

log('Creating DummyVecEnv with two tracks...')
env = DummyVecEnv([
    make_env('donkey-generated-track-v0', 9091),
    make_env('donkey-mountain-track-v0', 9093),
])
env = VecTransposeImage(env)
log(f' VecEnv num_envs={env.num_envs}, obs={env.observation_space.shape}')

model = PPO('CnnPolicy', env, learning_rate=LR, verbose=1, device='cpu')
log('PPO created. Starting training...')

best_reward = float('-inf')
steps_done = 0

while steps_done < TOTAL_STEPS:
    seg_steps = min(CHECKPOINT_EVERY, TOTAL_STEPS - steps_done)
    model.learn(total_timesteps=seg_steps, reset_num_timesteps=False)
    steps_done += seg_steps

    ckpt = os.path.join(SAVE_DIR, f'checkpoint_{steps_done:07d}')
    model.save(ckpt)
    model.save(os.path.join(SAVE_DIR, 'model'))
    log(f'[{steps_done:,}/{TOTAL_STEPS:,}] Checkpoint saved: {ckpt}.zip')

    # Eval on both training tracks using the existing DummyVecEnv connections
    try:
        obs = env.reset()
        ep_rewards = np.zeros(env.num_envs)
        ep_steps = np.zeros(env.num_envs)
        done_mask = np.zeros(env.num_envs, dtype=bool)
        for _ in range(2000):
            action, _ = model.predict(obs, deterministic=True)
            obs, rewards, dones, infos = env.step(action)
            # Only count each env's first episode; later auto-reset episodes are ignored.
            for i in range(env.num_envs):
                if not done_mask[i]:
                    ep_rewards[i] += rewards[i]
                    ep_steps[i] += 1
                    if dones[i]:
                        done_mask[i] = True
            if done_mask.all():
                break

        status0 = '✅' if ep_steps[0] >= 2000 else f'❌@{int(ep_steps[0])}'
        status1 = '✅' if ep_steps[1] >= 2000 else f'❌@{int(ep_steps[1])}'
        log(f' Eval: gen_track={ep_rewards[0]:.1f}r/{int(ep_steps[0])}s {status0} '
            f'mountain={ep_rewards[1]:.1f}r/{int(ep_steps[1])}s {status1}')

        total_reward = ep_rewards.sum()
        if total_reward > best_reward:
            best_reward = total_reward
            model.save(os.path.join(SAVE_DIR, 'best_model'))
            log(f' ⭐ NEW BEST: {best_reward:.1f} combined reward')
    except Exception as e:
        log(f' Eval error: {e}')
        traceback.print_exc()

model.save(os.path.join(SAVE_DIR, 'model'))
log(f'\nTraining complete. Best combined reward: {best_reward:.1f}')

env.close()
time.sleep(5)

# --- Final eval on all 4 tracks (sequential, port 9091) ---
log('\n' + '=' * 60)
log('FINAL EVALUATION: best_model on 4 tracks (3 sets each)')
log('=' * 60)

EVAL_TRACKS = [
    ('donkey-generated-track-v0', 'generated_track'),
    ('donkey-mountain-track-v0', 'mountain_track'),
    ('donkey-minimonaco-track-v0', 'mini_monaco'),
    ('donkey-generated-roads-v0', 'generated_road'),
]
EVAL_PORT = 9091
EVAL_SETS = 3
EVAL_MAX_STEPS = 2000

best_model_path = os.path.join(SAVE_DIR, 'best_model.zip')
results_by_track = {}

for track_id, track_name in EVAL_TRACKS:
    log(f'\n--- {track_name} ---')
    steps_list = []

    for s in range(1, EVAL_SETS + 1):
        try:
            raw = gym.make(track_id, conf={'host': HOST, 'port': EVAL_PORT})
            inner = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
            inner = StuckTerminationWrapper(inner, stuck_steps=40, min_displacement=0.5)
            inner = SpeedRewardWrapper(inner)
            eval_env = VecTransposeImage(DummyVecEnv([lambda e=inner: e]))

            eval_model = PPO.load(best_model_path, env=eval_env, device='cpu')

            obs = eval_env.reset()
            total_r, total_s, done = 0.0, 0, False
            while not done and total_s < EVAL_MAX_STEPS:
                action, _ = eval_model.predict(obs, deterministic=True)
                result = eval_env.step(action)
                # Accept either the 4-tuple VecEnv API or a 5-tuple Gymnasium-style step.
                if len(result) == 4:
                    obs, r, d, info = result
                    done = bool(d[0])
                else:
                    obs, r, t, tr, info = result
                    done = bool(t[0] or tr[0])
                total_r += float(r[0])
                total_s += 1

            status = '✅' if total_s >= EVAL_MAX_STEPS else f'❌@{total_s}'
            log(f' Set {s}: {total_r:.1f}r / {total_s}s {status}')
            steps_list.append(total_s)

            eval_env.close()
            time.sleep(3)
        except Exception as e:
            log(f' Set {s}: ERROR — {e}')
            steps_list.append(0)
            time.sleep(3)

    mean_steps = np.mean(steps_list) if steps_list else 0
    results_by_track[track_name] = steps_list
    log(f' Mean: {mean_steps:.0f} steps')

log('\n' + '=' * 60)
log('SUMMARY')
log('=' * 60)
for track_name, steps_list in results_by_track.items():
    steps_str = '/'.join(str(s) for s in steps_list)
    mean = np.mean(steps_list)
    verdict = '✅' if mean >= 1500 else '⚠️' if mean >= 500 else '❌'
    log(f' {verdict} {track_name:20s}: {steps_str} mean={mean:.0f}')

log(f'\n=== Exp 17 COMPLETE ===')

@@ -0,0 +1,61 @@
2026-04-20T00:08:21.090963 Loading warm-start model from models/exp14-mountain-v5/best_model.zip
2026-04-20T00:09:16.674927 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using throttle_min=0.2 env
2026-04-20T00:09:19.055092 Switching model to env with throttle_min=0.4
2026-04-20T00:10:27.385278 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
2026-04-20T00:11:08.699368 ERROR during fine-tune: 'NoneType' object is not callable
2026-04-20T00:11:08.901669 Fine-tune complete. steps_done=0
2026-04-20T00:14:43.472139 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
2026-04-20T00:17:44.473941 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
2026-04-20T00:21:10.924456 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
2026-04-20T00:25:31.932947 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env
2026-04-20T00:28:59.848890 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip
2026-04-20T00:28:59.848966 ERROR during fine-tune: name 'make_env' is not defined
2026-04-20T00:29:00.509181 Fine-tune complete. steps_done=6000
2026-04-20T00:31:09.594830 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env
2026-04-20T00:34:50.056288 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip
2026-04-20T00:35:04.415348 ERROR during fine-tune: name 'json' is not defined
2026-04-20T00:35:04.546033 Fine-tune complete. steps_done=6000
2026-04-20T00:37:47.831240 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env
2026-04-20T00:41:21.675776 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip
2026-04-20T00:41:43.554021 Eval @ 6000: mean_steps=384.7 mean_lap=21.59375
2026-04-20T00:41:43.694831 ⭐ NEW BEST (mean lap 21.59s) saved
2026-04-20T00:45:26.980198 [12000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0012000.zip
2026-04-20T00:45:42.741989 Eval @ 12000: mean_steps=187.7 mean_lap=None
2026-04-20T00:49:24.586893 [18000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0018000.zip
2026-04-20T00:49:42.795830 Eval @ 18000: mean_steps=287.3 mean_lap=None
2026-04-20T00:53:15.614884 [24000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0024000.zip
2026-04-20T00:53:37.070339 Eval @ 24000: mean_steps=374.7 mean_lap=21.765625
2026-04-20T00:57:09.352148 [30000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0030000.zip
2026-04-20T00:57:36.938090 Eval @ 30000: mean_steps=537.7 mean_lap=22.046875
2026-04-20T00:57:36.938120 Switching env to throttle_min=0.2
2026-04-20T01:00:55.914640 [36000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0036000.zip
2026-04-20T01:01:56.665949 Eval @ 36000: mean_steps=1451.7 mean_lap=28.434895833333332
2026-04-20T01:05:10.807288 [42000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0042000.zip
2026-04-20T01:05:57.449632 Eval @ 42000: mean_steps=1067.7 mean_lap=27.44140625
2026-04-20T01:08:54.843851 [48000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0048000.zip
2026-04-20T01:10:00.878424 Eval @ 48000: mean_steps=1626.7 mean_lap=29.776785714285715
2026-04-20T01:13:16.089861 [54000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0054000.zip
2026-04-20T01:14:18.435622 Eval @ 54000: mean_steps=1528.3 mean_lap=30.234375
2026-04-20T01:17:25.682859 [60000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0060000.zip
2026-04-20T01:18:28.243356 Eval @ 60000: mean_steps=1533.0 mean_lap=34.33125
2026-04-20T01:21:38.247436 [66000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0066000.zip
2026-04-20T01:21:54.995379 Eval @ 66000: mean_steps=163.7 mean_lap=None
2026-04-20T01:25:14.752223 [72000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0072000.zip
2026-04-20T01:26:11.926001 Eval @ 72000: mean_steps=1389.7 mean_lap=43.21484375
2026-04-20T01:29:24.138321 [78000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0078000.zip
2026-04-20T01:29:59.928582 Eval @ 78000: mean_steps=757.0 mean_lap=43.453125
2026-04-20T01:33:15.187091 [84000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0084000.zip
2026-04-20T01:33:49.188449 Eval @ 84000: mean_steps=704.7 mean_lap=41.046875
2026-04-20T01:36:57.554346 [90000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0090000.zip
2026-04-20T01:38:12.054640 Eval @ 90000: mean_steps=1819.0 mean_lap=None
2026-04-20T01:41:29.620560 [96000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0096000.zip
2026-04-20T01:42:07.583154 Eval @ 96000: mean_steps=813.0 mean_lap=None
2026-04-20T01:45:23.503967 [102000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0102000.zip
2026-04-20T01:45:59.052782 Eval @ 102000: mean_steps=747.3 mean_lap=None
2026-04-20T01:49:02.510514 [108000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0108000.zip
2026-04-20T01:49:27.462705 Eval @ 108000: mean_steps=466.0 mean_lap=None
2026-04-20T01:52:40.338223 [114000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0114000.zip
2026-04-20T01:53:31.593848 Eval @ 114000: mean_steps=1169.0 mean_lap=None
2026-04-20T01:56:39.035861 [120000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0120000.zip
2026-04-20T01:57:28.658996 Eval @ 120000: mean_steps=1125.0 mean_lap=None
2026-04-20T01:57:28.795051 Fine-tune complete. steps_done=120000
@@ -0,0 +1,20 @@
{"steps_done": 6000, "throttle_min": 0.4, "mean_steps": 384.6666666666667, "mean_lap_time": 21.59375, "per_set": [{"steps": 205, "laps": 0, "lap_times": []}, {"steps": 177, "laps": 0, "lap_times": []}, {"steps": 772, "laps": 1, "lap_times": [21.59375]}]}
{"steps_done": 12000, "throttle_min": 0.4, "mean_steps": 187.66666666666666, "mean_lap_time": null, "per_set": [{"steps": 145, "laps": 0, "lap_times": []}, {"steps": 345, "laps": 0, "lap_times": []}, {"steps": 73, "laps": 0, "lap_times": []}]}
{"steps_done": 18000, "throttle_min": 0.4, "mean_steps": 287.3333333333333, "mean_lap_time": null, "per_set": [{"steps": 233, "laps": 0, "lap_times": []}, {"steps": 244, "laps": 0, "lap_times": []}, {"steps": 385, "laps": 0, "lap_times": []}]}
{"steps_done": 24000, "throttle_min": 0.4, "mean_steps": 374.6666666666667, "mean_lap_time": 21.765625, "per_set": [{"steps": 178, "laps": 0, "lap_times": []}, {"steps": 359, "laps": 0, "lap_times": []}, {"steps": 587, "laps": 1, "lap_times": [21.765625]}]}
{"steps_done": 30000, "throttle_min": 0.4, "mean_steps": 537.6666666666666, "mean_lap_time": 22.046875, "per_set": [{"steps": 854, "laps": 1, "lap_times": [22.046875]}, {"steps": 365, "laps": 0, "lap_times": []}, {"steps": 394, "laps": 0, "lap_times": []}]}
{"steps_done": 36000, "throttle_min": 0.2, "mean_steps": 1451.6666666666667, "mean_lap_time": 28.434895833333332, "per_set": [{"steps": 1540, "laps": 2, "lap_times": [29.34375, 26.84375]}, {"steps": 2000, "laps": 3, "lap_times": [29.4375, 28.4375, 27.015625]}, {"steps": 815, "laps": 1, "lap_times": [29.53125]}]}
{"steps_done": 42000, "throttle_min": 0.2, "mean_steps": 1067.6666666666667, "mean_lap_time": 27.44140625, "per_set": [{"steps": 467, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [27.046875, 27.703125, 27.125]}, {"steps": 736, "laps": 1, "lap_times": [27.890625]}]}
{"steps_done": 48000, "throttle_min": 0.2, "mean_steps": 1626.6666666666667, "mean_lap_time": 29.776785714285715, "per_set": [{"steps": 2000, "laps": 3, "lap_times": [30.796875, 29.828125, 28.734375]}, {"steps": 880, "laps": 1, "lap_times": [30.65625]}, {"steps": 2000, "laps": 3, "lap_times": [29.703125, 29.203125, 29.515625]}]}
{"steps_done": 54000, "throttle_min": 0.2, "mean_steps": 1528.3333333333333, "mean_lap_time": 30.234375, "per_set": [{"steps": 585, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [32.734375, 29.8125, 30.8125]}, {"steps": 2000, "laps": 3, "lap_times": [31.171875, 29.71875, 27.15625]}]}
{"steps_done": 60000, "throttle_min": 0.2, "mean_steps": 1533.0, "mean_lap_time": 34.33125, "per_set": [{"steps": 2000, "laps": 2, "lap_times": [39.140625, 33.140625]}, {"steps": 599, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [34.21875, 31.953125, 33.203125]}]}
{"steps_done": 66000, "throttle_min": 0.2, "mean_steps": 163.66666666666666, "mean_lap_time": null, "per_set": [{"steps": 154, "laps": 0, "lap_times": []}, {"steps": 146, "laps": 0, "lap_times": []}, {"steps": 191, "laps": 0, "lap_times": []}]}
{"steps_done": 72000, "throttle_min": 0.2, "mean_steps": 1389.6666666666667, "mean_lap_time": 43.21484375, "per_set": [{"steps": 2000, "laps": 2, "lap_times": [50.140625, 35.6875]}, {"steps": 169, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 2, "lap_times": [39.890625, 47.140625]}]}
{"steps_done": 78000, "throttle_min": 0.2, "mean_steps": 757.0, "mean_lap_time": 43.453125, "per_set": [{"steps": 174, "laps": 0, "lap_times": []}, {"steps": 1074, "laps": 1, "lap_times": [46.03125]}, {"steps": 1023, "laps": 1, "lap_times": [40.875]}]}
{"steps_done": 84000, "throttle_min": 0.2, "mean_steps": 704.6666666666666, "mean_lap_time": 41.046875, "per_set": [{"steps": 953, "laps": 1, "lap_times": [40.21875]}, {"steps": 181, "laps": 0, "lap_times": []}, {"steps": 980, "laps": 1, "lap_times": [41.875]}]}
{"steps_done": 90000, "throttle_min": 0.2, "mean_steps": 1819.0, "mean_lap_time": null, "per_set": [{"steps": 2000, "laps": 0, "lap_times": []}, {"steps": 1963, "laps": 0, "lap_times": []}, {"steps": 1494, "laps": 0, "lap_times": []}]}
{"steps_done": 96000, "throttle_min": 0.2, "mean_steps": 813.0, "mean_lap_time": null, "per_set": [{"steps": 1671, "laps": 0, "lap_times": []}, {"steps": 385, "laps": 0, "lap_times": []}, {"steps": 383, "laps": 0, "lap_times": []}]}
{"steps_done": 102000, "throttle_min": 0.2, "mean_steps": 747.3333333333334, "mean_lap_time": null, "per_set": [{"steps": 715, "laps": 0, "lap_times": []}, {"steps": 932, "laps": 0, "lap_times": []}, {"steps": 595, "laps": 0, "lap_times": []}]}
{"steps_done": 108000, "throttle_min": 0.2, "mean_steps": 466.0, "mean_lap_time": null, "per_set": [{"steps": 468, "laps": 0, "lap_times": []}, {"steps": 476, "laps": 0, "lap_times": []}, {"steps": 454, "laps": 0, "lap_times": []}]}
{"steps_done": 114000, "throttle_min": 0.2, "mean_steps": 1169.0, "mean_lap_time": null, "per_set": [{"steps": 1318, "laps": 0, "lap_times": []}, {"steps": 1278, "laps": 0, "lap_times": []}, {"steps": 911, "laps": 0, "lap_times": []}]}
{"steps_done": 120000, "throttle_min": 0.2, "mean_steps": 1125.0, "mean_lap_time": null, "per_set": [{"steps": 941, "laps": 0, "lap_times": []}, {"steps": 1492, "laps": 0, "lap_times": []}, {"steps": 942, "laps": 0, "lap_times": []}]}
@@ -0,0 +1,13 @@
{"set": 1, "episode": 1, "steps": 195, "reward": 313.8098858782323, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:01.852800"}
{"set": 1, "episode": 2, "steps": 907, "reward": 821.3252189619088, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:15.580688"}
{"set": 1, "episode": 3, "steps": 187, "reward": 312.3699834933941, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:20.305057"}
{"set_summary": {"set": 1, "mean_steps": 429.6666666666667, "mean_reward": 482.50169611117843}}
{"set": 2, "episode": 1, "steps": 1684, "reward": 2886.7210297683996, "laps": 2, "lap_times": [30.796875, 27.3125], "timestamp": "2026-04-19T23:55:43.831212"}
{"set": 2, "episode": 2, "steps": 1791, "reward": 2724.1041878786637, "laps": 2, "lap_times": [29.234375, 31.578125], "timestamp": "2026-04-19T23:56:08.736059"}
{"set": 2, "episode": 3, "steps": 2000, "reward": 3338.140802157104, "laps": 3, "lap_times": [29.828125, 27.828125, 29.171875], "timestamp": "2026-04-19T23:56:34.963968"}
{"set_summary": {"set": 2, "mean_steps": 1825.0, "mean_reward": 2982.9886732680557}}
{"set": 3, "episode": 1, "steps": 189, "reward": 304.40264326371107, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:56:39.723007"}
{"set": 3, "episode": 2, "steps": 2000, "reward": 3396.2255747133167, "laps": 3, "lap_times": [29.875, 28.75, 27.765625], "timestamp": "2026-04-19T23:57:05.989723"}
{"set": 3, "episode": 3, "steps": 773, "reward": 1300.720640436186, "laps": 1, "lap_times": [31.265625], "timestamp": "2026-04-19T23:57:18.198014"}
{"set_summary": {"set": 3, "mean_steps": 987.3333333333334, "mean_reward": 1667.116286137738}}
{"overall": {"mean_steps_across_sets": 1080.6666666666667, "mean_reward_across_sets": 1710.8688851723239}}
149
docs/STATE.md
@@ -1,100 +1,83 @@
# Project State — April 16, 2026 (post-testing)
|
# Project State — April 27, 2026
|
||||||
|
|
||||||
## The Goal
|
## The Goal
|
||||||
|
|
||||||
Train a DonkeyCar model that generalises to any road-surface track
|
Train a DonkeyCar model that generalises to any road-surface track
|
||||||
(outdoor, asphalt, lane markings) — demonstrated by driving a
|
(outdoor, asphalt, lane markings) — demonstrated by driving a
|
||||||
never-seen track without crashing.
|
never-seen track without crashing.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Confirmed Working Models (tested today, observed by user)
|
## Current Champion Models
|
||||||
|
|
||||||
### ✅ Phase 2 Champion — generated_road
|
### ✅ exp13-gentrack-v4 — generated_track specialist
|
||||||
- **Path:** `models/champion/model.zip`
|
- **Path:** `models/exp13-gentrack-v4/best_model.zip`
|
||||||
- **Trained on:** generated_road only, ~13k steps, lr=0.000225
|
- **Trained on:** generated_track only, ~30k steps (stopped early), lr=0.000725, throttle_min=0.2
|
||||||
- **Test result:** Drove full 2000 steps, 2013 reward. User: "driving very well, stayed in right-hand lane, very very good"
|
- **Reward:** v4 (base × efficiency × speed_bonus)
|
||||||
- **Other tracks:** Confirmed fails on generated_track (old multitrack_eval)
|
- **Performance:** Drives generated_track reliably, clean laps
|
||||||
|
- **Zero-shot:** Fails on mountain_track (expected — single-track specialist)
|
||||||
|
|
||||||
### ✅ Wave 4 Trial 9 — generated_track AND mini_monaco
|
### ✅ exp14-mountain-v5-finetune ft_036k — mountain specialist
|
||||||
|
- **Path:** `models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
|
||||||
|
- **Trained on:** mountain_track, fine-tuned from exp14 base, checkpoint at 36k steps
|
||||||
|
- **Reward:** v5 (speed × CTE-quality), throttle_floor=0.2 (switched from 0.4 at 30k)
|
||||||
|
- **Performance:** 9/9 successful episodes, 25 total laps, mean lap 27.93s, best lap 26.16s
|
||||||
|
- **Zero-shot:** Fails on generated_track (expected — single-track specialist)
|
||||||
|
|
||||||
|
### ⭐ Wave 4 Trial 9 — best generalising model (but not reproducible)
|
||||||
- **Path:** `models/wave4-trial-0009/model.zip`
|
- **Path:** `models/wave4-trial-0009/model.zip`
|
||||||
- **Trained on:** generated_track + mountain_track from scratch, ~90k steps, lr=0.000725, switch=6,851
|
- **Trained on:** generated_track + mountain_track, ~90k steps, lr=0.000725, switch=6,851
|
||||||
- **Test on generated_track:** 3/3 episodes drove full 2000 steps, 13–16 second genuine laps
|
- **Performance:** generated_track 2000/2000, mini_monaco 2000/2000 (zero-shot)
|
||||||
- **Test on mini_monaco:** Full 2000 steps, 40-second genuine laps (zero-shot — never seen during training)
|
- **Problem:** Same hyperparameters repeated multiple times → all failed. This was a lucky random seed.
|
||||||
- **This is our best model**
|
|
||||||
|
|
||||||
### ✅ Wave 4 Trial 19 — generated_track (mostly)
|
|
||||||
- **Path:** `models/wave4-trial-0019/model.zip`
|
|
||||||
- **Trained on:** generated_track + mountain_track from scratch, ~74k steps, lr=0.000629, switch=8,211
|
|
||||||
- **Test on generated_track:** 2/3 episodes drove full 2000 steps, 14–17 second genuine laps. 1 crash.
|
|
||||||
- **mini_monaco score during training:** 231 (best "honest" result from Wave 4)
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Key Finding: Generated Track Lighting Variation
|
## What We Know (cumulative)
|
||||||
The generated_track changes lighting conditions (sun angle, shadows) on every
|
|
||||||
env.reset() due to procedural generation. This means during training, every
|
### Reward functions
|
||||||
episode showed a different visual appearance of the same track. The model was
|
- **v4** (base × efficiency × speed_bonus): works for generated_track; gives zero gradient on mountain hills
|
||||||
forced to learn track-geometry features (road edges, markings) rather than
|
- **v5** (speed × CTE-quality): works for mountain; circular driving exploit possible on flat track
|
||||||
lighting-specific patterns. This visual robustness is almost certainly why
|
- **v6** (v5 + efficiency gate ≥ 0.15): prevents circular exploit; may suppress early exploration
|
||||||
Trial 9 can zero-shot generalise to mini_monaco.
|
|
||||||
|
### Training approaches tried and their outcomes
|
||||||
|
| Approach | Result |
|
||||||
|
|---|---|
|
||||||
|
| Single-track PPO (Exp 9, 13) | ✅ Reliable. Best per-track performance. |
|
||||||
|
| Round-robin close-and-switch (Wave 4, Exp 10) | ❌ 80% failure rate. Disrupts PPO rollout buffer. |
|
||||||
|
| Parallel DummyVecEnv 90k steps (Exp 11b) | ⚠️ Infrastructure works; 90k too few steps (194 steps on all tracks). |
|
||||||
|
| Cross-track warm start both directions (Exp 15, 16) | ❌ Both failed. Single-track policies too specialised for naive transfer. |
|
||||||
|
|
||||||
|
### Mountain track physics (fixed 2026-04-27)
|
||||||
|
The mountain_track.unity scene assigned Slippery physics material (staticFriction=0.1)
|
||||||
|
to 4 track surface colliders. WheelPhys.cs scales wheel grip by surface staticFriction,
|
||||||
|
so the car had 1/5 normal grip on the hill. This caused visible wheelspin.
|
||||||
|
Fixed by assigning Road material (staticFriction=0.5) to those 4 colliders in
|
||||||
|
`sdsim/Assets/Scenes/mountain_track.unity`. The project uses a pre-built Windows
|
||||||
|
executable (DonkeySimWin/donkey_sim.exe), so this fix is deferred until the sim
|
||||||
|
is rebuilt from source in Unity Editor. Proceed with Exp 17 using the existing binary.
|
||||||
|
|
||||||
|
### Key parameter knowledge
|
||||||
|
- **lr:** 0.000725 (from Trial 9 and Exp 9 — consistent with good results)
|
||||||
|
- **throttle_min:** 0.2 (v5/v6 reward gives non-zero gradient on hills even at 0.2)
|
||||||
|
- **n_steer/n_throttle:** Relevant for discrete action space only (PPO uses continuous)
|
||||||
|
- **Per-env throttle_min in DummyVecEnv:** Feasible — each env wrapped independently
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Full Test Results — April 16
|
## Open Strategy (as of April 27)
|
||||||
|
|
||||||
| Test | Model | Track | Laps | Steps | Verdict |
|
The goal is reliable multi-track generalisation. The validated path forward:
|
||||||
|---|---|---|---|---|---|
|
|
||||||
| 1 | Phase 2 champion | generated_road | n/a (not a loop) | 2000/2000 | ✅ DRIVES |
|
|
||||||
| 2 | Wave 4 Trial 3 | generated_track | — | — | ❌ MODEL CORRUPTED |
|
|
||||||
| 3 | Wave 4 Trial 9 | generated_track | 6 laps × 3 eps | 2000/2000 | ✅ DRIVES |
|
|
||||||
| 4 | Wave 4 Trial 9 | mini_monaco | 2 laps per ep | 2000/2000 | ✅ DRIVES (zero-shot) |
|
|
||||||
| 5 | Wave 4 Trial 14 | mini_monaco | 1 lap ep2 only | 257/901/253 | ⚠️ INCONSISTENT |
|
|
||||||
| 6 | Wave 4 Trial 25 | mini_monaco | 0 | ~147/eps | ❌ CRASHES |
|
|
||||||
| + | Wave 4 Trial 19 | generated_track | 5-6 laps × 2 eps | crash/2000/2000 | ✅ MOSTLY |
|
|
||||||
| + | Wave 4 Trial 22 | generated_track | 0 | ~110/eps | ❌ SAME SPOT |
|
|
||||||
| + | Wave 4 Trial 2 | generated_track | 0 | ~76/eps | ❌ CRASHES |
|
|
||||||
| + | Trial 3 (recovered) | generated_track | 0 | ~104/eps | ❌ CRASHES |
|
|
||||||
|
|
||||||
---
|
1. **Exp 17:** Parallel DummyVecEnv with 400k–500k steps
|
||||||
|
- Two sim instances: generated_track:9091, mountain_track:9093
|
||||||
|
- v6 reward on both (efficiency gate + CTE patience terminator)
|
||||||
|
- throttle_min=0.2 both envs (or optionally 0.5 on mountain, 0.2 on generated)
|
||||||
|
- lr=0.000725, checkpoint every 20k, best_model tracked throughout
|
||||||
|
- Eval mini_monaco zero-shot at every checkpoint
|
||||||
|
2. **If Exp 17 plateaus:** Try curriculum (generated_track only for 150k, then add mountain)
|
||||||
|
3. **If still stuck:** Tune v6 efficiency gate threshold (check % steps gated in early training)
|
||||||
|
|
||||||
## What We Know Now
|
See `docs/TEST_HISTORY.md` for full Exp 17 design.
|
||||||
|
|
||||||
1. **Trial 9 is a genuine multi-track model.** It drives generated_track
|
|
||||||
consistently (3/3) with clean laps, AND generalises zero-shot to
|
|
||||||
mini_monaco (never seen in training). This is real progress.
|
|
||||||
|
|
||||||
2. **The "amazing" overnight model (Trial 3) is lost.** The model.zip has
|
|
||||||
a corrupted optimizer file. Policy weights were recovered but the model
|
|
||||||
crashes at ~104 steps — the "amazing" driving was at an intermediate
|
|
||||||
training checkpoint, not the final saved model.
|
|
||||||
|
|
||||||
3. **Most Wave 4 high scores were not exploits — they were real.**
|
|
||||||
Trials 5, 6, and 14 showed inconsistent results (crash some episodes,
|
|
||||||
complete lap on others). The model was genuinely learning but unreliably.
|
|
||||||
Only Trial 14 and 25's original very high scores (1573, 1543) appear
|
|
||||||
to have been exploits in the original training eval.
|
|
||||||
|
|
||||||
4. **Lighting variation on generated_track is a feature, not a bug.**
|
|
||||||
Procedural generation changes sun angle / shadows each episode, forcing
|
|
||||||
the model to learn geometry rather than appearance. This may be the key
|
|
||||||
to Trial 9's generalisation ability.
|
|
||||||
|
|
||||||
5. **Mountain_track training — unknown contribution.** We don't know if
|
|
||||||
mountain_track training helped or hurt. Trial 9 drives generated_track
|
|
||||||
and mini_monaco; whether it can drive mountain_track is untested.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Open Questions for Strategy Discussion
|
|
||||||
|
|
||||||
1. Can Trial 9 also drive mountain_track? (untested)
|
|
||||||
2. Can Trial 9 drive generated_road? (untested — zero-shot to Phase 2 training track)
|
|
||||||
3. Why does Trial 9 drive mini_monaco but other models with similar
|
|
||||||
mini_monaco scores (Trial 14: 193, Trial 22: 193) don't reliably?
|
|
||||||
4. Would more training steps from Trial 9's hyperparameters produce
|
|
||||||
an even better model?
|
|
||||||
5. Is mountain_track necessary, or could we get Trial 9's results
|
|
||||||
training on generated_track alone?
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -102,9 +85,9 @@ Trial 9 can zero-shot generalise to mini_monaco.
|
||||||
|
|
||||||
| Model | Path | Status |
|
| Model | Path | Status |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| Phase 2 champion | models/champion/model.zip | ✅ Good |
|
| exp13-gentrack-v4 | models/exp13-gentrack-v4/best_model.zip | ✅ Generated_track specialist |
|
||||||
| Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | ✅ Best model |
|
| exp14-mountain-v5-finetune ft_036k | models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip | ✅ Mountain specialist (best overall mountain model) |
|
||||||
| Wave 4 Trial 19 | models/wave4-trial-0019/model.zip | ✅ Good |
|
| exp14-mountain-v5 | models/exp14-mountain-v5/best_model.zip | ✅ Mountain base (good, slightly worse than ft_036k) |
|
||||||
| Wave 4 Trial 3 | models/wave4-trial-0003/model.zip | ❌ Corrupted |
|
| Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | ✅ Best generalising model; unreproducible |
|
||||||
| Wave 4 Trials 1,2,5-8,10-25 | models/wave4-trial-XXXX/ | Available, mostly crash on generated_track |
|
| Phase 2 champion | models/champion/model.zip | ✅ generated_road specialist only |
|
||||||
|
| Wave 4 other trials | models/wave4-trial-XXXX/ | Mostly crash on all tracks |
|
||||||
|
|
|
||||||
|
|
@ -508,3 +508,105 @@ For now:
|
||||||
- keep the single-track champions as separate specialists
|
- keep the single-track champions as separate specialists
|
||||||
- do **not** assume direct cross-track warm starts are beneficial
|
- do **not** assume direct cross-track warm starts are beneficial
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Mountain Track Friction Fix (2026-04-27)
|
||||||
|
|
||||||
|
### Root cause
|
||||||
|
|
||||||
|
`WheelPhys.cs` scales wheel grip by the static friction of whatever surface the
|
||||||
|
wheel is touching: `fFriction.stiffness = hit.collider.material.staticFriction * originalForwardStiffness`.
|
||||||
|
|
||||||
|
`mountain_track.unity` assigned the Slippery physics material (staticFriction=0.1)
|
||||||
|
to 4 track surface colliders from the `long_road` prefab. This gave the car 1/5
|
||||||
|
the normal grip on the hill, causing visible wheelspin even at full throttle.
|
||||||
|
|
||||||
|
The Slippery material is intentional on genuinely icy surfaces (thunderhill) but
|
||||||
|
was incorrect on mountain_track's asphalt hill.
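
The grip arithmetic above can be checked with a short sketch. It mirrors the scaling rule quoted from `WheelPhys.cs` in Python; the function names and the `original_forward_stiffness=1.0` default are hypothetical placeholders, and only the two friction values come from these notes.

```python
# Friction values quoted in the notes above.
SLIPPERY_STATIC_FRICTION = 0.1
ROAD_STATIC_FRICTION = 0.5


def forward_stiffness(static_friction, original_forward_stiffness=1.0):
    """Mirror of the quoted rule: stiffness = staticFriction * originalForwardStiffness.

    original_forward_stiffness=1.0 is a placeholder; the real value lives in the sim.
    """
    return static_friction * original_forward_stiffness


def grip_ratio(surface_friction, reference_friction):
    """Grip on one surface relative to another (the base stiffness cancels out)."""
    return forward_stiffness(surface_friction) / forward_stiffness(reference_friction)


ratio = grip_ratio(SLIPPERY_STATIC_FRICTION, ROAD_STATIC_FRICTION)
print(f"Slippery grip relative to Road: {ratio:.2f}")
```

Because the base stiffness cancels, the ratio depends only on the two material values, which is where the 1/5 figure comes from.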
|
||||||
|
|
||||||
|
### Fix applied
|
||||||
|
|
||||||
|
Replaced all 4 Slippery material assignments with Road material (staticFriction=0.5)
|
||||||
|
in `sdsim/Assets/Scenes/mountain_track.unity`.
|
||||||
|
|
||||||
|
| Material | staticFriction | GUID |
|
||||||
|
|---|---|---|
|
||||||
|
| Slippery (removed) | 0.1 | c0e12c099c364af4e9e311a43d0f12c4 |
|
||||||
|
| Road (applied) | 0.5 | 7884193b0ead347a38a13a67f294dfb5 |
|
||||||
|
|
||||||
|
### To activate
|
||||||
|
|
||||||
|
The training setup uses the pre-built Windows executable (`DonkeySimWin/donkey_sim.exe`),
|
||||||
|
not a locally-compiled build. The scene file edit in sdsandbox/ has no effect on the
|
||||||
|
running binary — it only matters if the sim is ever rebuilt from source in Unity Editor.
|
||||||
|
|
||||||
|
**This fix is deferred.** Proceed with Exp 17 using the existing executable.
|
||||||
|
If mountain hill training in Exp 17 specifically struggles (short episodes that plateau
|
||||||
|
and never improve), that is the signal to pursue a Unity Editor rebuild.
|
||||||
|
|
||||||
|
The scene file change is committed in sdsandbox/ and will apply automatically if the
|
||||||
|
sim is rebuilt for any other reason. No Python code changes needed.
|
||||||
|
|
||||||
|
### Expected effect
|
||||||
|
|
||||||
|
- Hill wheelspin should stop or greatly reduce
|
||||||
|
- throttle_min=0.2 + v5 reward should be even more effective on the hill
|
||||||
|
- All future mountain experiments benefit; no code changes needed
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Strategy Review and Exp 17 Plan (2026-04-27)
|
||||||
|
|
||||||
|
### Where the project stands
|
||||||
|
|
||||||
|
After 16 experiments and 4 autoresearch phases, the core problem is clear:
|
||||||
|
multi-track training is needed for generalisation, but the training method has
|
||||||
|
been unreliable. Here is the summary of what each approach found:
|
||||||
|
|
||||||
|
| Approach | Outcome |
|
||||||
|
|---|---|
|
||||||
|
| Round-robin close-and-switch (Wave 4, Exp 10) | 80% failure. PPO rollout buffer disrupted on env swap. Lucky seed (Trial 9) worked once but cannot be reproduced. |
|
||||||
|
| Parallel DummyVecEnv 90k steps (Exp 11b) | Infrastructure valid, no catastrophic forgetting, but 90k steps / 2 tracks = ~45k effective per track. Not enough. |
|
||||||
|
| Cross-track warm starts (Exp 15, 16) | Both directions failed. Single-track specialists do not transfer cleanly. |
|
||||||
|
| Single-track PPO (Exp 9, 13, 14) | Reliable but no generalisation. |
|
||||||
|
|
||||||
|
The conclusion: **parallel DummyVecEnv is the right architecture; the only known
|
||||||
|
failure mode is training budget**. Exp 11b was mechanically sound but starved of steps.
|
||||||
|
|
||||||
|
### Exp 17 — Parallel DummyVecEnv, 400k–500k steps
|
||||||
|
|
||||||
|
**This is the primary next experiment.**
|
||||||
|
|
||||||
|
| Parameter | Value | Reason |
|
||||||
|
|---|---|---|
|
||||||
|
| Architecture | DummyVecEnv([generated_track:9091, mountain_track:9093]) | Validated in Exp 11b; no PPO disruption |
|
||||||
|
| Total timesteps | 400,000–500,000 | ~200k–250k effective per track; Exp 11b showed 90k is insufficient |
|
||||||
|
| Reward | v6 on both envs (efficiency gate + CTE patience terminator) | Blocks circular exploit on generated_track; gate threshold may be tuned |
|
||||||
|
| throttle_min | 0.2 both envs (or 0.5 mountain, 0.2 generated — see ADR-020) | v5/v6 gradient non-zero on hills at 0.2 |
|
||||||
|
| learning_rate | 0.000725 | From Trial 9 and Exp 9 — consistent with best results |
|
||||||
|
| Checkpoint | every 20,000 steps + best_model.zip tracked throughout | ADR-017: best model ≠ final model |
|
||||||
|
| Eval | mini_monaco zero-shot at every checkpoint | Detect the peak before policy drifts |
|
||||||
|
| Warm start | None — train from random weights | ADR-024: cross-track warm starts failed |
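
The step budget and checkpoint cadence in the table can be sanity-checked with a small sketch. It assumes the total timestep count is split evenly across the two parallel envs, which is how the "effective per track" figures in these notes are derived; the helper names are hypothetical and not taken from `exp17_parallel_450k.py`.

```python
N_ENVS = 2                  # generated_track:9091 and mountain_track:9093
CHECKPOINT_EVERY = 20_000   # total timesteps between checkpoints


def per_track_steps(total_timesteps, n_envs=N_ENVS):
    """Effective steps each track sees, assuming an even split across envs."""
    return total_timesteps // n_envs


def checkpoint_steps(total_timesteps, every=CHECKPOINT_EVERY):
    """Total-timestep counts at which a checkpoint (and a zero-shot eval) is due."""
    return list(range(every, total_timesteps + 1, every))


for total in (90_000, 400_000, 450_000, 500_000):
    print(total, per_track_steps(total), len(checkpoint_steps(total)))
```

With the 450k budget named in the commit message, this gives 225k effective steps per track and 22 checkpoints, so 22 mini_monaco evaluations if every checkpoint is evaluated.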
|
||||||
|
|
||||||
|
**Setup checklist before running:**
|
||||||
|
1. Two sim instances running: one on port 9091, one on port 9093
|
||||||
|
2. Each instance loaded on its configured track (generated_track on 9091, mountain_track on 9093)
|
||||||
|
3. Optional: rebuild simulator with mountain friction fix active (the fix is deferred, so Exp 17 can proceed on the existing executable)
|
||||||
|
4. Verify throughput: run 2-minute timing benchmark, set step cap accordingly (ADR-014)
|
||||||
|
|
||||||
|
**Success criterion:** mini_monaco zero-shot score > 500 (at least 25% of a full
|
||||||
|
2000-step episode) reliably across 3 evaluation sets, reproducible across 2+ runs.
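
One possible reading of the success criterion, written as a check. It is a sketch under assumptions: "reliably" is taken to mean every evaluation set in a run scores above 500, and "reproducible" to mean at least two runs pass. The function names and data layout are hypothetical.

```python
SCORE_THRESHOLD = 500   # mini_monaco zero-shot score that must be exceeded
MIN_EVAL_SETS = 3       # evaluation sets required per run
MIN_PASSING_RUNS = 2    # independent runs that must pass


def run_passes(eval_scores):
    """A run passes if it has enough evaluation sets and all exceed the threshold."""
    return (len(eval_scores) >= MIN_EVAL_SETS
            and all(score > SCORE_THRESHOLD for score in eval_scores))


def meets_success_criterion(runs):
    """runs: one list of mini_monaco scores per training run."""
    passing = sum(1 for scores in runs if run_passes(scores))
    return passing >= MIN_PASSING_RUNS


print(meets_success_criterion([[510, 620, 700], [530, 540, 800]]))
```

A stricter or looser reading (for example, mean score above 500) would only change `run_passes`.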
|
||||||
|
|
||||||
|
### Fallback: Curriculum training (if Exp 17 plateaus below 200)
|
||||||
|
|
||||||
|
If Exp 17 cannot get past ~200 steps on mini_monaco:
|
||||||
|
- Phase A: generated_track only, 150k steps (establish road-following)
|
||||||
|
- Phase B: add mountain_track to DummyVecEnv, continue 250k more steps
|
||||||
|
- Rationale: gives the policy a foundation before the harder mountain physics
|
||||||
|
|
||||||
|
### Fallback: v6 efficiency gate tuning (if gate is too aggressive)
|
||||||
|
|
||||||
|
Log what fraction of steps are gated (reward zeroed) in the first 100k steps.
|
||||||
|
If >40%, lower the gate threshold from 0.15 to 0.10 for the first 150k steps,
|
||||||
|
then raise it back to 0.15. Prevents the gate from suppressing early exploration.
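
The gate-tuning rule above can be sketched as follows. It assumes a step counts as gated when its efficiency falls below the threshold, and that the gated fraction is measured beforehand (for example, from an earlier run's first 100k steps) and passed in. All names are hypothetical; only the 0.15, 0.10, 40%, and 150k values come from these notes.

```python
DEFAULT_GATE = 0.15        # v6 efficiency gate threshold
RELAXED_GATE = 0.10        # lowered threshold for early training
GATED_FRACTION_LIMIT = 0.40
RELAXED_UNTIL_STEP = 150_000


def gated_fraction(efficiencies, threshold=DEFAULT_GATE):
    """Fraction of logged steps whose reward would be zeroed by the gate."""
    if not efficiencies:
        return 0.0
    gated = sum(1 for e in efficiencies if e < threshold)
    return gated / len(efficiencies)


def gate_threshold(step, early_gated_fraction):
    """Threshold to use at a training step, given the measured early gated fraction."""
    if early_gated_fraction > GATED_FRACTION_LIMIT and step < RELAXED_UNTIL_STEP:
        return RELAXED_GATE
    return DEFAULT_GATE


fraction = gated_fraction([0.05, 0.20, 0.10, 0.30])
print(fraction, gate_threshold(100_000, fraction), gate_threshold(200_000, fraction))
```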
|
||||||
|
|
||||||
|
|
|
||||||