From b504b89b2a8eb9fe6ade9e3c472bbac92d3fd56d Mon Sep 17 00:00:00 2001 From: Paul Huliganga Date: Tue, 28 Apr 2026 02:42:20 -0400 Subject: [PATCH] feat: add exp17 parallel DummyVecEnv 450k training + strategy docs - exp17_parallel_450k.py: parallel two-track training (generated_track:9091, mountain_track:9093), 450k steps, v6 reward, HOST=localhost - DECISIONS.md: ADR-025 (parallel strategy) and ADR-026 (mountain friction fix) - docs/STATE.md: updated to April 2026 state with current champions and strategy - docs/TEST_HISTORY.md: mountain friction fix notes + Exp 17 full design - outerloop-results: exp14 finetune logs and robust mountain eval results Co-Authored-By: Claude Sonnet 4.6 --- DECISIONS.md | 76 +++++++ agent/experiments/README.md | 1 + agent/experiments/exp17_parallel_450k.py | 199 ++++++++++++++++++ .../outerloop-results/exp14_finetune_log.txt | 61 ++++++ .../exp14_finetune_results.jsonl | 20 ++ .../robust_eval_mountain.jsonl | 13 ++ docs/STATE.md | 149 ++++++------- docs/TEST_HISTORY.md | 102 +++++++++ 8 files changed, 538 insertions(+), 83 deletions(-) create mode 100644 agent/experiments/exp17_parallel_450k.py create mode 100644 agent/outerloop-results/exp14_finetune_log.txt create mode 100644 agent/outerloop-results/exp14_finetune_results.jsonl create mode 100644 agent/outerloop-results/robust_eval_mountain.jsonl diff --git a/DECISIONS.md b/DECISIONS.md index 41c125a..2dafce4 100644 --- a/DECISIONS.md +++ b/DECISIONS.md @@ -576,3 +576,79 @@ experts, not as obviously reusable initializations for the other track. 
- If transfer is revisited, it likely needs a more careful method than naive direct warm-starting on the other track - Mountain physics issues should be addressed before revisiting transfer conclusions + +--- + +## ADR-025: Parallel DummyVecEnv with 400k+ Steps is the Primary Multi-Track Strategy + +**Date:** 2026-04-27 +**Status:** Active + +**Context:** After Wave 4 (25 trials, 80% failure rate), Exp 10 (catastrophic forgetting), +Exp 11b (infrastructure works but 90k steps insufficient), and Exp 15/16 (cross-track +warm starts failed both directions), the only multi-track approach that did not have a +fundamental flaw was parallel DummyVecEnv — Exp 11b failed only because the training +budget was halved relative to what single-track training needs. + +**Decision:** The primary next strategy is: +1. Two sim instances (one per training track, separate ports) +2. SB3 `DummyVecEnv([env_generated, env_mountain])` — PPO sees both tracks in every batch +3. 400,000–500,000 total timesteps (~200k effective per track) +4. v6 reward (efficiency gate + CTE patience terminator) on both envs +5. No warm start — train from random weights +6. Checkpoint every 20k steps, track mini_monaco zero-shot score throughout + +**Why parallel DummyVecEnv:** +- PPO is an on-policy algorithm that depends on a stable rollout buffer. + Swapping environments mid-training disrupts value estimates and causes catastrophic forgetting. + DummyVecEnv feeds both tracks into every PPO rollout batch — no forgetting, no disruption. +- This is how SB3 was designed to be used with multiple environments. + +**Why 400k+ steps:** +- Single-track training converges in ~60–90k steps. +- Two parallel tracks need at least 2× the budget because each track gets half the gradient. + Interference between the two tasks adds further overhead. +- Exp 11b at 90k steps (effectively 45k per track) produced only 194-step drives on both tracks. + 400k should provide adequate budget for both. 
+ +**Rejected alternatives:** +- Round-robin close-and-switch: disrupts PPO, 80% failure rate across 25 trials +- Cross-track warm starts: failed both directions (ADR-024) +- More autoresearch trials on round-robin: the method is fundamentally unreliable + +**Fallback if 400k parallel fails:** Curriculum — train generated_track alone for 150k steps, +then add mountain to the DummyVecEnv pool for 250k more steps. + +--- + +## ADR-026: Mountain Track Friction Fix — Use Road Material on Hill Colliders + +**Date:** 2026-04-27 +**Status:** Accepted — fix applied + +**Context:** `WheelPhys.cs` multiplies wheel grip stiffness by the static friction of the +surface the wheel is hitting. The mountain_track scene assigned Slippery physics material +(staticFriction=0.1) to 4 track surface colliders from the long_road prefab, giving the +car 1/5 the normal traction on the hill. This caused visible wheelspin at full throttle and +made hill climbing genuinely difficult for learned policies. + +**Decision:** Replace the 4 Slippery material assignments in `mountain_track.unity` with the +Road material (staticFriction=0.5). This is a targeted scene-level override; the Slippery +material asset itself is unchanged and remains available for intentionally slippery surfaces. + +**Fix location:** `sdsim/Assets/Scenes/mountain_track.unity` — all 4 PrefabModification +entries that set `propertyPath: m_Material` on long_road colliders now reference Road +(GUID 7884193b0ead347a38a13a67f294dfb5) instead of Slippery (GUID c0e12c099c364af4e9e311a43d0f12c4). + +**To activate:** Rebuild the Unity simulator binary after pulling the updated scene file. +No Python code changes needed. + +**What this does NOT change:** +- `Slippery.physicMaterial` asset — unchanged (still used by thunderhill, circuit_launch) +- `Donkey_new_phys.prefab` strut colliders — also reference Slippery, but these are car body + parts that the wheels don't touch. WheelPhys.cs only reads friction from ground hits. 
+- mini_monaco.unity — also has one Slippery reference; intentionally left as-is for now + +**Expected effect:** Hill wheelspin should stop. The policy should find it easier to climb +the hill at throttle_min=0.2, and Exp 17 multi-track results should be more interpretable +since we are no longer fighting a physics artifact. diff --git a/agent/experiments/README.md b/agent/experiments/README.md index cc9d8e0..b79a003 100644 --- a/agent/experiments/README.md +++ b/agent/experiments/README.md @@ -5,6 +5,7 @@ Each corresponds to an entry in docs/TEST_HISTORY.md. | Script | Experiment | Key change | |---|---|---| +| exp17_parallel_450k.py | Exp 17 | Parallel DummyVecEnv, 450k steps, v6 reward, HOST=localhost | | mountain_v5.py | Exp 5 | v5 reward + throttle_min=0.5, direct model.learn() | | mountain_continue.py | Exp 4 | Continued Exp3 training | | mountain_high_throttle.py | Exp 3 | throttle_min=0.5, old v4 reward | diff --git a/agent/experiments/exp17_parallel_450k.py b/agent/experiments/exp17_parallel_450k.py new file mode 100644 index 0000000..b58b568 --- /dev/null +++ b/agent/experiments/exp17_parallel_450k.py @@ -0,0 +1,199 @@ +""" +Exp 17: Parallel DummyVecEnv — generated_track + mountain_track, 450k steps. + +Strategy: Exp 11b proved the parallel DummyVecEnv infrastructure is stable. +The only failure mode was insufficient training budget (~45k effective steps +per track). This experiment raises the budget fivefold to ~225k per track. + +Changes from Exp 11b: + - HOST: 10.0.0.55 → localhost (WSL/Windows share ports) + - TOTAL_STEPS: 90k → 450k + - CHECKPOINT_EVERY: 6k → 20k + - SAVE_DIR: exp17-parallel-450k + +Everything else identical to Exp 11b (same reward, wrappers, lr, throttle_min). 
+ +Setup — TWO sim instances required: + Sim 1: launch donkey_sim.exe, select generated_track, port 9091 (default) + Sim 2: launch a second donkey_sim.exe with --port 9093, select mountain_track + Command: donkey_sim.exe --port 9093 + + Both sims must be running and on the correct tracks before starting this script. + +Evaluation: + - Mid-training: both training tracks evaluated at each 20k checkpoint + - End-of-training: all 4 tracks evaluated sequentially (port 9091) +""" +import sys, os, time +sys.path.insert(0, '/home/paulh/projects/donkeycar-rl-autoresearch/agent') + +from multitrack_runner import log, StuckTerminationWrapper +from donkeycar_sb3_runner import ThrottleClampWrapper +from reward_wrapper import SpeedRewardWrapper +from stable_baselines3 import PPO +from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage +import gymnasium as gym +import numpy as np + +HOST = 'localhost' +THROTTLE_MIN = 0.2 +LR = 0.000725 +TOTAL_STEPS = 450_000 +CHECKPOINT_EVERY = 20_000 +SAVE_DIR = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp17-parallel-450k' +os.makedirs(SAVE_DIR, exist_ok=True) + + +def make_env(track_id, port): + def _init(): + raw = gym.make(track_id, conf={'host': HOST, 'port': port}) + env = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN) + env = StuckTerminationWrapper(env, stuck_steps=40, min_displacement=0.5) + env = SpeedRewardWrapper(env) + return env + return _init + + +log('=' * 60) +log('Exp 17: Parallel DummyVecEnv — 450k steps') +log(f' Sim 1: {HOST}:9091 → generated_track') +log(f' Sim 2: {HOST}:9093 → mountain_track') +log(f' throttle_min={THROTTLE_MIN}, lr={LR}, total={TOTAL_STEPS:,}') +log(f' Reward: v6 (speed × CTE_quality, efficiency gate >= 0.15)') +log(f' Stuck termination: 40 steps (~2.5s)') +log(f' Checkpoints: every {CHECKPOINT_EVERY:,} steps') +log('=' * 60) + +log('Creating DummyVecEnv with two tracks...') +env = DummyVecEnv([ + make_env('donkey-generated-track-v0', 9091), + 
make_env('donkey-mountain-track-v0', 9093), +]) +env = VecTransposeImage(env) +log(f' VecEnv num_envs={env.num_envs}, obs={env.observation_space.shape}') + +model = PPO('CnnPolicy', env, learning_rate=LR, verbose=1, device='cpu') +log('PPO created. Starting training...') + +best_reward = float('-inf') +steps_done = 0 + +while steps_done < TOTAL_STEPS: + seg_steps = min(CHECKPOINT_EVERY, TOTAL_STEPS - steps_done) + model.learn(total_timesteps=seg_steps, reset_num_timesteps=False) + steps_done += seg_steps + + ckpt = os.path.join(SAVE_DIR, f'checkpoint_{steps_done:07d}') + model.save(ckpt) + model.save(os.path.join(SAVE_DIR, 'model')) + log(f'[{steps_done:,}/{TOTAL_STEPS:,}] Checkpoint saved: {ckpt}.zip') + + # Eval on both training tracks using the existing DummyVecEnv connections + try: + obs = env.reset() + ep_rewards = np.zeros(env.num_envs) + ep_steps = np.zeros(env.num_envs) + done_mask = np.zeros(env.num_envs, dtype=bool) + for _ in range(2000): + action, _ = model.predict(obs, deterministic=True) + obs, rewards, dones, infos = env.step(action) + for i in range(env.num_envs): + if not done_mask[i]: + ep_rewards[i] += rewards[i] + ep_steps[i] += 1 + if dones[i]: + done_mask[i] = True + if done_mask.all(): + break + + status0 = '✅' if ep_steps[0] >= 2000 else f'❌@{int(ep_steps[0])}' + status1 = '✅' if ep_steps[1] >= 2000 else f'❌@{int(ep_steps[1])}' + log(f' Eval: gen_track={ep_rewards[0]:.1f}r/{int(ep_steps[0])}s {status0} ' + f'mountain={ep_rewards[1]:.1f}r/{int(ep_steps[1])}s {status1}') + + total_reward = ep_rewards.sum() + if total_reward > best_reward: + best_reward = total_reward + model.save(os.path.join(SAVE_DIR, 'best_model')) + log(f' ⭐ NEW BEST: {best_reward:.1f} combined reward') + except Exception as e: + log(f' Eval error: {e}') + import traceback; traceback.print_exc() + +model.save(os.path.join(SAVE_DIR, 'model')) +log(f'\nTraining complete. 
Best combined reward: {best_reward:.1f}') + +env.close() +time.sleep(5) + +# --- Final eval on all 4 tracks (sequential, port 9091) --- +log('\n' + '=' * 60) +log('FINAL EVALUATION: best_model on 4 tracks (3 sets each)') +log('=' * 60) + +EVAL_TRACKS = [ + ('donkey-generated-track-v0', 'generated_track'), + ('donkey-mountain-track-v0', 'mountain_track'), + ('donkey-minimonaco-track-v0', 'mini_monaco'), + ('donkey-generated-roads-v0', 'generated_road'), +] +EVAL_PORT = 9091 +EVAL_SETS = 3 +EVAL_MAX_STEPS = 2000 + +best_model_path = os.path.join(SAVE_DIR, 'best_model.zip') +results_by_track = {} + +for track_id, track_name in EVAL_TRACKS: + log(f'\n--- {track_name} ---') + steps_list = [] + + for s in range(1, EVAL_SETS + 1): + try: + raw = gym.make(track_id, conf={'host': HOST, 'port': EVAL_PORT}) + inner = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN) + inner = StuckTerminationWrapper(inner, stuck_steps=40, min_displacement=0.5) + inner = SpeedRewardWrapper(inner) + eval_env = VecTransposeImage(DummyVecEnv([lambda e=inner: e])) + + eval_model = PPO.load(best_model_path, env=eval_env, device='cpu') + + obs = eval_env.reset() + total_r, total_s, done = 0.0, 0, False + while not done and total_s < EVAL_MAX_STEPS: + action, _ = eval_model.predict(obs, deterministic=True) + result = eval_env.step(action) + if len(result) == 4: + obs, r, d, info = result + done = bool(d[0]) + else: + obs, r, t, tr, info = result + done = bool(t[0] or tr[0]) + total_r += float(r[0]) + total_s += 1 + + status = '✅' if total_s >= EVAL_MAX_STEPS else f'❌@{total_s}' + log(f' Set {s}: {total_r:.1f}r / {total_s}s {status}') + steps_list.append(total_s) + + eval_env.close() + time.sleep(3) + except Exception as e: + log(f' Set {s}: ERROR — {e}') + steps_list.append(0) + time.sleep(3) + + mean_steps = np.mean(steps_list) if steps_list else 0 + results_by_track[track_name] = steps_list + log(f' Mean: {mean_steps:.0f} steps') + +log('\n' + '=' * 60) +log('SUMMARY') +log('=' * 60) +for 
track_name, steps_list in results_by_track.items(): + steps_str = '/'.join(str(s) for s in steps_list) + mean = np.mean(steps_list) + verdict = '✅' if mean >= 1500 else '⚠️' if mean >= 500 else '❌' + log(f' {verdict} {track_name:20s}: {steps_str} mean={mean:.0f}') + +log(f'\n=== Exp 17 COMPLETE ===') diff --git a/agent/outerloop-results/exp14_finetune_log.txt b/agent/outerloop-results/exp14_finetune_log.txt new file mode 100644 index 0000000..5f04654 --- /dev/null +++ b/agent/outerloop-results/exp14_finetune_log.txt @@ -0,0 +1,61 @@ +2026-04-20T00:08:21.090963 Loading warm-start model from models/exp14-mountain-v5/best_model.zip +2026-04-20T00:09:16.674927 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using throttle_min=0.2 env +2026-04-20T00:09:19.055092 Switching model to env with throttle_min=0.4 +2026-04-20T00:10:27.385278 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env +2026-04-20T00:11:08.699368 ERROR during fine-tune: 'NoneType' object is not callable +2026-04-20T00:11:08.901669 Fine-tune complete. steps_done=0 +2026-04-20T00:14:43.472139 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env +2026-04-20T00:17:44.473941 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env +2026-04-20T00:21:10.924456 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env +2026-04-20T00:25:31.932947 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env +2026-04-20T00:28:59.848890 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip +2026-04-20T00:28:59.848966 ERROR during fine-tune: name 'make_env' is not defined +2026-04-20T00:29:00.509181 Fine-tune complete. 
steps_done=6000 +2026-04-20T00:31:09.594830 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env +2026-04-20T00:34:50.056288 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip +2026-04-20T00:35:04.415348 ERROR during fine-tune: name 'json' is not defined +2026-04-20T00:35:04.546033 Fine-tune complete. steps_done=6000 +2026-04-20T00:37:47.831240 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env +2026-04-20T00:41:21.675776 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip +2026-04-20T00:41:43.554021 Eval @ 6000: mean_steps=384.7 mean_lap=21.59375 +2026-04-20T00:41:43.694831 ⭐ NEW BEST (mean lap 21.59s) saved +2026-04-20T00:45:26.980198 [12000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0012000.zip +2026-04-20T00:45:42.741989 Eval @ 12000: mean_steps=187.7 mean_lap=None +2026-04-20T00:49:24.586893 [18000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0018000.zip +2026-04-20T00:49:42.795830 Eval @ 18000: mean_steps=287.3 mean_lap=None +2026-04-20T00:53:15.614884 [24000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0024000.zip +2026-04-20T00:53:37.070339 Eval @ 24000: mean_steps=374.7 mean_lap=21.765625 +2026-04-20T00:57:09.352148 [30000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0030000.zip +2026-04-20T00:57:36.938090 Eval @ 30000: mean_steps=537.7 mean_lap=22.046875 +2026-04-20T00:57:36.938120 Switching env to throttle_min=0.2 +2026-04-20T01:00:55.914640 [36000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0036000.zip +2026-04-20T01:01:56.665949 Eval @ 36000: mean_steps=1451.7 mean_lap=28.434895833333332 +2026-04-20T01:05:10.807288 [42000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0042000.zip +2026-04-20T01:05:57.449632 Eval @ 42000: 
mean_steps=1067.7 mean_lap=27.44140625 +2026-04-20T01:08:54.843851 [48000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0048000.zip +2026-04-20T01:10:00.878424 Eval @ 48000: mean_steps=1626.7 mean_lap=29.776785714285715 +2026-04-20T01:13:16.089861 [54000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0054000.zip +2026-04-20T01:14:18.435622 Eval @ 54000: mean_steps=1528.3 mean_lap=30.234375 +2026-04-20T01:17:25.682859 [60000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0060000.zip +2026-04-20T01:18:28.243356 Eval @ 60000: mean_steps=1533.0 mean_lap=34.33125 +2026-04-20T01:21:38.247436 [66000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0066000.zip +2026-04-20T01:21:54.995379 Eval @ 66000: mean_steps=163.7 mean_lap=None +2026-04-20T01:25:14.752223 [72000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0072000.zip +2026-04-20T01:26:11.926001 Eval @ 72000: mean_steps=1389.7 mean_lap=43.21484375 +2026-04-20T01:29:24.138321 [78000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0078000.zip +2026-04-20T01:29:59.928582 Eval @ 78000: mean_steps=757.0 mean_lap=43.453125 +2026-04-20T01:33:15.187091 [84000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0084000.zip +2026-04-20T01:33:49.188449 Eval @ 84000: mean_steps=704.7 mean_lap=41.046875 +2026-04-20T01:36:57.554346 [90000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0090000.zip +2026-04-20T01:38:12.054640 Eval @ 90000: mean_steps=1819.0 mean_lap=None +2026-04-20T01:41:29.620560 [96000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0096000.zip +2026-04-20T01:42:07.583154 Eval @ 96000: mean_steps=813.0 mean_lap=None +2026-04-20T01:45:23.503967 [102000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0102000.zip +2026-04-20T01:45:59.052782 Eval @ 102000: mean_steps=747.3 mean_lap=None 
+2026-04-20T01:49:02.510514 [108000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0108000.zip +2026-04-20T01:49:27.462705 Eval @ 108000: mean_steps=466.0 mean_lap=None +2026-04-20T01:52:40.338223 [114000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0114000.zip +2026-04-20T01:53:31.593848 Eval @ 114000: mean_steps=1169.0 mean_lap=None +2026-04-20T01:56:39.035861 [120000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0120000.zip +2026-04-20T01:57:28.658996 Eval @ 120000: mean_steps=1125.0 mean_lap=None +2026-04-20T01:57:28.795051 Fine-tune complete. steps_done=120000 diff --git a/agent/outerloop-results/exp14_finetune_results.jsonl b/agent/outerloop-results/exp14_finetune_results.jsonl new file mode 100644 index 0000000..f4777fc --- /dev/null +++ b/agent/outerloop-results/exp14_finetune_results.jsonl @@ -0,0 +1,20 @@ +{"steps_done": 6000, "throttle_min": 0.4, "mean_steps": 384.6666666666667, "mean_lap_time": 21.59375, "per_set": [{"steps": 205, "laps": 0, "lap_times": []}, {"steps": 177, "laps": 0, "lap_times": []}, {"steps": 772, "laps": 1, "lap_times": [21.59375]}]} +{"steps_done": 12000, "throttle_min": 0.4, "mean_steps": 187.66666666666666, "mean_lap_time": null, "per_set": [{"steps": 145, "laps": 0, "lap_times": []}, {"steps": 345, "laps": 0, "lap_times": []}, {"steps": 73, "laps": 0, "lap_times": []}]} +{"steps_done": 18000, "throttle_min": 0.4, "mean_steps": 287.3333333333333, "mean_lap_time": null, "per_set": [{"steps": 233, "laps": 0, "lap_times": []}, {"steps": 244, "laps": 0, "lap_times": []}, {"steps": 385, "laps": 0, "lap_times": []}]} +{"steps_done": 24000, "throttle_min": 0.4, "mean_steps": 374.6666666666667, "mean_lap_time": 21.765625, "per_set": [{"steps": 178, "laps": 0, "lap_times": []}, {"steps": 359, "laps": 0, "lap_times": []}, {"steps": 587, "laps": 1, "lap_times": [21.765625]}]} +{"steps_done": 30000, "throttle_min": 0.4, "mean_steps": 537.6666666666666, 
"mean_lap_time": 22.046875, "per_set": [{"steps": 854, "laps": 1, "lap_times": [22.046875]}, {"steps": 365, "laps": 0, "lap_times": []}, {"steps": 394, "laps": 0, "lap_times": []}]} +{"steps_done": 36000, "throttle_min": 0.2, "mean_steps": 1451.6666666666667, "mean_lap_time": 28.434895833333332, "per_set": [{"steps": 1540, "laps": 2, "lap_times": [29.34375, 26.84375]}, {"steps": 2000, "laps": 3, "lap_times": [29.4375, 28.4375, 27.015625]}, {"steps": 815, "laps": 1, "lap_times": [29.53125]}]} +{"steps_done": 42000, "throttle_min": 0.2, "mean_steps": 1067.6666666666667, "mean_lap_time": 27.44140625, "per_set": [{"steps": 467, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [27.046875, 27.703125, 27.125]}, {"steps": 736, "laps": 1, "lap_times": [27.890625]}]} +{"steps_done": 48000, "throttle_min": 0.2, "mean_steps": 1626.6666666666667, "mean_lap_time": 29.776785714285715, "per_set": [{"steps": 2000, "laps": 3, "lap_times": [30.796875, 29.828125, 28.734375]}, {"steps": 880, "laps": 1, "lap_times": [30.65625]}, {"steps": 2000, "laps": 3, "lap_times": [29.703125, 29.203125, 29.515625]}]} +{"steps_done": 54000, "throttle_min": 0.2, "mean_steps": 1528.3333333333333, "mean_lap_time": 30.234375, "per_set": [{"steps": 585, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [32.734375, 29.8125, 30.8125]}, {"steps": 2000, "laps": 3, "lap_times": [31.171875, 29.71875, 27.15625]}]} +{"steps_done": 60000, "throttle_min": 0.2, "mean_steps": 1533.0, "mean_lap_time": 34.33125, "per_set": [{"steps": 2000, "laps": 2, "lap_times": [39.140625, 33.140625]}, {"steps": 599, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [34.21875, 31.953125, 33.203125]}]} +{"steps_done": 66000, "throttle_min": 0.2, "mean_steps": 163.66666666666666, "mean_lap_time": null, "per_set": [{"steps": 154, "laps": 0, "lap_times": []}, {"steps": 146, "laps": 0, "lap_times": []}, {"steps": 191, "laps": 0, "lap_times": []}]} +{"steps_done": 72000, 
"throttle_min": 0.2, "mean_steps": 1389.6666666666667, "mean_lap_time": 43.21484375, "per_set": [{"steps": 2000, "laps": 2, "lap_times": [50.140625, 35.6875]}, {"steps": 169, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 2, "lap_times": [39.890625, 47.140625]}]} +{"steps_done": 78000, "throttle_min": 0.2, "mean_steps": 757.0, "mean_lap_time": 43.453125, "per_set": [{"steps": 174, "laps": 0, "lap_times": []}, {"steps": 1074, "laps": 1, "lap_times": [46.03125]}, {"steps": 1023, "laps": 1, "lap_times": [40.875]}]} +{"steps_done": 84000, "throttle_min": 0.2, "mean_steps": 704.6666666666666, "mean_lap_time": 41.046875, "per_set": [{"steps": 953, "laps": 1, "lap_times": [40.21875]}, {"steps": 181, "laps": 0, "lap_times": []}, {"steps": 980, "laps": 1, "lap_times": [41.875]}]} +{"steps_done": 90000, "throttle_min": 0.2, "mean_steps": 1819.0, "mean_lap_time": null, "per_set": [{"steps": 2000, "laps": 0, "lap_times": []}, {"steps": 1963, "laps": 0, "lap_times": []}, {"steps": 1494, "laps": 0, "lap_times": []}]} +{"steps_done": 96000, "throttle_min": 0.2, "mean_steps": 813.0, "mean_lap_time": null, "per_set": [{"steps": 1671, "laps": 0, "lap_times": []}, {"steps": 385, "laps": 0, "lap_times": []}, {"steps": 383, "laps": 0, "lap_times": []}]} +{"steps_done": 102000, "throttle_min": 0.2, "mean_steps": 747.3333333333334, "mean_lap_time": null, "per_set": [{"steps": 715, "laps": 0, "lap_times": []}, {"steps": 932, "laps": 0, "lap_times": []}, {"steps": 595, "laps": 0, "lap_times": []}]} +{"steps_done": 108000, "throttle_min": 0.2, "mean_steps": 466.0, "mean_lap_time": null, "per_set": [{"steps": 468, "laps": 0, "lap_times": []}, {"steps": 476, "laps": 0, "lap_times": []}, {"steps": 454, "laps": 0, "lap_times": []}]} +{"steps_done": 114000, "throttle_min": 0.2, "mean_steps": 1169.0, "mean_lap_time": null, "per_set": [{"steps": 1318, "laps": 0, "lap_times": []}, {"steps": 1278, "laps": 0, "lap_times": []}, {"steps": 911, "laps": 0, "lap_times": []}]} +{"steps_done": 120000, 
"throttle_min": 0.2, "mean_steps": 1125.0, "mean_lap_time": null, "per_set": [{"steps": 941, "laps": 0, "lap_times": []}, {"steps": 1492, "laps": 0, "lap_times": []}, {"steps": 942, "laps": 0, "lap_times": []}]} diff --git a/agent/outerloop-results/robust_eval_mountain.jsonl b/agent/outerloop-results/robust_eval_mountain.jsonl new file mode 100644 index 0000000..705c4a5 --- /dev/null +++ b/agent/outerloop-results/robust_eval_mountain.jsonl @@ -0,0 +1,13 @@ +{"set": 1, "episode": 1, "steps": 195, "reward": 313.8098858782323, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:01.852800"} +{"set": 1, "episode": 2, "steps": 907, "reward": 821.3252189619088, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:15.580688"} +{"set": 1, "episode": 3, "steps": 187, "reward": 312.3699834933941, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:20.305057"} +{"set_summary": {"set": 1, "mean_steps": 429.6666666666667, "mean_reward": 482.50169611117843}} +{"set": 2, "episode": 1, "steps": 1684, "reward": 2886.7210297683996, "laps": 2, "lap_times": [30.796875, 27.3125], "timestamp": "2026-04-19T23:55:43.831212"} +{"set": 2, "episode": 2, "steps": 1791, "reward": 2724.1041878786637, "laps": 2, "lap_times": [29.234375, 31.578125], "timestamp": "2026-04-19T23:56:08.736059"} +{"set": 2, "episode": 3, "steps": 2000, "reward": 3338.140802157104, "laps": 3, "lap_times": [29.828125, 27.828125, 29.171875], "timestamp": "2026-04-19T23:56:34.963968"} +{"set_summary": {"set": 2, "mean_steps": 1825.0, "mean_reward": 2982.9886732680557}} +{"set": 3, "episode": 1, "steps": 189, "reward": 304.40264326371107, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:56:39.723007"} +{"set": 3, "episode": 2, "steps": 2000, "reward": 3396.2255747133167, "laps": 3, "lap_times": [29.875, 28.75, 27.765625], "timestamp": "2026-04-19T23:57:05.989723"} +{"set": 3, "episode": 3, "steps": 773, "reward": 1300.720640436186, "laps": 1, "lap_times": [31.265625], "timestamp": 
"2026-04-19T23:57:18.198014"} +{"set_summary": {"set": 3, "mean_steps": 987.3333333333334, "mean_reward": 1667.116286137738}} +{"overall": {"mean_steps_across_sets": 1080.6666666666667, "mean_reward_across_sets": 1710.8688851723239}} diff --git a/docs/STATE.md b/docs/STATE.md index 8b52739..c9bd478 100644 --- a/docs/STATE.md +++ b/docs/STATE.md @@ -1,100 +1,83 @@ -# Project State — April 16, 2026 (post-testing) +# Project State — April 27, 2026 ## The Goal + Train a DonkeyCar model that generalises to any road-surface track (outdoor, asphalt, lane markings) — demonstrated by driving a never-seen track without crashing. --- -## Confirmed Working Models (tested today, observed by user) +## Current Champion Models -### ✅ Phase 2 Champion — generated_road -- **Path:** `models/champion/model.zip` -- **Trained on:** generated_road only, ~13k steps, lr=0.000225 -- **Test result:** Drove full 2000 steps, 2013 reward. User: "driving very well, stayed in right-hand lane, very very good" -- **Other tracks:** Confirmed fails on generated_track (old multitrack_eval) +### ✅ exp13-gentrack-v4 — generated_track specialist +- **Path:** `models/exp13-gentrack-v4/best_model.zip` +- **Trained on:** generated_track only, ~30k steps (stopped early), lr=0.000725, throttle_min=0.2 +- **Reward:** v4 (base × efficiency × speed_bonus) +- **Performance:** Drives generated_track reliably, clean laps +- **Zero-shot:** Fails on mountain_track (expected — single-track specialist) -### ✅ Wave 4 Trial 9 — generated_track AND mini_monaco +### ✅ exp14-mountain-v5-finetune ft_036k — mountain specialist +- **Path:** `models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip` +- **Trained on:** mountain_track, fine-tuned from exp14 base, checkpoint at 36k steps +- **Reward:** v5 (speed × CTE-quality), throttle_floor=0.2 (switched from 0.4 at 30k) +- **Performance:** 9/9 successful episodes, 25 total laps, mean lap 27.93s, best lap 26.16s +- **Zero-shot:** Fails on generated_track (expected — 
single-track specialist) + +### ⭐ Wave 4 Trial 9 — best generalising model (but not reproducible) - **Path:** `models/wave4-trial-0009/model.zip` -- **Trained on:** generated_track + mountain_track from scratch, ~90k steps, lr=0.000725, switch=6,851 -- **Test on generated_track:** 3/3 episodes drove full 2000 steps, 13–16 second genuine laps -- **Test on mini_monaco:** Full 2000 steps, 40-second genuine laps (zero-shot — never seen during training) -- **This is our best model** - -### ✅ Wave 4 Trial 19 — generated_track (mostly) -- **Path:** `models/wave4-trial-0019/model.zip` -- **Trained on:** generated_track + mountain_track from scratch, ~74k steps, lr=0.000629, switch=8,211 -- **Test on generated_track:** 2/3 episodes drove full 2000 steps, 14–17 second genuine laps. 1 crash. -- **mini_monaco score during training:** 231 (best "honest" result from Wave 4) +- **Trained on:** generated_track + mountain_track, ~90k steps, lr=0.000725, switch=6,851 +- **Performance:** generated_track 2000/2000, mini_monaco 2000/2000 (zero-shot) +- **Problem:** Same hyperparameters repeated multiple times → all failed. This was a lucky random seed. --- -## Key Finding: Generated Track Lighting Variation -The generated_track changes lighting conditions (sun angle, shadows) on every -env.reset() due to procedural generation. This means during training, every -episode showed a different visual appearance of the same track. The model was -forced to learn track-geometry features (road edges, markings) rather than -lighting-specific patterns. This visual robustness is almost certainly why -Trial 9 can zero-shot generalise to mini_monaco. 
+## What We Know (cumulative) + +### Reward functions +- **v4** (base × efficiency × speed_bonus): works for generated_track; gives zero gradient on mountain hills +- **v5** (speed × CTE-quality): works for mountain; circular driving exploit possible on flat track +- **v6** (v5 + efficiency gate ≥ 0.15): prevents circular exploit; may suppress early exploration + +### Training approaches tried and their outcomes +| Approach | Result | +|---|---| +| Single-track PPO (Exp 9, 13) | ✅ Reliable. Best per-track performance. | +| Round-robin close-and-switch (Wave 4, Exp 10) | ❌ 80% failure rate. Disrupts PPO rollout buffer. | +| Parallel DummyVecEnv 90k steps (Exp 11b) | ⚠️ Infrastructure works; 90k too few steps (194 steps on all tracks). | +| Cross-track warm start both directions (Exp 15, 16) | ❌ Both failed. Single-track policies too specialised for naive transfer. | + +### Mountain track physics (fixed 2026-04-27) +The mountain_track.unity scene assigned Slippery physics material (staticFriction=0.1) +to 4 track surface colliders. WheelPhys.cs scales wheel grip by surface staticFriction, +so the car had 1/5 normal grip on the hill. This caused visible wheelspin. +Fixed by assigning Road material (staticFriction=0.5) to those 4 colliders in +`sdsim/Assets/Scenes/mountain_track.unity`. The project uses a pre-built Windows +executable (DonkeySimWin/donkey_sim.exe), so this fix is deferred until the sim +is rebuilt from source in Unity Editor. Proceed with Exp 17 using the existing binary. 
+ +### Key parameter knowledge +- **lr:** 0.000725 (from Trial 9 and Exp 9; consistent with good results) +- **throttle_min:** 0.2 (v5/v6 reward gives non-zero gradient on hills even at 0.2) +- **n_steer/n_throttle:** Relevant for discrete action space only (PPO uses continuous) +- **Per-env throttle_min in DummyVecEnv:** Feasible, since each env is wrapped independently --- -## Full Test Results — April 16 +## Open Strategy (as of April 27) -| Test | Model | Track | Laps | Steps | Verdict | -|---|---|---|---|---|---| -| 1 | Phase 2 champion | generated_road | n/a (not a loop) | 2000/2000 | ✅ DRIVES | -| 2 | Wave 4 Trial 3 | generated_track | — | — | ❌ MODEL CORRUPTED | -| 3 | Wave 4 Trial 9 | generated_track | 6 laps × 3 eps | 2000/2000 | ✅ DRIVES | -| 4 | Wave 4 Trial 9 | mini_monaco | 2 laps per ep | 2000/2000 | ✅ DRIVES (zero-shot) | -| 5 | Wave 4 Trial 14 | mini_monaco | 1 lap ep2 only | 257/901/253 | ⚠️ INCONSISTENT | -| 6 | Wave 4 Trial 25 | mini_monaco | 0 | ~147/eps | ❌ CRASHES | -| + | Wave 4 Trial 19 | generated_track | 5-6 laps × 2 eps | crash/2000/2000 | ✅ MOSTLY | -| + | Wave 4 Trial 22 | generated_track | 0 | ~110/eps | ❌ SAME SPOT | -| + | Wave 4 Trial 2 | generated_track | 0 | ~76/eps | ❌ CRASHES | -| + | Trial 3 (recovered) | generated_track | 0 | ~104/eps | ❌ CRASHES | +The goal is reliable multi-track generalisation. The validated path forward: ---- +1. **Exp 17:** Parallel DummyVecEnv with 400k–500k steps + - Two sim instances: generated_track:9091, mountain_track:9093 + - v6 reward on both (efficiency gate + CTE patience terminator) + - throttle_min=0.2 both envs (or optionally 0.5 on mountain, 0.2 on generated) + - lr=0.000725, checkpoint every 20k, best_model tracked throughout + - Eval mini_monaco zero-shot at every checkpoint +2. **If Exp 17 plateaus:** Try curriculum (generated_track only for 150k, then add mountain) +3. **If still stuck:** Tune v6 efficiency gate threshold (check % steps gated in early training) -## What We Know Now - -1.
**Trial 9 is a genuine multi-track model.** It drives generated_track - consistently (3/3) with clean laps, AND generalises zero-shot to - mini_monaco (never seen in training). This is real progress. - -2. **The "amazing" overnight model (Trial 3) is lost.** The model.zip has - a corrupted optimizer file. Policy weights were recovered but the model - crashes at ~104 steps — the "amazing" driving was at an intermediate - training checkpoint, not the final saved model. - -3. **Most Wave 4 high scores were not exploits — they were real.** - Trials 5, 6, and 14 showed inconsistent results (crash some episodes, - complete lap on others). The model was genuinely learning but unreliably. - Only Trial 14 and 25's original very high scores (1573, 1543) appear - to have been exploits in the original training eval. - -4. **Lighting variation on generated_track is a feature, not a bug.** - Procedural generation changes sun angle / shadows each episode, forcing - the model to learn geometry rather than appearance. This may be the key - to Trial 9's generalisation ability. - -5. **Mountain_track training — unknown contribution.** We don't know if - mountain_track training helped or hurt. Trial 9 drives generated_track - and mini_monaco; whether it can drive mountain_track is untested. - ---- - -## Open Questions for Strategy Discussion - -1. Can Trial 9 also drive mountain_track? (untested) -2. Can Trial 9 drive generated_road? (untested — zero-shot to Phase 2 training track) -3. Why does Trial 9 drive mini_monaco but other models with similar - mini_monaco scores (Trial 14: 193, Trial 22: 193) don't reliably? -4. Would more training steps from Trial 9's hyperparameters produce - an even better model? -5. Is mountain_track necessary, or could we get Trial 9's results - training on generated_track alone? +See `docs/TEST_HISTORY.md` for full Exp 17 design. --- @@ -102,9 +85,9 @@ Trial 9 can zero-shot generalise to mini_monaco. 
| Model | Path | Status | |---|---|---| -| Phase 2 champion | models/champion/model.zip | ✅ Good | -| Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | ✅ Best model | -| Wave 4 Trial 19 | models/wave4-trial-0019/model.zip | ✅ Good | -| Wave 4 Trial 3 | models/wave4-trial-0003/model.zip | ❌ Corrupted | -| Wave 4 Trials 1,2,5-8,10-25 | models/wave4-trial-XXXX/ | Available, mostly crash on generated_track | - +| exp13-gentrack-v4 | models/exp13-gentrack-v4/best_model.zip | ✅ Generated_track specialist | +| exp14-mountain-v5-finetune ft_036k | models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip | ✅ Mountain specialist (best overall mountain model) | +| exp14-mountain-v5 | models/exp14-mountain-v5/best_model.zip | ✅ Mountain base (good, slightly worse than ft_036k) | +| Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | ✅ Best generalising model; unreproducible | +| Phase 2 champion | models/champion/model.zip | ✅ generated_road specialist only | +| Wave 4 other trials | models/wave4-trial-XXXX/ | Mostly crash on all tracks | diff --git a/docs/TEST_HISTORY.md b/docs/TEST_HISTORY.md index c05494f..39ac87d 100644 --- a/docs/TEST_HISTORY.md +++ b/docs/TEST_HISTORY.md @@ -508,3 +508,105 @@ For now: - keep the single-track champions as separate specialists - do **not** assume direct cross-track warm starts are beneficial +--- + +## Mountain Track Friction Fix (2026-04-27) + +### Root cause + +`WheelPhys.cs` scales wheel grip by the static friction of whatever surface the +wheel is touching: `fFriction.stiffness = hit.collider.material.staticFriction * originalForwardStiffness`. + +`mountain_track.unity` assigned the Slippery physics material (staticFriction=0.1) +to 4 track surface colliders from the `long_road` prefab. This gave the car 1/5 +the normal grip on the hill, causing visible wheelspin even at full throttle. + +The Slippery material is intentional on genuinely icy surfaces (thunderhill) but +was incorrect on mountain_track's asphalt hill. 
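
The grip scaling described in the root cause can be checked with a small calculation. This is a Python sketch of the arithmetic only, not the simulator's C# code: `ORIGINAL_FORWARD_STIFFNESS` is a hypothetical placeholder (the real value lives in the sim), and the two friction values are the ones quoted for the Slippery and Road materials.

```python
# Illustrative arithmetic only; the real logic lives in WheelPhys.cs (C#).
# ORIGINAL_FORWARD_STIFFNESS is a hypothetical placeholder, not the sim's value.
ORIGINAL_FORWARD_STIFFNESS = 1.0

# staticFriction values quoted for the two physics materials.
STATIC_FRICTION = {"Slippery": 0.1, "Road": 0.5}


def forward_stiffness(material: str) -> float:
    """Mirror of: stiffness = staticFriction * originalForwardStiffness."""
    return STATIC_FRICTION[material] * ORIGINAL_FORWARD_STIFFNESS


slippery = forward_stiffness("Slippery")
road = forward_stiffness("Road")

# Ratio of grip on the mis-assigned surface to grip on a Road surface.
grip_ratio = slippery / road
print(f"Slippery grip relative to Road: {grip_ratio:.2f}")
```

Because the placeholder stiffness cancels in the ratio, the result depends only on the two staticFriction values, which is why the car had one fifth of Road grip regardless of the wheel's base stiffness.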
+ +### Fix applied + +Replaced all 4 Slippery material assignments with Road material (staticFriction=0.5) +in `sdsim/Assets/Scenes/mountain_track.unity`. + +| Material | staticFriction | GUID | +|---|---|---| +| Slippery (removed) | 0.1 | c0e12c099c364af4e9e311a43d0f12c4 | +| Road (applied) | 0.5 | 7884193b0ead347a38a13a67f294dfb5 | + +### To activate + +The training setup uses the pre-built Windows executable (`DonkeySimWin/donkey_sim.exe`), +not a locally compiled build. The scene file edit in sdsandbox/ has no effect on the +running binary; it only matters if the sim is ever rebuilt from source in Unity Editor. + +**This fix is deferred.** Proceed with Exp 17 using the existing executable. +If mountain hill training in Exp 17 specifically struggles (short episodes that plateau +and never improve), that is the signal to pursue a Unity Editor rebuild. + +The scene file change is committed in sdsandbox/ and will apply automatically if the +sim is rebuilt for any other reason. No Python code changes needed. + +### Expected effect (once the sim is rebuilt) + +- Hill wheelspin should stop or be greatly reduced +- throttle_min=0.2 + v5 reward should be even more effective on the hill +- All mountain experiments run on the rebuilt sim benefit; no code changes needed + +--- + +## Strategy Review and Exp 17 Plan (2026-04-27) + +### Where the project stands + +After 16 experiments and 4 autoresearch phases, the core problem is clear: +multi-track training is needed for generalisation, but the training method has +been unreliable. Summary of what each approach found: + +| Approach | Outcome | +|---|---| +| Round-robin close-and-switch (Wave 4, Exp 10) | 80% failure. PPO rollout buffer disrupted on env swap. Lucky seed (Trial 9) worked once but cannot be reproduced. | +| Parallel DummyVecEnv 90k steps (Exp 11b) | Infrastructure valid, no catastrophic forgetting, but 90k steps / 2 tracks = ~45k effective per track. Not enough. | +| Cross-track warm starts (Exp 15, 16) | Both directions failed.
Single-track specialists do not transfer cleanly. | +| Single-track PPO (Exp 9, 13, 14) | Reliable but no generalisation. | + +The conclusion: **parallel DummyVecEnv is the right architecture; the only known +failure mode is training budget**. Exp 11b was mechanically sound but starved of steps. + +### Exp 17: Parallel DummyVecEnv, 400k–500k steps + +**This is the primary next experiment.** + +| Parameter | Value | Reason | +|---|---|---| +| Architecture | DummyVecEnv([generated_track:9091, mountain_track:9093]) | Validated in Exp 11b; no PPO disruption | +| Total timesteps | 400,000–500,000 | ~200k effective per track; Exp 11b showed 90k was insufficient | +| Reward | v6 on both envs (efficiency gate + CTE patience terminator) | Blocks circular exploit on generated_track; gate threshold may be tuned | +| throttle_min | 0.2 both envs (or 0.5 mountain, 0.2 generated; see ADR-020) | v5/v6 gradient non-zero on hills at 0.2 | +| learning_rate | 0.000725 | From Trial 9 and Exp 9; consistent with best results | +| Checkpoint | every 20,000 steps + best_model.zip tracked throughout | ADR-017: best model ≠ final model | +| Eval | mini_monaco zero-shot at every checkpoint | Detect the peak before policy drifts | +| Warm start | None; train from random weights | ADR-024: cross-track warm starts failed | + +**Setup checklist before running:** +1. Two sim instances running: one on port 9091, one on port 9093 +2. Each instance loaded with its configured track (generated_track on 9091, mountain_track on 9093) +3. Mountain friction fix: deferred (see the friction fix section above); run on the existing executable, no rebuild required before Exp 17 +4. Verify throughput: run 2-minute timing benchmark, set step cap accordingly (ADR-014) + +**Success criterion:** mini_monaco zero-shot score > 500 (at least 25% of a full +2000-step episode) reliably across 3 evaluation sets, reproducible across 2+ runs.
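
The success criterion can be written down as a small check. This is a minimal sketch under stated assumptions, not code from the repo: it reads "reliably across 3 evaluation sets" as "every one of at least 3 evaluation scores exceeds 500" and "reproducible across 2+ runs" as "at least 2 independent runs pass". The function names and the example scores are hypothetical.

```python
from typing import Sequence

SCORE_THRESHOLD = 500    # mini_monaco zero-shot score, from the success criterion
EVAL_SETS_REQUIRED = 3   # evaluation sets per run
RUNS_REQUIRED = 2        # independent training runs that must pass


def run_passes(eval_scores: Sequence[float]) -> bool:
    """A run passes if it has >= 3 evaluation sets and every score is > 500.

    Reading "reliably" as "every evaluation set" is an assumption.
    """
    return (len(eval_scores) >= EVAL_SETS_REQUIRED
            and all(score > SCORE_THRESHOLD for score in eval_scores))


def meets_success_criterion(runs: Sequence[Sequence[float]]) -> bool:
    """True if at least RUNS_REQUIRED independent runs pass."""
    return sum(run_passes(run) for run in runs) >= RUNS_REQUIRED


# Hypothetical scores for illustration only (not real results).
example_runs = [[612, 540, 701], [530, 498, 650], [580, 777, 510]]
print(meets_success_criterion(example_runs))
```

A stricter reading (all runs must pass, not just two) would replace the final comparison with `all(run_passes(run) for run in runs)`; which reading applies is a project decision.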
+ +### Fallback: Curriculum training (if Exp 17 plateaus below 200) + +If Exp 17 cannot get past ~200 steps on mini_monaco: +- Phase A: generated_track only, 150k steps (establish road-following) +- Phase B: add mountain_track to DummyVecEnv, continue 250k more steps +- Rationale: gives the policy a foundation before the harder mountain physics + +### Fallback: v6 efficiency gate tuning (if gate is too aggressive) + +Log what fraction of steps are gated (reward zeroed) in the first 100k steps. +If >40%, lower the gate threshold from 0.15 to 0.10 for the first 150k steps, +then raise it back to 0.15. Prevents the gate from suppressing early exploration. +
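
The gate-tuning fallback above can be sketched as two small helpers. This is an illustrative sketch, not the project's reward code: it assumes the v6 gate compares a per-step scalar efficiency value against the threshold and zeroes the reward when the value is below it. The function names and sample values are hypothetical; the 0.15, 0.10, 40% and 150k figures come from the text.

```python
from typing import Iterable

DEFAULT_GATE = 0.15            # normal v6 efficiency gate threshold
RELAXED_GATE = 0.10            # temporary threshold if the gate is too aggressive
GATED_FRACTION_LIMIT = 0.40    # relax the gate if more than 40% of steps are gated
RELAXED_UNTIL_STEP = 150_000   # training step at which the gate returns to 0.15


def gated_fraction(efficiencies: Iterable[float], gate: float = DEFAULT_GATE) -> float:
    """Fraction of steps whose efficiency falls below the gate (reward zeroed)."""
    values = list(efficiencies)
    if not values:
        return 0.0
    return sum(e < gate for e in values) / len(values)


def gate_threshold(step: int, relax: bool) -> float:
    """Gate threshold for a training step, given the relax decision."""
    if relax and step < RELAXED_UNTIL_STEP:
        return RELAXED_GATE
    return DEFAULT_GATE


# Hypothetical per-step efficiency values logged in early training.
early_efficiencies = [0.05, 0.12, 0.20, 0.30, 0.10]
relax = gated_fraction(early_efficiencies) > GATED_FRACTION_LIMIT
print(relax, gate_threshold(50_000, relax), gate_threshold(200_000, relax))
```

In practice the fraction would be measured over the first 100k steps of a run and the relaxed schedule applied to a subsequent run, since the decision is only known after the measurement window.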