From b504b89b2a8eb9fe6ade9e3c472bbac92d3fd56d Mon Sep 17 00:00:00 2001
From: Paul Huliganga <paje0101@gmail.com>
Date: Tue, 28 Apr 2026 02:42:20 -0400
Subject: [PATCH] feat: add exp17 parallel DummyVecEnv 450k training + strategy
 docs

- exp17_parallel_450k.py: parallel two-track training (generated_track:9091,
  mountain_track:9093), 450k steps, v6 reward, HOST=localhost
- DECISIONS.md: ADR-025 (parallel strategy) and ADR-026 (mountain friction fix)
- docs/STATE.md: updated to April 2026 state with current champions and strategy
- docs/TEST_HISTORY.md: mountain friction fix notes + Exp 17 full design
- outerloop-results: exp14 finetune logs and robust mountain eval results

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
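Reviewer note: the exp14_finetune_results.jsonl records added by this patch can be cross-checked against their own per_set data. The sketch below is not part of the patch; it copies the 36k-step record from that file and recomputes its summary fields. The helper name `summarize` is hypothetical, and pooling lap times across all sets is an assumption inferred from the numbers, not from the script that wrote the file.

```python
import json
from statistics import mean

# One record copied from exp14_finetune_results.jsonl (steps_done=36000).
RECORD = json.loads(
    '{"steps_done": 36000, "throttle_min": 0.2, '
    '"mean_steps": 1451.6666666666667, "mean_lap_time": 28.434895833333332, '
    '"per_set": ['
    '{"steps": 1540, "laps": 2, "lap_times": [29.34375, 26.84375]}, '
    '{"steps": 2000, "laps": 3, "lap_times": [29.4375, 28.4375, 27.015625]}, '
    '{"steps": 815, "laps": 1, "lap_times": [29.53125]}]}'
)


def summarize(rec):
    """Recompute (mean_steps, mean_lap_time) from a record's per_set entries."""
    steps = [s["steps"] for s in rec["per_set"]]
    # Assumption: lap times are pooled across sets before averaging.
    laps = [t for s in rec["per_set"] for t in s["lap_times"]]
    return mean(steps), (mean(laps) if laps else None)


mean_steps, mean_lap = summarize(RECORD)
print(f"mean_steps={mean_steps:.1f} mean_lap={mean_lap:.2f}")
```

For this record the recomputed values agree with the stored mean_steps and mean_lap_time, which suggests the lap-time mean is pooled over laps rather than averaged per set.
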
 DECISIONS.md                                  |  76 +++++++
 agent/experiments/README.md                   |   1 +
 agent/experiments/exp17_parallel_450k.py      | 199 ++++++++++++++++++
 .../outerloop-results/exp14_finetune_log.txt  |  61 ++++++
 .../exp14_finetune_results.jsonl              |  20 ++
 .../robust_eval_mountain.jsonl                |  13 ++
 docs/STATE.md                                 | 149 ++++++-------
 docs/TEST_HISTORY.md                          | 102 +++++++++
 8 files changed, 538 insertions(+), 83 deletions(-)
 create mode 100644 agent/experiments/exp17_parallel_450k.py
 create mode 100644 agent/outerloop-results/exp14_finetune_log.txt
 create mode 100644 agent/outerloop-results/exp14_finetune_results.jsonl
 create mode 100644 agent/outerloop-results/robust_eval_mountain.jsonl
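
Reviewer note on the Exp 17 step budget: the arithmetic below is a sketch, not part of the patch. It assumes SB3 PPO keeps its default n_steps=2048 (exp17_parallel_450k.py does not override it), that timesteps are counted across both envs, and that each learn() call collects whole rollouts until the requested count is reached. The helper names `collected` and `simulate` are hypothetical.

```python
import math

N_ENVS = 2                  # generated_track + mountain_track
N_STEPS = 2048              # assumption: SB3 PPO default, not set in exp17
TOTAL_STEPS = 450_000       # from exp17_parallel_450k.py
CHECKPOINT_EVERY = 20_000   # from exp17_parallel_450k.py


def collected(requested, rollout):
    """Timesteps actually gathered when learn() is asked for `requested`."""
    # Rollouts are indivisible, so the request is rounded up to a whole number.
    return math.ceil(requested / rollout) * rollout


def simulate(total, segment, rollout):
    """Replay the script's checkpoint loop; return (segments, real timesteps)."""
    requested_done, real_steps, segments = 0, 0, 0
    while requested_done < total:
        seg = min(segment, total - requested_done)
        real_steps += collected(seg, rollout)
        requested_done += seg
        segments += 1
    return segments, real_steps


rollout = N_ENVS * N_STEPS
segments, real_steps = simulate(TOTAL_STEPS, CHECKPOINT_EVERY, rollout)
print(f"rollout={rollout} segments={segments} "
      f"real_steps={real_steps:,} per_track={real_steps // N_ENVS:,}")
# prints: rollout=4096 segments=23 real_steps=462,848 per_track=231,424
```

If those assumptions hold, the step counter in the checkpoint file names (20,000, 40,000, ...) lags slightly behind the model's real timestep count, and each track sees about 231k steps rather than 225k. The difference is small and does not change the ADR-025 reasoning.
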

diff --git a/DECISIONS.md b/DECISIONS.md
index 41c125a..2dafce4 100644
--- a/DECISIONS.md
+++ b/DECISIONS.md
@@ -576,3 +576,79 @@ experts, not as obviously reusable initializations for the other track.
 - If transfer is revisited, it likely needs a more careful method than naive direct
   warm-starting on the other track
 - Mountain physics issues should be addressed before revisiting transfer conclusions
+
+---
+
+## ADR-025: Parallel DummyVecEnv with 400k+ Steps is the Primary Multi-Track Strategy
+
+**Date:** 2026-04-27
+**Status:** Active
+
+**Context:** After Wave 4 (25 trials, 80% failure rate), Exp 10 (catastrophic forgetting),
+Exp 11b (infrastructure works but 90k steps insufficient), and Exp 15/16 (cross-track
+warm starts failed both directions), the only multi-track approach that did not have a
+fundamental flaw was parallel DummyVecEnv — Exp 11b failed only because the training
+budget was halved relative to what single-track training needs.
+
+**Decision:** The primary next strategy is:
+1. Two sim instances (one per training track, separate ports)
+2. SB3 `DummyVecEnv([env_generated, env_mountain])` — PPO sees both tracks in every batch
+3. 400,000–500,000 total timesteps (~200k–250k effective per track)
+4. v6 reward (efficiency gate + CTE patience terminator) on both envs
+5. No warm start — train from random weights
+6. Checkpoint every 20k steps, track mini_monaco zero-shot score throughout
+
+**Why parallel DummyVecEnv:**
+- PPO is an on-policy algorithm that depends on a stable rollout buffer.
+  Swapping environments mid-training disrupts value estimates and causes catastrophic forgetting.
+  DummyVecEnv feeds both tracks into every PPO rollout batch — no forgetting, no disruption.
+- This is how SB3 was designed to be used with multiple environments.
+
+**Why 400k+ steps:**
+- Single-track training converges in ~60–90k steps.
+- Two parallel tracks need at least 2× the budget because each track supplies only half of the samples.
+  Interference between the two tasks adds further overhead.
+- Exp 11b at 90k steps (effectively 45k per track) produced only 194-step drives on both tracks.
+  400k should provide adequate budget for both.
+
+**Rejected alternatives:**
+- Round-robin close-and-switch: disrupts PPO, 80% failure rate across 25 trials
+- Cross-track warm starts: failed both directions (ADR-024)
+- More autoresearch trials on round-robin: the method is fundamentally unreliable
+
+**Fallback if 400k parallel fails:** Curriculum — train generated_track alone for 150k steps,
+then add mountain to the DummyVecEnv pool for 250k more steps.
+
+---
+
+## ADR-026: Mountain Track Friction Fix — Use Road Material on Hill Colliders
+
+**Date:** 2026-04-27
+**Status:** Accepted — fix applied
+
+**Context:** `WheelPhys.cs` multiplies wheel grip stiffness by the static friction of the
+surface the wheel is hitting. The mountain_track scene assigned Slippery physics material
+(staticFriction=0.1) to 4 track surface colliders from the long_road prefab, giving the
+car 1/5 the normal traction on the hill. This caused visible wheelspin at full throttle and
+made hill climbing genuinely difficult for learned policies.
+
+**Decision:** Replace the 4 Slippery material assignments in `mountain_track.unity` with the
+Road material (staticFriction=0.5). This is a targeted scene-level override; the Slippery
+material asset itself is unchanged and remains available for intentionally slippery surfaces.
+
+**Fix location:** `sdsim/Assets/Scenes/mountain_track.unity` — all 4 PrefabModification
+entries that set `propertyPath: m_Material` on long_road colliders now reference Road
+(GUID 7884193b0ead347a38a13a67f294dfb5) instead of Slippery (GUID c0e12c099c364af4e9e311a43d0f12c4).
+
+**To activate:** Rebuild the Unity simulator binary after pulling the updated scene file.
+No Python code changes needed.
+
+**What this does NOT change:**
+- `Slippery.physicMaterial` asset — unchanged (still used by thunderhill, circuit_launch)
+- `Donkey_new_phys.prefab` strut colliders — also reference Slippery, but these are car body
+  parts that the wheels don't touch. WheelPhys.cs only reads friction from ground hits.
+- mini_monaco.unity — also has one Slippery reference; intentionally left unchanged for now
+
+**Expected effect:** Once the simulator is rebuilt, hill wheelspin should stop. The policy should
+find it easier to climb the hill at throttle_min=0.2, and Exp 17 multi-track results on the rebuilt
+binary should be more interpretable because they are no longer confounded by a physics artifact.
diff --git a/agent/experiments/README.md b/agent/experiments/README.md
index cc9d8e0..b79a003 100644
--- a/agent/experiments/README.md
+++ b/agent/experiments/README.md
@@ -5,6 +5,7 @@ Each corresponds to an entry in docs/TEST_HISTORY.md.
 
 | Script | Experiment | Key change |
 |---|---|---|
+| exp17_parallel_450k.py | Exp 17 | Parallel DummyVecEnv, 450k steps, v6 reward, HOST=localhost |
 | mountain_v5.py | Exp 5 | v5 reward + throttle_min=0.5, direct model.learn() |
 | mountain_continue.py | Exp 4 | Continued Exp3 training |
 | mountain_high_throttle.py | Exp 3 | throttle_min=0.5, old v4 reward |
diff --git a/agent/experiments/exp17_parallel_450k.py b/agent/experiments/exp17_parallel_450k.py
new file mode 100644
index 0000000..b58b568
--- /dev/null
+++ b/agent/experiments/exp17_parallel_450k.py
@@ -0,0 +1,199 @@
+"""
+Exp 17: Parallel DummyVecEnv — generated_track + mountain_track, 450k steps.
+
+Strategy: Exp 11b proved the parallel DummyVecEnv infrastructure is stable.
+The only failure mode was insufficient training budget (~45k effective steps
+per track). This experiment raises the budget fivefold to ~225k per track.
+
+Changes from Exp 11b:
+  - HOST: 10.0.0.55 → localhost  (WSL/Windows share ports)
+  - TOTAL_STEPS: 90k → 450k
+  - CHECKPOINT_EVERY: 6k → 20k
+  - SAVE_DIR: exp17-parallel-450k
+
+Everything else identical to Exp 11b (same reward, wrappers, lr, throttle_min).
+
+Setup — TWO sim instances required:
+  Sim 1: launch donkey_sim.exe, select generated_track, port 9091 (default)
+  Sim 2: launch a second donkey_sim.exe with --port 9093, select mountain_track
+         Command: donkey_sim.exe --port 9093
+
+  Both sims must be running and on the correct tracks before starting this script.
+
+Evaluation:
+  - Mid-training: both training tracks evaluated at each 20k checkpoint
+  - End-of-training: all 4 tracks evaluated sequentially (port 9091)
+"""
+import sys, os, time
+sys.path.insert(0, '/home/paulh/projects/donkeycar-rl-autoresearch/agent')
+
+from multitrack_runner import log, StuckTerminationWrapper
+from donkeycar_sb3_runner import ThrottleClampWrapper
+from reward_wrapper import SpeedRewardWrapper
+from stable_baselines3 import PPO
+from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
+import gymnasium as gym
+import numpy as np
+
+HOST             = 'localhost'
+THROTTLE_MIN     = 0.2
+LR               = 0.000725
+TOTAL_STEPS      = 450_000
+CHECKPOINT_EVERY = 20_000
+SAVE_DIR         = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp17-parallel-450k'
+os.makedirs(SAVE_DIR, exist_ok=True)
+
+
+def make_env(track_id, port):
+    def _init():
+        raw = gym.make(track_id, conf={'host': HOST, 'port': port})
+        env = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
+        env = StuckTerminationWrapper(env, stuck_steps=40, min_displacement=0.5)
+        env = SpeedRewardWrapper(env)
+        return env
+    return _init
+
+
+log('=' * 60)
+log('Exp 17: Parallel DummyVecEnv — 450k steps')
+log(f'  Sim 1: {HOST}:9091 → generated_track')
+log(f'  Sim 2: {HOST}:9093 → mountain_track')
+log(f'  throttle_min={THROTTLE_MIN}, lr={LR}, total={TOTAL_STEPS:,}')
+log('  Reward: v6 (speed × CTE_quality, efficiency gate >= 0.15)')
+log('  Stuck termination: 40 steps (~2.5s)')
+log(f'  Checkpoints: every {CHECKPOINT_EVERY:,} steps')
+log('=' * 60)
+
+log('Creating DummyVecEnv with two tracks...')
+env = DummyVecEnv([
+    make_env('donkey-generated-track-v0', 9091),
+    make_env('donkey-mountain-track-v0', 9093),
+])
+env = VecTransposeImage(env)
+log(f'  VecEnv num_envs={env.num_envs}, obs={env.observation_space.shape}')
+
+model = PPO('CnnPolicy', env, learning_rate=LR, verbose=1, device='cpu')
+log('PPO created. Starting training...')
+
+best_reward = float('-inf')
+steps_done = 0
+
+while steps_done < TOTAL_STEPS:
+    seg_steps = min(CHECKPOINT_EVERY, TOTAL_STEPS - steps_done)
+    model.learn(total_timesteps=seg_steps, reset_num_timesteps=False)
+    steps_done += seg_steps
+
+    ckpt = os.path.join(SAVE_DIR, f'checkpoint_{steps_done:07d}')
+    model.save(ckpt)
+    model.save(os.path.join(SAVE_DIR, 'model'))
+    log(f'[{steps_done:,}/{TOTAL_STEPS:,}] Checkpoint saved: {ckpt}.zip')
+
+    # Eval on both training tracks using the existing DummyVecEnv connections
+    try:
+        obs = env.reset()
+        ep_rewards = np.zeros(env.num_envs)
+        ep_steps   = np.zeros(env.num_envs)
+        done_mask  = np.zeros(env.num_envs, dtype=bool)
+        for _ in range(2000):
+            action, _ = model.predict(obs, deterministic=True)
+            obs, rewards, dones, infos = env.step(action)
+            for i in range(env.num_envs):
+                if not done_mask[i]:
+                    ep_rewards[i] += rewards[i]
+                    ep_steps[i]   += 1
+                    if dones[i]:
+                        done_mask[i] = True
+            if done_mask.all():
+                break
+
+        status0 = '✅' if ep_steps[0] >= 2000 else f'❌@{int(ep_steps[0])}'
+        status1 = '✅' if ep_steps[1] >= 2000 else f'❌@{int(ep_steps[1])}'
+        log(f'  Eval: gen_track={ep_rewards[0]:.1f}r/{int(ep_steps[0])}s {status0}  '
+            f'mountain={ep_rewards[1]:.1f}r/{int(ep_steps[1])}s {status1}')
+
+        total_reward = ep_rewards.sum()
+        if total_reward > best_reward:
+            best_reward = total_reward
+            model.save(os.path.join(SAVE_DIR, 'best_model'))
+            log(f'  ⭐ NEW BEST: {best_reward:.1f} combined reward')
+    except Exception as e:
+        log(f'  Eval error: {e}')
+        import traceback; traceback.print_exc()
+
+model.save(os.path.join(SAVE_DIR, 'model'))
+log(f'\nTraining complete. Best combined reward: {best_reward:.1f}')
+
+env.close()
+time.sleep(5)
+
+# --- Final eval on all 4 tracks (sequential, port 9091) ---
+log('\n' + '=' * 60)
+log('FINAL EVALUATION: best_model on 4 tracks (3 sets each)')
+log('=' * 60)
+
+EVAL_TRACKS = [
+    ('donkey-generated-track-v0',  'generated_track'),
+    ('donkey-mountain-track-v0',   'mountain_track'),
+    ('donkey-minimonaco-track-v0', 'mini_monaco'),
+    ('donkey-generated-roads-v0',  'generated_road'),
+]
+EVAL_PORT     = 9091
+EVAL_SETS     = 3
+EVAL_MAX_STEPS = 2000
+
+best_model_path  = os.path.join(SAVE_DIR, 'best_model.zip')
+results_by_track = {}
+
+for track_id, track_name in EVAL_TRACKS:
+    log(f'\n--- {track_name} ---')
+    steps_list = []
+
+    for s in range(1, EVAL_SETS + 1):
+        try:
+            raw       = gym.make(track_id, conf={'host': HOST, 'port': EVAL_PORT})
+            inner     = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
+            inner     = StuckTerminationWrapper(inner, stuck_steps=40, min_displacement=0.5)
+            inner     = SpeedRewardWrapper(inner)
+            eval_env  = VecTransposeImage(DummyVecEnv([lambda e=inner: e]))
+
+            eval_model = PPO.load(best_model_path, env=eval_env, device='cpu')
+
+            obs = eval_env.reset()
+            total_r, total_s, done = 0.0, 0, False
+            while not done and total_s < EVAL_MAX_STEPS:
+                action, _ = eval_model.predict(obs, deterministic=True)
+                result = eval_env.step(action)
+                if len(result) == 4:
+                    obs, r, d, info = result
+                    done = bool(d[0])
+                else:
+                    obs, r, t, tr, info = result
+                    done = bool(t[0] or tr[0])
+                total_r += float(r[0])
+                total_s += 1
+
+            status = '✅' if total_s >= EVAL_MAX_STEPS else f'❌@{total_s}'
+            log(f'  Set {s}: {total_r:.1f}r / {total_s}s {status}')
+            steps_list.append(total_s)
+
+            eval_env.close()
+            time.sleep(3)
+        except Exception as e:
+            log(f'  Set {s}: ERROR — {e}')
+            steps_list.append(0)
+            time.sleep(3)
+
+    mean_steps = np.mean(steps_list) if steps_list else 0
+    results_by_track[track_name] = steps_list
+    log(f'  Mean: {mean_steps:.0f} steps')
+
+log('\n' + '=' * 60)
+log('SUMMARY')
+log('=' * 60)
+for track_name, steps_list in results_by_track.items():
+    steps_str = '/'.join(str(s) for s in steps_list)
+    mean      = np.mean(steps_list)
+    verdict   = '✅' if mean >= 1500 else '⚠️' if mean >= 500 else '❌'
+    log(f'  {verdict} {track_name:20s}: {steps_str}  mean={mean:.0f}')
+
+log('\n=== Exp 17 COMPLETE ===')
diff --git a/agent/outerloop-results/exp14_finetune_log.txt b/agent/outerloop-results/exp14_finetune_log.txt
new file mode 100644
index 0000000..5f04654
--- /dev/null
+++ b/agent/outerloop-results/exp14_finetune_log.txt
@@ -0,0 +1,61 @@
+2026-04-20T00:08:21.090963 Loading warm-start model from models/exp14-mountain-v5/best_model.zip
+2026-04-20T00:09:16.674927 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using throttle_min=0.2 env
+2026-04-20T00:09:19.055092 Switching model to env with throttle_min=0.4
+2026-04-20T00:10:27.385278 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
+2026-04-20T00:11:08.699368 ERROR during fine-tune: 'NoneType' object is not callable
+2026-04-20T00:11:08.901669 Fine-tune complete. steps_done=0
+2026-04-20T00:14:43.472139 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
+2026-04-20T00:17:44.473941 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
+2026-04-20T00:21:10.924456 Loading warm-start model from models/exp14-mountain-v5/best_model.zip using base throttle_min=0.2 env
+2026-04-20T00:25:31.932947 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env
+2026-04-20T00:28:59.848890 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip
+2026-04-20T00:28:59.848966 ERROR during fine-tune: name 'make_env' is not defined
+2026-04-20T00:29:00.509181 Fine-tune complete. steps_done=6000
+2026-04-20T00:31:09.594830 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env
+2026-04-20T00:34:50.056288 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip
+2026-04-20T00:35:04.415348 ERROR during fine-tune: name 'json' is not defined
+2026-04-20T00:35:04.546033 Fine-tune complete. steps_done=6000
+2026-04-20T00:37:47.831240 Loading warm-start model from models/exp14-mountain-v5/best_model.zip without creating a temp env
+2026-04-20T00:41:21.675776 [6000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0006000.zip
+2026-04-20T00:41:43.554021   Eval @ 6000: mean_steps=384.7 mean_lap=21.59375
+2026-04-20T00:41:43.694831   ⭐ NEW BEST (mean lap 21.59s) saved
+2026-04-20T00:45:26.980198 [12000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0012000.zip
+2026-04-20T00:45:42.741989   Eval @ 12000: mean_steps=187.7 mean_lap=None
+2026-04-20T00:49:24.586893 [18000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0018000.zip
+2026-04-20T00:49:42.795830   Eval @ 18000: mean_steps=287.3 mean_lap=None
+2026-04-20T00:53:15.614884 [24000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0024000.zip
+2026-04-20T00:53:37.070339   Eval @ 24000: mean_steps=374.7 mean_lap=21.765625
+2026-04-20T00:57:09.352148 [30000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0030000.zip
+2026-04-20T00:57:36.938090   Eval @ 30000: mean_steps=537.7 mean_lap=22.046875
+2026-04-20T00:57:36.938120 Switching env to throttle_min=0.2
+2026-04-20T01:00:55.914640 [36000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0036000.zip
+2026-04-20T01:01:56.665949   Eval @ 36000: mean_steps=1451.7 mean_lap=28.434895833333332
+2026-04-20T01:05:10.807288 [42000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0042000.zip
+2026-04-20T01:05:57.449632   Eval @ 42000: mean_steps=1067.7 mean_lap=27.44140625
+2026-04-20T01:08:54.843851 [48000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0048000.zip
+2026-04-20T01:10:00.878424   Eval @ 48000: mean_steps=1626.7 mean_lap=29.776785714285715
+2026-04-20T01:13:16.089861 [54000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0054000.zip
+2026-04-20T01:14:18.435622   Eval @ 54000: mean_steps=1528.3 mean_lap=30.234375
+2026-04-20T01:17:25.682859 [60000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0060000.zip
+2026-04-20T01:18:28.243356   Eval @ 60000: mean_steps=1533.0 mean_lap=34.33125
+2026-04-20T01:21:38.247436 [66000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0066000.zip
+2026-04-20T01:21:54.995379   Eval @ 66000: mean_steps=163.7 mean_lap=None
+2026-04-20T01:25:14.752223 [72000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0072000.zip
+2026-04-20T01:26:11.926001   Eval @ 72000: mean_steps=1389.7 mean_lap=43.21484375
+2026-04-20T01:29:24.138321 [78000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0078000.zip
+2026-04-20T01:29:59.928582   Eval @ 78000: mean_steps=757.0 mean_lap=43.453125
+2026-04-20T01:33:15.187091 [84000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0084000.zip
+2026-04-20T01:33:49.188449   Eval @ 84000: mean_steps=704.7 mean_lap=41.046875
+2026-04-20T01:36:57.554346 [90000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0090000.zip
+2026-04-20T01:38:12.054640   Eval @ 90000: mean_steps=1819.0 mean_lap=None
+2026-04-20T01:41:29.620560 [96000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0096000.zip
+2026-04-20T01:42:07.583154   Eval @ 96000: mean_steps=813.0 mean_lap=None
+2026-04-20T01:45:23.503967 [102000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0102000.zip
+2026-04-20T01:45:59.052782   Eval @ 102000: mean_steps=747.3 mean_lap=None
+2026-04-20T01:49:02.510514 [108000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0108000.zip
+2026-04-20T01:49:27.462705   Eval @ 108000: mean_steps=466.0 mean_lap=None
+2026-04-20T01:52:40.338223 [114000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0114000.zip
+2026-04-20T01:53:31.593848   Eval @ 114000: mean_steps=1169.0 mean_lap=None
+2026-04-20T01:56:39.035861 [120000/120000] Checkpoint saved: models/exp14-mountain-v5-finetune/checkpoint_0120000.zip
+2026-04-20T01:57:28.658996   Eval @ 120000: mean_steps=1125.0 mean_lap=None
+2026-04-20T01:57:28.795051 Fine-tune complete. steps_done=120000
diff --git a/agent/outerloop-results/exp14_finetune_results.jsonl b/agent/outerloop-results/exp14_finetune_results.jsonl
new file mode 100644
index 0000000..f4777fc
--- /dev/null
+++ b/agent/outerloop-results/exp14_finetune_results.jsonl
@@ -0,0 +1,20 @@
+{"steps_done": 6000, "throttle_min": 0.4, "mean_steps": 384.6666666666667, "mean_lap_time": 21.59375, "per_set": [{"steps": 205, "laps": 0, "lap_times": []}, {"steps": 177, "laps": 0, "lap_times": []}, {"steps": 772, "laps": 1, "lap_times": [21.59375]}]}
+{"steps_done": 12000, "throttle_min": 0.4, "mean_steps": 187.66666666666666, "mean_lap_time": null, "per_set": [{"steps": 145, "laps": 0, "lap_times": []}, {"steps": 345, "laps": 0, "lap_times": []}, {"steps": 73, "laps": 0, "lap_times": []}]}
+{"steps_done": 18000, "throttle_min": 0.4, "mean_steps": 287.3333333333333, "mean_lap_time": null, "per_set": [{"steps": 233, "laps": 0, "lap_times": []}, {"steps": 244, "laps": 0, "lap_times": []}, {"steps": 385, "laps": 0, "lap_times": []}]}
+{"steps_done": 24000, "throttle_min": 0.4, "mean_steps": 374.6666666666667, "mean_lap_time": 21.765625, "per_set": [{"steps": 178, "laps": 0, "lap_times": []}, {"steps": 359, "laps": 0, "lap_times": []}, {"steps": 587, "laps": 1, "lap_times": [21.765625]}]}
+{"steps_done": 30000, "throttle_min": 0.4, "mean_steps": 537.6666666666666, "mean_lap_time": 22.046875, "per_set": [{"steps": 854, "laps": 1, "lap_times": [22.046875]}, {"steps": 365, "laps": 0, "lap_times": []}, {"steps": 394, "laps": 0, "lap_times": []}]}
+{"steps_done": 36000, "throttle_min": 0.2, "mean_steps": 1451.6666666666667, "mean_lap_time": 28.434895833333332, "per_set": [{"steps": 1540, "laps": 2, "lap_times": [29.34375, 26.84375]}, {"steps": 2000, "laps": 3, "lap_times": [29.4375, 28.4375, 27.015625]}, {"steps": 815, "laps": 1, "lap_times": [29.53125]}]}
+{"steps_done": 42000, "throttle_min": 0.2, "mean_steps": 1067.6666666666667, "mean_lap_time": 27.44140625, "per_set": [{"steps": 467, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [27.046875, 27.703125, 27.125]}, {"steps": 736, "laps": 1, "lap_times": [27.890625]}]}
+{"steps_done": 48000, "throttle_min": 0.2, "mean_steps": 1626.6666666666667, "mean_lap_time": 29.776785714285715, "per_set": [{"steps": 2000, "laps": 3, "lap_times": [30.796875, 29.828125, 28.734375]}, {"steps": 880, "laps": 1, "lap_times": [30.65625]}, {"steps": 2000, "laps": 3, "lap_times": [29.703125, 29.203125, 29.515625]}]}
+{"steps_done": 54000, "throttle_min": 0.2, "mean_steps": 1528.3333333333333, "mean_lap_time": 30.234375, "per_set": [{"steps": 585, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [32.734375, 29.8125, 30.8125]}, {"steps": 2000, "laps": 3, "lap_times": [31.171875, 29.71875, 27.15625]}]}
+{"steps_done": 60000, "throttle_min": 0.2, "mean_steps": 1533.0, "mean_lap_time": 34.33125, "per_set": [{"steps": 2000, "laps": 2, "lap_times": [39.140625, 33.140625]}, {"steps": 599, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 3, "lap_times": [34.21875, 31.953125, 33.203125]}]}
+{"steps_done": 66000, "throttle_min": 0.2, "mean_steps": 163.66666666666666, "mean_lap_time": null, "per_set": [{"steps": 154, "laps": 0, "lap_times": []}, {"steps": 146, "laps": 0, "lap_times": []}, {"steps": 191, "laps": 0, "lap_times": []}]}
+{"steps_done": 72000, "throttle_min": 0.2, "mean_steps": 1389.6666666666667, "mean_lap_time": 43.21484375, "per_set": [{"steps": 2000, "laps": 2, "lap_times": [50.140625, 35.6875]}, {"steps": 169, "laps": 0, "lap_times": []}, {"steps": 2000, "laps": 2, "lap_times": [39.890625, 47.140625]}]}
+{"steps_done": 78000, "throttle_min": 0.2, "mean_steps": 757.0, "mean_lap_time": 43.453125, "per_set": [{"steps": 174, "laps": 0, "lap_times": []}, {"steps": 1074, "laps": 1, "lap_times": [46.03125]}, {"steps": 1023, "laps": 1, "lap_times": [40.875]}]}
+{"steps_done": 84000, "throttle_min": 0.2, "mean_steps": 704.6666666666666, "mean_lap_time": 41.046875, "per_set": [{"steps": 953, "laps": 1, "lap_times": [40.21875]}, {"steps": 181, "laps": 0, "lap_times": []}, {"steps": 980, "laps": 1, "lap_times": [41.875]}]}
+{"steps_done": 90000, "throttle_min": 0.2, "mean_steps": 1819.0, "mean_lap_time": null, "per_set": [{"steps": 2000, "laps": 0, "lap_times": []}, {"steps": 1963, "laps": 0, "lap_times": []}, {"steps": 1494, "laps": 0, "lap_times": []}]}
+{"steps_done": 96000, "throttle_min": 0.2, "mean_steps": 813.0, "mean_lap_time": null, "per_set": [{"steps": 1671, "laps": 0, "lap_times": []}, {"steps": 385, "laps": 0, "lap_times": []}, {"steps": 383, "laps": 0, "lap_times": []}]}
+{"steps_done": 102000, "throttle_min": 0.2, "mean_steps": 747.3333333333334, "mean_lap_time": null, "per_set": [{"steps": 715, "laps": 0, "lap_times": []}, {"steps": 932, "laps": 0, "lap_times": []}, {"steps": 595, "laps": 0, "lap_times": []}]}
+{"steps_done": 108000, "throttle_min": 0.2, "mean_steps": 466.0, "mean_lap_time": null, "per_set": [{"steps": 468, "laps": 0, "lap_times": []}, {"steps": 476, "laps": 0, "lap_times": []}, {"steps": 454, "laps": 0, "lap_times": []}]}
+{"steps_done": 114000, "throttle_min": 0.2, "mean_steps": 1169.0, "mean_lap_time": null, "per_set": [{"steps": 1318, "laps": 0, "lap_times": []}, {"steps": 1278, "laps": 0, "lap_times": []}, {"steps": 911, "laps": 0, "lap_times": []}]}
+{"steps_done": 120000, "throttle_min": 0.2, "mean_steps": 1125.0, "mean_lap_time": null, "per_set": [{"steps": 941, "laps": 0, "lap_times": []}, {"steps": 1492, "laps": 0, "lap_times": []}, {"steps": 942, "laps": 0, "lap_times": []}]}
diff --git a/agent/outerloop-results/robust_eval_mountain.jsonl b/agent/outerloop-results/robust_eval_mountain.jsonl
new file mode 100644
index 0000000..705c4a5
--- /dev/null
+++ b/agent/outerloop-results/robust_eval_mountain.jsonl
@@ -0,0 +1,13 @@
+{"set": 1, "episode": 1, "steps": 195, "reward": 313.8098858782323, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:01.852800"}
+{"set": 1, "episode": 2, "steps": 907, "reward": 821.3252189619088, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:15.580688"}
+{"set": 1, "episode": 3, "steps": 187, "reward": 312.3699834933941, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:55:20.305057"}
+{"set_summary": {"set": 1, "mean_steps": 429.6666666666667, "mean_reward": 482.50169611117843}}
+{"set": 2, "episode": 1, "steps": 1684, "reward": 2886.7210297683996, "laps": 2, "lap_times": [30.796875, 27.3125], "timestamp": "2026-04-19T23:55:43.831212"}
+{"set": 2, "episode": 2, "steps": 1791, "reward": 2724.1041878786637, "laps": 2, "lap_times": [29.234375, 31.578125], "timestamp": "2026-04-19T23:56:08.736059"}
+{"set": 2, "episode": 3, "steps": 2000, "reward": 3338.140802157104, "laps": 3, "lap_times": [29.828125, 27.828125, 29.171875], "timestamp": "2026-04-19T23:56:34.963968"}
+{"set_summary": {"set": 2, "mean_steps": 1825.0, "mean_reward": 2982.9886732680557}}
+{"set": 3, "episode": 1, "steps": 189, "reward": 304.40264326371107, "laps": 0, "lap_times": [], "timestamp": "2026-04-19T23:56:39.723007"}
+{"set": 3, "episode": 2, "steps": 2000, "reward": 3396.2255747133167, "laps": 3, "lap_times": [29.875, 28.75, 27.765625], "timestamp": "2026-04-19T23:57:05.989723"}
+{"set": 3, "episode": 3, "steps": 773, "reward": 1300.720640436186, "laps": 1, "lap_times": [31.265625], "timestamp": "2026-04-19T23:57:18.198014"}
+{"set_summary": {"set": 3, "mean_steps": 987.3333333333334, "mean_reward": 1667.116286137738}}
+{"overall": {"mean_steps_across_sets": 1080.6666666666667, "mean_reward_across_sets": 1710.8688851723239}}
diff --git a/docs/STATE.md b/docs/STATE.md
index 8b52739..c9bd478 100644
--- a/docs/STATE.md
+++ b/docs/STATE.md
@@ -1,100 +1,83 @@
-# Project State — April 16, 2026 (post-testing)
+# Project State — April 27, 2026
 
 ## The Goal
+
 Train a DonkeyCar model that generalises to any road-surface track
 (outdoor, asphalt, lane markings) — demonstrated by driving a
 never-seen track without crashing.
 
 ---
 
-## Confirmed Working Models (tested today, observed by user)
+## Current Champion Models
 
-### ✅ Phase 2 Champion — generated_road
-- **Path:** `models/champion/model.zip`
-- **Trained on:** generated_road only, ~13k steps, lr=0.000225
-- **Test result:** Drove full 2000 steps, 2013 reward. User: "driving very well, stayed in right-hand lane, very very good"
-- **Other tracks:** Confirmed fails on generated_track (old multitrack_eval)
+### ✅ exp13-gentrack-v4 — generated_track specialist
+- **Path:** `models/exp13-gentrack-v4/best_model.zip`
+- **Trained on:** generated_track only, ~30k steps (stopped early), lr=0.000725, throttle_min=0.2
+- **Reward:** v4 (base × efficiency × speed_bonus)
+- **Performance:** Drives generated_track reliably, clean laps
+- **Zero-shot:** Fails on mountain_track (expected — single-track specialist)
 
-### ✅ Wave 4 Trial 9 — generated_track AND mini_monaco
+### ✅ exp14-mountain-v5-finetune ft_036k — mountain specialist
+- **Path:** `models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
+- **Trained on:** mountain_track, fine-tuned from exp14 base, checkpoint at 36k steps
+- **Reward:** v5 (speed × CTE-quality), throttle_min=0.2 (switched from 0.4 at 30k)
+- **Performance:** 9/9 successful episodes, 25 total laps, mean lap 27.93s, best lap 26.16s
+- **Zero-shot:** Fails on generated_track (expected — single-track specialist)
+
+### ⭐ Wave 4 Trial 9 — best generalising model (but not reproducible)
 - **Path:** `models/wave4-trial-0009/model.zip`
-- **Trained on:** generated_track + mountain_track from scratch, ~90k steps, lr=0.000725, switch=6,851
-- **Test on generated_track:** 3/3 episodes drove full 2000 steps, 13–16 second genuine laps
-- **Test on mini_monaco:** Full 2000 steps, 40-second genuine laps (zero-shot — never seen during training)
-- **This is our best model**
-
-### ✅ Wave 4 Trial 19 — generated_track (mostly)
-- **Path:** `models/wave4-trial-0019/model.zip`
-- **Trained on:** generated_track + mountain_track from scratch, ~74k steps, lr=0.000629, switch=8,211
-- **Test on generated_track:** 2/3 episodes drove full 2000 steps, 14–17 second genuine laps. 1 crash.
-- **mini_monaco score during training:** 231 (best "honest" result from Wave 4)
+- **Trained on:** generated_track + mountain_track, ~90k steps, lr=0.000725, switch=6,851
+- **Performance:** generated_track 2000/2000, mini_monaco 2000/2000 (zero-shot)
+- **Problem:** Reruns with the same hyperparameters all failed, so this result was most likely a lucky random seed.
 
 ---
 
-## Key Finding: Generated Track Lighting Variation
-The generated_track changes lighting conditions (sun angle, shadows) on every
-env.reset() due to procedural generation. This means during training, every
-episode showed a different visual appearance of the same track. The model was
-forced to learn track-geometry features (road edges, markings) rather than
-lighting-specific patterns. This visual robustness is almost certainly why
-Trial 9 can zero-shot generalise to mini_monaco.
+## What We Know (cumulative)
+
+### Reward functions
+- **v4** (base × efficiency × speed_bonus): works for generated_track; gives zero gradient on mountain hills
+- **v5** (speed × CTE-quality): works for mountain; circular driving exploit possible on flat track
+- **v6** (v5 + efficiency gate ≥ 0.15): prevents circular exploit; may suppress early exploration
+
+### Training approaches tried and their outcomes
+| Approach | Result |
+|---|---|
+| Single-track PPO (Exp 9, 13) | ✅ Reliable. Best per-track performance. |
+| Round-robin close-and-switch (Wave 4, Exp 10) | ❌ 80% failure rate. Disrupts PPO rollout buffer. |
+| Parallel DummyVecEnv 90k steps (Exp 11b) | ⚠️ Infrastructure works; 90k too few steps (194 steps on all tracks). |
+| Cross-track warm start both directions (Exp 15, 16) | ❌ Both failed. Single-track policies too specialised for naive transfer. |
+
+### Mountain track physics (fixed 2026-04-27)
+The mountain_track.unity scene assigned Slippery physics material (staticFriction=0.1)
+to 4 track surface colliders. WheelPhys.cs scales wheel grip by surface staticFriction,
+so the car had 1/5 normal grip on the hill. This caused visible wheelspin.
+The scene file `sdsim/Assets/Scenes/mountain_track.unity` now assigns Road material
+(staticFriction=0.5) to those 4 colliders. The project runs a pre-built Windows
+executable (DonkeySimWin/donkey_sim.exe), so the fix takes effect only once the sim
+is rebuilt from source in Unity Editor. Proceed with Exp 17 using the existing binary.
+
+### Key parameter knowledge
+- **lr:** 0.000725 (from Trial 9 and Exp 9 — consistent with good results)
+- **throttle_min:** 0.2 (v5/v6 reward gives non-zero gradient on hills even at 0.2)
+- **n_steer/n_throttle:** Relevant for a discrete action space only (this project's PPO setup uses continuous actions)
+- **Per-env throttle_min in DummyVecEnv:** Feasible — each env wrapped independently
 
 ---
 
-## Full Test Results — April 16
+## Open Strategy (as of April 27)
 
-| Test | Model | Track | Laps | Steps | Verdict |
-|---|---|---|---|---|---|
-| 1 | Phase 2 champion | generated_road | n/a (not a loop) | 2000/2000 | ✅ DRIVES |
-| 2 | Wave 4 Trial 3 | generated_track | — | — | ❌ MODEL CORRUPTED |
-| 3 | Wave 4 Trial 9 | generated_track | 6 laps × 3 eps | 2000/2000 | ✅ DRIVES |
-| 4 | Wave 4 Trial 9 | mini_monaco | 2 laps per ep | 2000/2000 | ✅ DRIVES (zero-shot) |
-| 5 | Wave 4 Trial 14 | mini_monaco | 1 lap ep2 only | 257/901/253 | ⚠️ INCONSISTENT |
-| 6 | Wave 4 Trial 25 | mini_monaco | 0 | ~147/eps | ❌ CRASHES |
-| + | Wave 4 Trial 19 | generated_track | 5-6 laps × 2 eps | crash/2000/2000 | ✅ MOSTLY |
-| + | Wave 4 Trial 22 | generated_track | 0 | ~110/eps | ❌ SAME SPOT |
-| + | Wave 4 Trial 2 | generated_track | 0 | ~76/eps | ❌ CRASHES |
-| + | Trial 3 (recovered) | generated_track | 0 | ~104/eps | ❌ CRASHES |
+The goal is reliable multi-track generalisation. The validated path forward:
 
----
+1. **Exp 17:** Parallel DummyVecEnv with 400k–500k steps
+   - Two sim instances: generated_track:9091, mountain_track:9093
+   - v6 reward on both (efficiency gate + CTE patience terminator)
+   - throttle_min=0.2 both envs (or optionally 0.5 on mountain, 0.2 on generated)
+   - lr=0.000725, checkpoint every 20k, best_model tracked throughout
+   - Eval mini_monaco zero-shot at every checkpoint
+2. **If Exp 17 plateaus:** Try curriculum (generated_track only for 150k, then add mountain)
+3. **If still stuck:** Tune v6 efficiency gate threshold (check % steps gated in early training)
 
-## What We Know Now
-
-1. **Trial 9 is a genuine multi-track model.** It drives generated_track
-   consistently (3/3) with clean laps, AND generalises zero-shot to
-   mini_monaco (never seen in training). This is real progress.
-
-2. **The "amazing" overnight model (Trial 3) is lost.** The model.zip has
-   a corrupted optimizer file. Policy weights were recovered but the model
-   crashes at ~104 steps — the "amazing" driving was at an intermediate
-   training checkpoint, not the final saved model.
-
-3. **Most Wave 4 high scores were not exploits — they were real.**
-   Trials 5, 6, and 14 showed inconsistent results (crash some episodes,
-   complete lap on others). The model was genuinely learning but unreliably.
-   Only Trial 14 and 25's original very high scores (1573, 1543) appear
-   to have been exploits in the original training eval.
-
-4. **Lighting variation on generated_track is a feature, not a bug.**
-   Procedural generation changes sun angle / shadows each episode, forcing
-   the model to learn geometry rather than appearance. This may be the key
-   to Trial 9's generalisation ability.
-
-5. **Mountain_track training — unknown contribution.** We don't know if
-   mountain_track training helped or hurt. Trial 9 drives generated_track
-   and mini_monaco; whether it can drive mountain_track is untested.
-
----
-
-## Open Questions for Strategy Discussion
-
-1. Can Trial 9 also drive mountain_track? (untested)
-2. Can Trial 9 drive generated_road? (untested — zero-shot to Phase 2 training track)
-3. Why does Trial 9 drive mini_monaco but other models with similar
-   mini_monaco scores (Trial 14: 193, Trial 22: 193) don't reliably?
-4. Would more training steps from Trial 9's hyperparameters produce
-   an even better model?
-5. Is mountain_track necessary, or could we get Trial 9's results
-   training on generated_track alone?
+See `docs/TEST_HISTORY.md` for full Exp 17 design.
 
 ---
 
@@ -102,9 +85,9 @@ Trial 9 can zero-shot generalise to mini_monaco.
 
 | Model | Path | Status |
 |---|---|---|
-| Phase 2 champion | models/champion/model.zip | ✅ Good |
-| Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | ✅ Best model |
-| Wave 4 Trial 19 | models/wave4-trial-0019/model.zip | ✅ Good |
-| Wave 4 Trial 3 | models/wave4-trial-0003/model.zip | ❌ Corrupted |
-| Wave 4 Trials 1,2,5-8,10-25 | models/wave4-trial-XXXX/ | Available, mostly crash on generated_track |
-
+| exp13-gentrack-v4 | models/exp13-gentrack-v4/best_model.zip | ✅ Generated_track specialist |
+| exp14-mountain-v5-finetune ft_036k | models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip | ✅ Mountain specialist (best overall mountain model) |
+| exp14-mountain-v5 | models/exp14-mountain-v5/best_model.zip | ✅ Mountain base (good, slightly worse than ft_036k) |
+| Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | ✅ Best generalising model; unreproducible |
+| Phase 2 champion | models/champion/model.zip | ✅ generated_road specialist only |
+| Wave 4 other trials | models/wave4-trial-XXXX/ | Mostly crash on all tracks |
diff --git a/docs/TEST_HISTORY.md b/docs/TEST_HISTORY.md
index c05494f..39ac87d 100644
--- a/docs/TEST_HISTORY.md
+++ b/docs/TEST_HISTORY.md
@@ -508,3 +508,105 @@ For now:
 - keep the single-track champions as separate specialists
 - do **not** assume direct cross-track warm starts are beneficial
 
+---
+
+## Mountain Track Friction Fix (2026-04-27)
+
+### Root cause
+
+`WheelPhys.cs` scales wheel grip by the static friction of whatever surface the
+wheel is touching: `fFriction.stiffness = hit.collider.material.staticFriction * originalForwardStiffness`.
+
+`mountain_track.unity` assigned the Slippery physics material (staticFriction=0.1)
+to 4 track surface colliders from the `long_road` prefab. This gave the car 1/5
+the normal grip on the hill, causing visible wheelspin even at full throttle.
+
+The Slippery material is intentional on genuinely icy surfaces (thunderhill) but
+was incorrect on mountain_track's asphalt hill.
+
+### Fix applied
+
+Replaced all 4 Slippery material assignments with Road material (staticFriction=0.5)
+in `sdsim/Assets/Scenes/mountain_track.unity`.
+
+| Material | staticFriction | GUID |
+|---|---|---|
+| Slippery (removed) | 0.1 | c0e12c099c364af4e9e311a43d0f12c4 |
+| Road (applied) | 0.5 | 7884193b0ead347a38a13a67f294dfb5 |
+
+### To activate
+
+The training setup uses the pre-built Windows executable (`DonkeySimWin/donkey_sim.exe`),
+not a locally-compiled build. The scene file edit in sdsandbox/ has no effect on the
+running binary — it only matters if the sim is ever rebuilt from source in Unity Editor.
+
+**This fix is deferred.** Proceed with Exp 17 using the existing executable.
+If mountain hill training in Exp 17 specifically struggles (short episodes that plateau
+and never improve), that is the signal to pursue a Unity Editor rebuild.
+
+The scene file change is committed in sdsandbox/ and will apply automatically if the
+sim is rebuilt for any other reason. No Python code changes needed.
+
+### Expected effect
+
+- Hill wheelspin should stop or greatly reduce once the sim is rebuilt
+- throttle_min=0.2 + v5 reward should be even more effective on the hill
+- All future mountain experiments on a rebuilt sim benefit; no code changes needed
+
+---
+
+## Strategy Review and Exp 17 Plan (2026-04-27)
+
+### Where the project stands
+
+After 16 experiments and 4 autoresearch phases, the core problem is clear:
+multi-track training is needed for generalisation, but the training method has
+been unreliable. Here is the summary of what each approach found:
+
+| Approach | Outcome |
+|---|---|
+| Round-robin close-and-switch (Wave 4, Exp 10) | 80% failure. PPO rollout buffer disrupted on env swap. Lucky seed (Trial 9) worked once but cannot be reproduced. |
+| Parallel DummyVecEnv 90k steps (Exp 11b) | Infrastructure valid, no catastrophic forgetting, but 90k steps / 2 tracks = ~45k effective per track. Not enough. |
+| Cross-track warm starts (Exp 15, 16) | Both directions failed. Single-track specialists do not transfer cleanly. |
+| Single-track PPO (Exp 9, 13, 14) | Reliable but no generalisation. |
+
+The conclusion: **parallel DummyVecEnv is the right architecture; the only known
+failure mode is training budget**. Exp 11b was mechanically sound but starved of steps.
+
+### Exp 17 — Parallel DummyVecEnv, 400k–500k steps
+
+**This is the primary next experiment.**
+
+| Parameter | Value | Reason |
+|---|---|---|
+| Architecture | DummyVecEnv([generated_track:9091, mountain_track:9093]) | Validated in Exp 11b; no PPO disruption |
+| Total timesteps | 400,000–500,000 | ~200k–250k effective per track; Exp 11b proved 90k insufficient |
+| Reward | v6 on both envs (efficiency gate + CTE patience terminator) | Blocks circular exploit on generated_track; gate threshold may be tuned |
+| throttle_min | 0.2 both envs (or 0.5 mountain, 0.2 generated — see ADR-020) | v5/v6 gradient non-zero on hills at 0.2 |
+| learning_rate | 0.000725 | From Trial 9 and Exp 9 — consistent with best results |
+| Checkpoint | every 20,000 steps + best_model.zip tracked throughout | ADR-017: best model ≠ final model |
+| Eval | mini_monaco zero-shot at every checkpoint | Detect the peak before policy drifts |
+| Warm start | None — train from random weights | ADR-024: cross-track warm starts failed |
+
+**Setup checklist before running:**
+1. Two sim instances running: one on port 9091, one on port 9093
+2. Each instance loaded on its configured track (generated_track on 9091, mountain_track on 9093)
+3. Mountain friction fix is deferred: use the existing pre-built executable (no rebuild required)
+4. Verify throughput: run 2-minute timing benchmark, set step cap accordingly (ADR-014)
+
+**Success criterion:** mini_monaco zero-shot score > 500 (at least 25% of a full
+2000-step episode) reliably across 3 evaluation sets, reproducible across 2+ runs.
+
+### Fallback: Curriculum training (if Exp 17 plateaus below 200)
+
+If Exp 17 cannot get past ~200 steps on mini_monaco:
+- Phase A: generated_track only, 150k steps (establish road-following)
+- Phase B: add mountain_track to DummyVecEnv, continue 250k more steps
+- Rationale: gives the policy a foundation before the harder mountain physics
+
+### Fallback: v6 efficiency gate tuning (if gate is too aggressive)
+
+Log what fraction of steps are gated (reward zeroed) in the first 100k steps.
+If >40%, lower the gate threshold from 0.15 to 0.10 for the first 150k steps,
+then raise it back to 0.15. Prevents the gate from suppressing early exploration.
+