docs: capture robust mountain finetune winner at 36k and preserve eval comparison

Paul Huliganga 2026-04-20 00:43:27 -04:00
parent 2b90de2fba
commit 0da04327ef
7 changed files with 407 additions and 1 deletions


@ -480,3 +480,70 @@ The car exits through the gap, CTE quickly exceeds 16m, hits `pass` — episode
- Resets counter when car returns to within 4m (brief excursions allowed)
**Note:** We cannot fix the Unity sim code directly.
---
## ADR-022: Promote Mountain Finetune Checkpoint by Robustness, Not Raw Lap Time
**Date:** 2026-04-19
**Status:** Accepted
**Context:** The mountain finetune (`exp14_finetune_v5.py`) produced several early
checkpoints with very fast lap times under a temporary `throttle_floor=0.4`, but
those checkpoints were extremely fragile in repeated deterministic evaluation.
Later checkpoints degraded badly and often failed to complete any laps.
**Decision:** Select the best mountain finetune checkpoint using a combined
speed + robustness criterion, not raw best lap alone.
**Chosen model:**
- canonical checkpoint: `agent/models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`
- promoted copy: `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
**Why:** In a fresh 9-episode deterministic mountain evaluation, the 36k
checkpoint achieved:
- 9/9 successful episodes
- 25 total laps
- mean lap 27.93s
- best lap 26.16s
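This beat:
- the original exp14 mountain champion (7/9 successful episodes, 16 laps, mean lap 29.24s) in robustness and lap time
- the 6k/24k/30k finetune checkpoints (at most 4/9 successful episodes) in robustness
- the later checkpoints (42k: 8/9, 48k: 6/9 successful episodes) in both robustness and mean lap time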
**Consequence:** Future mountain work should warm-start from the promoted 36k
robust checkpoint, not from the final finetune model and not from the fastest
but fragile 0.4-floor checkpoints.
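The exact scoring rule behind "speed + robustness" is not spelled out in this ADR. One way to make it concrete is a lexicographic sort: success rate first, then total laps, then mean lap time. The sketch below applies that ordering to the summary numbers from the 9-episode evaluation. The `Summary` type, the `rank` helper, and the order of the sort key are assumptions for illustration, not the procedure that was actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    label: str
    successes: int    # episodes with at least one completed lap
    episodes: int
    total_laps: int
    mean_lap: float   # seconds, averaged over all completed laps


def rank(summaries):
    """Sort best-first: robustness (success rate, total laps) before speed."""
    return sorted(
        summaries,
        key=lambda s: (-s.successes / s.episodes, -s.total_laps, s.mean_lap),
    )


# Summary numbers taken from the 2026-04-19 mountain candidate evaluation.
CANDIDATES = [
    Summary("exp14_base", 7, 9, 16, 29.24),
    Summary("ft_006k", 1, 9, 1, 21.36),
    Summary("ft_024k", 4, 9, 5, 21.58),
    Summary("ft_030k", 1, 9, 2, 21.53),
    Summary("ft_036k", 9, 9, 25, 27.93),
    Summary("ft_042k", 8, 9, 17, 29.25),
    Summary("ft_048k", 6, 9, 13, 31.15),
]

ranked = rank(CANDIDATES)
print([s.label for s in ranked])
```

With this key the fast `0.4`-floor checkpoints sort to the bottom despite their lap times, because mean lap time is only consulted to break ties in robustness.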
---
## ADR-023: Mountain Track Likely Has a Real Unity Physics / Traction Problem
**Date:** 2026-04-19
**Status:** Accepted
**Context:** The user observed that on mountain hills, the DonkeyCar often tries
to move forward while the wheels visibly spin and the car barely advances.
This looked like traction loss rather than a pure policy error.
**Evidence from Unity source:**
- Unity repo path: `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
- `sdsim/Assets/Scripts/WheelPhys.cs` scales wheel friction by the hit collider's
  physics material static friction:
  - `hit.collider.material.staticFriction * originalForwardStiffness`
- `sdsim/Assets/Scenes/mountain_track.unity` contains 4 explicit `Slippery`
  material assignments on colliders from the imported `long_road` FBX instance
- physics material values:
  - `Slippery.staticFriction = 0.1`
  - `Road.staticFriction = 0.5`
  - `Grippy.staticFriction = 0.66`
**Decision:** Treat mountain wheelspin / poor uphill progress as a likely real
sim-physics issue, not just an RL reward or hyperparameter issue.
**Consequence:** Before trusting further mountain finetuning results, investigate
and likely patch the Unity scene on branch:
- `investigate-mountain-friction`
This should be prioritized over adding more reward heuristics.
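To get a feel for the size of the effect, the scaling quoted from `WheelPhys.cs` can be worked through with the three material values above. This is a rough sketch only: `original_forward_stiffness=1.0` is a placeholder (the real value is not recorded here), and the helper names are hypothetical.

```python
# staticFriction values quoted above from the Unity physics materials.
STATIC_FRICTION = {"Slippery": 0.1, "Road": 0.5, "Grippy": 0.66}


def forward_stiffness(material, original_forward_stiffness=1.0):
    """Mirror the quoted scaling: staticFriction * originalForwardStiffness.

    original_forward_stiffness=1.0 is a placeholder, not the sim's real value.
    """
    return STATIC_FRICTION[material] * original_forward_stiffness


def relative_to_road(material):
    """Stiffness of `material` as a fraction of the Road material's stiffness."""
    return forward_stiffness(material) / forward_stiffness("Road")


for name in STATIC_FRICTION:
    print(f"{name}: {relative_to_road(name):.2f}x Road")
```

Because the placeholder cancels in the ratio, the comparison does not depend on it: a collider tagged `Slippery` would give the wheel about one fifth of the forward stiffness it gets on `Road`, which is at least consistent with the observed wheelspin on hills.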


@ -0,0 +1,63 @@
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 1, "steps": 814, "laps": 1, "lap_times": [31.09375], "reward": 132.39361070096493}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 2, "steps": 2000, "laps": 3, "lap_times": [29.046875, 28.390625, 28.71875], "reward": 345.59998378881755}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 3, "steps": 1859, "laps": 3, "lap_times": [30.109375, 27.875, 27.015625], "reward": 323.1350573108066}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 4, "steps": 2000, "laps": 3, "lap_times": [30.484375, 27.953125, 28.890625], "reward": 338.7155503882095}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 5, "steps": 190, "laps": 0, "lap_times": [], "reward": 34.7614578341836}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 6, "steps": 689, "laps": 0, "lap_times": [], "reward": 89.68411724013276}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 7, "steps": 1164, "laps": 1, "lap_times": [29.34375], "reward": 193.3484125709192}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 8, "steps": 2000, "laps": 3, "lap_times": [29.125, 36.109375, 28.0], "reward": 325.9044730809983}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 9, "steps": 1275, "laps": 2, "lap_times": [28.484375, 27.140625], "reward": 230.20822774680255}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 1, "steps": 177, "laps": 0, "lap_times": [], "reward": 36.94930865149945}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 2, "steps": 126, "laps": 0, "lap_times": [], "reward": 27.089253417694636}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 3, "steps": 161, "laps": 0, "lap_times": [], "reward": 35.617936324328184}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 4, "steps": 372, "laps": 0, "lap_times": [], "reward": 81.32034226437099}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 5, "steps": 348, "laps": 0, "lap_times": [], "reward": 80.90182103098687}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 6, "steps": 351, "laps": 0, "lap_times": [], "reward": 83.38630702279897}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 7, "steps": 769, "laps": 1, "lap_times": [21.359375], "reward": 181.13429199228995}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 8, "steps": 356, "laps": 0, "lap_times": [], "reward": 82.55838803868119}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 9, "steps": 357, "laps": 0, "lap_times": [], "reward": 83.45234980319492}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 1, "steps": 395, "laps": 0, "lap_times": [], "reward": 92.56352121913824}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 2, "steps": 813, "laps": 1, "lap_times": [22.5625], "reward": 174.79000809043646}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 3, "steps": 776, "laps": 1, "lap_times": [21.9375], "reward": 168.13511967613886}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 4, "steps": 1248, "laps": 2, "lap_times": [20.53125, 21.21875], "reward": 268.3056740048578}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 5, "steps": 382, "laps": 0, "lap_times": [], "reward": 80.95213605418394}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 6, "steps": 555, "laps": 1, "lap_times": [21.640625], "reward": 127.2610695830399}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 7, "steps": 149, "laps": 0, "lap_times": [], "reward": 29.529473824529305}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 8, "steps": 438, "laps": 0, "lap_times": [], "reward": 91.48110114145402}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 9, "steps": 416, "laps": 0, "lap_times": [], "reward": 91.06411632476375}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 1, "steps": 72, "laps": 0, "lap_times": [], "reward": 10.501809980045437}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 2, "steps": 451, "laps": 0, "lap_times": [], "reward": 87.36528500499116}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 3, "steps": 1012, "laps": 2, "lap_times": [22.34375, 20.71875], "reward": 224.14087196643231}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 4, "steps": 157, "laps": 0, "lap_times": [], "reward": 36.810340652579725}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 5, "steps": 167, "laps": 0, "lap_times": [], "reward": 38.044355134905345}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 6, "steps": 167, "laps": 0, "lap_times": [], "reward": 34.502623422992656}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 7, "steps": 314, "laps": 0, "lap_times": [], "reward": 64.56299563837547}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 8, "steps": 183, "laps": 0, "lap_times": [], "reward": 39.53914854944924}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 9, "steps": 329, "laps": 0, "lap_times": [], "reward": 65.17448510392569}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 1, "steps": 811, "laps": 1, "lap_times": [29.921875], "reward": 134.79952276336735}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 2, "steps": 2000, "laps": 3, "lap_times": [27.890625, 26.390625, 26.78125], "reward": 355.31346766664956}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 3, "steps": 2000, "laps": 3, "lap_times": [28.765625, 27.765625, 27.46875], "reward": 344.61911652631443}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 4, "steps": 1833, "laps": 3, "lap_times": [27.5625, 27.375, 26.15625], "reward": 330.4355698036379}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 5, "steps": 1928, "laps": 3, "lap_times": [29.125, 28.109375, 27.6875], "reward": 332.7582264354696}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 6, "steps": 2000, "laps": 3, "lap_times": [29.46875, 26.8125, 28.5], "reward": 342.77918509633764}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 7, "steps": 2000, "laps": 3, "lap_times": [28.515625, 26.4375, 28.046875], "reward": 356.1632144622272}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 8, "steps": 2000, "laps": 3, "lap_times": [29.4375, 28.140625, 26.546875], "reward": 351.6208579884842}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 9, "steps": 2000, "laps": 3, "lap_times": [29.484375, 27.078125, 28.71875], "reward": 346.25121597342695}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 1, "steps": 727, "laps": 1, "lap_times": [27.65625], "reward": 127.74538866006424}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 2, "steps": 1082, "laps": 1, "lap_times": [29.53125], "reward": 180.6854192542378}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 3, "steps": 2000, "laps": 3, "lap_times": [28.796875, 27.171875, 31.359375], "reward": 316.97187187187683}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 4, "steps": 436, "laps": 0, "lap_times": [], "reward": 76.15429569082335}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 5, "steps": 1255, "laps": 2, "lap_times": [29.375, 27.84375], "reward": 214.58350544204313}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 6, "steps": 2000, "laps": 3, "lap_times": [29.265625, 27.09375, 28.046875], "reward": 346.9328897984469}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 7, "steps": 2000, "laps": 3, "lap_times": [29.609375, 28.25, 29.0625], "reward": 325.0631527120613}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 8, "steps": 1139, "laps": 1, "lap_times": [36.640625], "reward": 174.01025118848793}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 9, "steps": 2000, "laps": 3, "lap_times": [28.421875, 31.546875, 27.59375], "reward": 317.8546572496798}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 1, "steps": 2000, "laps": 2, "lap_times": [41.96875, 29.703125], "reward": 295.66797001392115}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 2, "steps": 209, "laps": 0, "lap_times": [], "reward": 39.03441974380985}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 3, "steps": 1353, "laps": 2, "lap_times": [31.84375, 28.46875], "reward": 227.3901946817232}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 4, "steps": 2000, "laps": 3, "lap_times": [30.5, 31.171875, 29.15625], "reward": 332.7062042630514}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 5, "steps": 201, "laps": 0, "lap_times": [], "reward": 38.25372101787252}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 6, "steps": 2000, "laps": 3, "lap_times": [29.875, 28.90625, 28.3125], "reward": 332.71756463419115}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 7, "steps": 1365, "laps": 2, "lap_times": [30.859375, 29.921875], "reward": 235.64415748882038}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 8, "steps": 184, "laps": 0, "lap_times": [], "reward": 32.21486316375649}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 9, "steps": 830, "laps": 1, "lap_times": [34.25], "reward": 131.9192705488067}


@ -0,0 +1,29 @@
# Mountain candidate checkpoint evaluation — 2026-04-19
Deterministic eval on `mountain_track`, 9 episodes per model, max 2000 steps/episode.
| model | floor | success eps | full 2k eps | avg laps/ep | total laps | mean lap (all) | best lap | avg steps | notes |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---|
| exp14_base | 0.2 | 7/9 | 3/9 | 1.78 | 16 | 29.24 | 27.02 | 1332 | original champion |
| ft_006k | 0.4 | 1/9 | 0/9 | 0.11 | 1 | 21.36 | 21.36 | 335 | very fast when it works, extremely fragile |
| ft_024k | 0.4 | 4/9 | 0/9 | 0.56 | 5 | 21.58 | 20.53 | 575 | fast but fragile |
| ft_030k | 0.4 | 1/9 | 0/9 | 0.22 | 2 | 21.53 | 20.72 | 317 | very fast but extremely fragile |
| ft_036k | 0.2 | 9/9 | 6/9 | 2.78 | 25 | 27.93 | 26.16 | 1841 | best balance: fastest robust candidate |
| ft_042k | 0.2 | 8/9 | 4/9 | 1.89 | 17 | 29.25 | 27.09 | 1404 | decent, but worse than 36k |
| ft_048k | 0.2 | 6/9 | 3/9 | 1.44 | 13 | 31.15 | 28.31 | 1127 | degraded |
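The columns above can be recomputed from the raw per-episode rows. The sketch below is one possible aggregation, not the script that produced the table: it assumes "success" means at least one completed lap and "full 2k" means the episode reached the 2000-step cap, and `summarize` / `load_rows` are hypothetical helper names. The sample data reproduces the `ft_030k` row from its nine episodes.

```python
import json
from collections import defaultdict


def load_rows(path):
    """Read one JSON object per line, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize(rows, max_steps=2000):
    """Group per-episode rows by label and compute the table columns."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    out = {}
    for label, eps in by_label.items():
        lap_times = [t for ep in eps for t in ep["lap_times"]]
        total_laps = sum(ep["laps"] for ep in eps)
        out[label] = {
            "success_eps": sum(1 for ep in eps if ep["laps"] > 0),
            "full_eps": sum(1 for ep in eps if ep["steps"] >= max_steps),
            "episodes": len(eps),
            "avg_laps": total_laps / len(eps),
            "total_laps": total_laps,
            "mean_lap": sum(lap_times) / len(lap_times) if lap_times else None,
            "best_lap": min(lap_times) if lap_times else None,
            "avg_steps": sum(ep["steps"] for ep in eps) / len(eps),
        }
    return out


# The nine ft_030k episodes from the raw results (steps, laps, lap_times only).
steps = [72, 451, 1012, 157, 167, 167, 314, 183, 329]
rows = [{"label": "ft_030k", "steps": s, "laps": 0, "lap_times": []} for s in steps]
rows[2].update(laps=2, lap_times=[22.34375, 20.71875])

summary = summarize(rows)["ft_030k"]
print(summary)
```

To run it against the full results, pass `load_rows(...)` on the raw result file to `summarize` instead of the inline sample.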
## Recommendation
Best overall candidate:
- `models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`
Reason:
- 9/9 successful episodes
- 25 total laps across 9 episodes
- mean lap 27.93s
- best lap 26.16s
- clearly more robust than the original exp14 champion and later finetune checkpoints
## Raw result file
- `agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl`


@ -0,0 +1,149 @@
import os, json, time
from datetime import datetime
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
import gymnasium as gym
import numpy as np
from donkeycar_sb3_runner import ThrottleClampWrapper

HOST='10.0.0.55'
PORT=9091
TRACK_ID='donkey-mountain-track-v0'
MAX_STEPS=2000
EPISODES=9
OUT='outerloop-results/mountain_candidate_eval_2026-04-19.jsonl'

# (label, model path, runtime throttle floor used during evaluation)
CANDIDATES = [
    ('exp14_base', 'models/exp14-mountain-v5/best_model.zip', 0.2),
    ('ft_006k', 'models/exp14-mountain-v5-finetune/checkpoint_0006000.zip', 0.4),
    ('ft_024k', 'models/exp14-mountain-v5-finetune/checkpoint_0024000.zip', 0.4),
    ('ft_030k', 'models/exp14-mountain-v5-finetune/checkpoint_0030000.zip', 0.4),
    ('ft_036k', 'models/exp14-mountain-v5-finetune/checkpoint_0036000.zip', 0.2),
    ('ft_042k', 'models/exp14-mountain-v5-finetune/checkpoint_0042000.zip', 0.2),
    ('ft_048k', 'models/exp14-mountain-v5-finetune/checkpoint_0048000.zip', 0.2),
]


class V5RewardWrapper(gym.Wrapper):
    """Replace the sim reward with cte_quality * speed_norm (v5 reward)."""

    def __init__(self, env, max_cte=8.0, min_lap_time=5.0):
        super().__init__(env)
        self.max_cte = max_cte
        self.min_lap_time = min_lap_time
        self._last_lc = 0

    def reset(self, **kwargs):
        self._last_lc = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        result = self.env.step(action)
        # Accept both the 5-tuple (gymnasium) and 4-tuple (legacy gym) step APIs.
        if len(result) == 5:
            obs, _sim, terminated, truncated, info = result
            done = terminated or truncated
        else:
            obs, _sim, done, info = result
            terminated, truncated = done, False
        try:
            cte = float(info.get('cte', 0.0) or 0.0)
        except Exception:
            cte = 0.0
        # 1.0 on the centre line, falling linearly to 0.0 at max_cte.
        cte_quality = 1.0 - min(abs(cte) / self.max_cte, 1.0)
        try:
            speed = max(0.0, float(info.get('speed', 0.0) or 0.0))
        except Exception:
            speed = 0.0
        speed_norm = min(speed / 10.0, 1.0)
        reward = cte_quality * speed_norm
        try:
            current_lc = int(info.get('lap_count', 0) or 0)
        except Exception:
            current_lc = self._last_lc
        force_terminate = False
        if current_lc > self._last_lc:
            self._last_lc = current_lc
            try:
                lap_time = float(info.get('last_lap_time', 999.0) or 999.0)
            except Exception:
                lap_time = 999.0
            # Implausibly short laps are penalised and end the episode.
            if lap_time < self.min_lap_time:
                reward = -10.0 * (self.min_lap_time / max(lap_time, 0.1))
                force_terminate = True
        if len(result) == 5:
            return obs, reward, terminated or force_terminate, truncated, info
        return obs, reward, terminated or force_terminate, info


def make_env(base_throttle=0.2, throttle_floor=None):
    def _init():
        raw = gym.make(TRACK_ID, conf={'host': HOST, 'port': PORT})
        env = ThrottleClampWrapper(raw, throttle_min=base_throttle)
        if throttle_floor is not None:
            class ThrottleFloorWrapper(gym.Wrapper):
                """Raise the throttle component (action[1]) to at least `floor`."""

                def __init__(self, env, floor):
                    super().__init__(env)
                    self.floor = floor

                def step(self, action):
                    act = np.array(action)
                    try:
                        act[1] = max(act[1], self.floor)
                    except Exception:
                        pass
                    return self.env.step(act)

                def reset(self, **kwargs):
                    return self.env.reset(**kwargs)

            env = ThrottleFloorWrapper(env, throttle_floor)
        env = V5RewardWrapper(env)
        return env
    return _init


os.makedirs(os.path.dirname(OUT), exist_ok=True)
all_rows = []
for label, model_path, floor in CANDIDATES:
    print(f'\n=== Evaluating {label} floor={floor} path={model_path}', flush=True)
    env = VecTransposeImage(DummyVecEnv([make_env(0.2, floor)]))
    model = PPO.load(model_path, device='cpu')
    model.set_env(env)
    episodes = []
    for ep in range(EPISODES):
        obs = env.reset()
        steps = 0
        laps = 0
        prev_lc = 0
        lap_times = []
        total_reward = 0.0
        while steps < MAX_STEPS:
            action, _ = model.predict(obs, deterministic=True)
            obs, r, d, info = env.step(action)
            # The vectorised env returns one info dict per sub-environment.
            inf = info[0] if isinstance(info, (list, tuple)) else info
            total_reward += float(r[0])
            steps += 1
            lc = int(inf.get('lap_count', 0) or 0)
            if lc > prev_lc:
                try:
                    lap_times.append(float(inf.get('last_lap_time', 0) or 0))
                except Exception:
                    lap_times.append(0.0)
                prev_lc = lc
                laps = lc
            if bool(d[0]):
                break
        row = {
            'label': label,
            'model_path': model_path,
            'throttle_floor': floor,
            'episode': ep + 1,
            'steps': steps,
            'laps': laps,
            'lap_times': lap_times,
            'reward': total_reward,
        }
        episodes.append(row)
        all_rows.append(row)
        print(f" ep{ep+1}: steps={steps} laps={laps} lap_times={lap_times}", flush=True)
    env.close()
    time.sleep(2)

# One JSON object per line (JSONL), one line per episode.
with open(OUT, 'w') as f:
    for row in all_rows:
        f.write(json.dumps(row) + '\n')
print('\nSaved to', OUT)


@ -120,6 +120,59 @@ parallel envs are working.
- Exp 11c (v6 reward, 250k): aborted — grass exploit found on generated_track
- Exp 11d: pending fixes before re-run
## Mountain Track Finetune + Physics Investigation (2026-04-19 late session)
### Finetune outcome summary
- Created and ran `agent/experiments/exp14_finetune_v5.py`
- Warm-start source: `agent/models/exp14-mountain-v5/best_model.zip`
- Schedule used:
  - phase 1: runtime throttle floor `0.4`
  - phase 2: runtime throttle floor `0.2`
- Training later degraded badly; later checkpoints became poor / unstable
- Best usable finetune checkpoint was **not** the final model
### Robust checkpoint comparison on mountain_track
We ran a deterministic mountain-only comparison over 9 episodes per candidate.
Results saved to:
- `agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl`
- `agent/outerloop-results/mountain_candidate_eval_2026-04-19.md`
Winner:
- `agent/models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`
- promoted copy:
  - `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
Key result:
- **ft_036k** achieved:
  - 9/9 successful episodes
  - 25 total laps across 9 episodes
  - mean lap **27.93s**
  - best lap **26.16s**
- This beat:
  - original mountain champion for robustness
  - earlier `0.4`-floor checkpoints for robustness
  - later finetune checkpoints, which had degraded badly
### Mountain physics discovery in Unity sim
Unity source path confirmed:
- `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
We found a likely real root cause for hill wheelspin:
- `sdsim/Assets/Scripts/WheelPhys.cs` scales wheel friction by the hit collider's physics material:
  - `hit.collider.material.staticFriction * originalForwardStiffness`
- `mountain_track.unity` contains **4 explicit `Slippery` physics-material assignments** on the imported `long_road` FBX instance
- `Slippery.staticFriction = 0.1`
- `Road.staticFriction = 0.5`
- `Grippy.staticFriction = 0.66`
Interpretation:
- mountain road traction is likely much lower than normal road tracks
- this matches observed wheelspin / poor uphill progress / getting stuck on hills
We created a dedicated Unity investigation branch before changing anything:
- repo: `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
- branch: `investigate-mountain-friction`
## Critical Known Facts (DO NOT LOSE)
### throttle_min history (from Exp 1-9)


@ -1,6 +1,6 @@
# Test History — DonkeyCar RL Autoresearch
-Last updated: 2026-04-18
+Last updated: 2026-04-19
This document records every significant training experiment, what was
changed, what was observed, and what was learned. Use this to make
@ -400,3 +400,48 @@ camera angle) to achieve generalization instead?
- Try v5 reward with parallel envs but longer training (accept some circling)
- Check if efficiency gate triggers too aggressively during normal cornering
---
## Exp 14b — Mountain finetune from exp14 champion (2026-04-19)
- **Script:** `agent/experiments/exp14_finetune_v5.py`
- **Warm start:** `agent/models/exp14-mountain-v5/best_model.zip`
- **Schedule:**
  - phase 1: runtime throttle floor `0.4`
  - phase 2: runtime throttle floor `0.2`
- **Goal:** improve hill climbing, robustness, and lap time on `mountain_track`
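The two-phase schedule can be pictured as a step-dependent floor applied to the throttle component of each action. The sketch below is illustrative only: `SWITCH_STEP` is a hypothetical value (the actual switch point is not recorded in this entry), the helper names are made up, and the action layout `[steering, throttle]` is assumed from the `act[1]` clamp in the evaluation script.

```python
def throttle_floor_for_step(step, switch_step, phase1_floor=0.4, phase2_floor=0.2):
    """Return the runtime throttle floor for a given training step."""
    return phase1_floor if step < switch_step else phase2_floor


def apply_floor(action, floor):
    """Return a copy of [steering, throttle] with throttle raised to at least `floor`."""
    act = list(action)
    act[1] = max(act[1], floor)
    return act


SWITCH_STEP = 30_000  # hypothetical: the real phase boundary is not given here

for step in (6_000, 36_000):
    floor = throttle_floor_for_step(step, SWITCH_STEP)
    print(step, floor, apply_floor([0.1, 0.05], floor))
```

A higher floor forces the car to carry more speed everywhere, which would explain faster laps under `0.4` but also leaves the policy less room to slow down for corners.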
### Important outcome
The finetune run **did not improve monotonically**. It briefly improved, then later degraded badly.
This means the final/latest checkpoint is **not** the model we want to keep.
### Candidate checkpoint comparison
We ran a fresh deterministic comparison on mountain only:
- **9 episodes per model**
- **2000 step cap**
- Results saved to:
  - `agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl`
  - `agent/outerloop-results/mountain_candidate_eval_2026-04-19.md`
| Model | Floor | Success eps | Full 2k eps | Avg laps/ep | Total laps | Mean lap | Best lap | Avg steps | Verdict |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---|
| exp14_base | 0.2 | 7/9 | 3/9 | 1.78 | 16 | 29.24s | 27.02s | 1332 | Original champion |
| ft_006k | 0.4 | 1/9 | 0/9 | 0.11 | 1 | 21.36s | 21.36s | 335 | Very fast but unusably fragile |
| ft_024k | 0.4 | 4/9 | 0/9 | 0.56 | 5 | 21.58s | 20.53s | 575 | Fast but fragile |
| ft_030k | 0.4 | 1/9 | 0/9 | 0.22 | 2 | 21.53s | 20.72s | 317 | Very fast but unusably fragile |
| **ft_036k** | **0.2** | **9/9** | **6/9** | **2.78** | **25** | **27.93s** | **26.16s** | **1841** | **Best overall balance** |
| ft_042k | 0.2 | 8/9 | 4/9 | 1.89 | 17 | 29.25s | 27.09s | 1404 | Decent, but worse than 36k |
| ft_048k | 0.2 | 6/9 | 3/9 | 1.44 | 13 | 31.15s | 28.31s | 1127 | Degraded |
### Best model captured
Best overall checkpoint from the finetune:
- `agent/models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`
Promoted copy saved as:
- `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
### Key learning
- Early `0.4`-floor checkpoints can produce very fast laps, but are too fragile to trust.
- The best mountain finetune model is the **36k checkpoint after switching back to 0.2 floor**, not the later checkpoints.
- Later finetune checkpoints collapsed badly, matching the user's visual observation of wheelspin / poor driving.