diff --git a/DECISIONS.md b/DECISIONS.md
index 4a19273..6712aa0 100644
--- a/DECISIONS.md
+++ b/DECISIONS.md
@@ -480,3 +480,70 @@ The car exits through the gap, CTE quickly exceeds 16m, hits `pass` — episode
 - Resets counter when car returns to within 4m (brief excursions allowed)
 
 **Note:** We cannot fix the Unity sim code directly.
+
+---
+
+## ADR-022: Promote Mountain Finetune Checkpoint by Robustness, Not Raw Lap Time
+
+**Date:** 2026-04-19
+**Status:** Accepted
+
+**Context:** The mountain finetune (`exp14_finetune_v5.py`) produced several early
+checkpoints with very fast lap times under a temporary `throttle_floor=0.4`, but
+those checkpoints were extremely fragile across repeated deterministic-policy episodes.
+Later checkpoints degraded badly and often failed to complete any laps.
+
+**Decision:** Select the best mountain finetune checkpoint using a combined
+speed + robustness criterion, not raw best lap alone.
+
+**Chosen model:**
+- canonical checkpoint: `agent/models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`
+- promoted copy: `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
+
+**Why:** In a fresh 9-episode deterministic-policy mountain evaluation (2000-step
+cap per episode), the 36k checkpoint achieved:
+- 9/9 successful episodes (at least one lap completed in each)
+- 25 total laps
+- mean lap 27.93s
+- best lap 26.16s
+
+This beat:
+- the original exp14 mountain champion in both robustness and lap time
+- the 6k/24k/30k finetune checkpoints in robustness (they were faster but fragile)
+- the later 42k/48k+ checkpoints in both robustness and lap time
+
+**Consequence:** Future mountain work should warm-start from the promoted 36k
+robust checkpoint, not from the final finetune model and not from the fastest
+but fragile 0.4-floor checkpoints.
+
+---
+
+## ADR-023: Mountain Track Likely Has a Real Unity Physics / Traction Problem
+
+**Date:** 2026-04-19
+**Status:** Accepted
+
+**Context:** The user observed that on mountain hills, the DonkeyCar often tries
+to move forward while the wheels visibly spin and the car barely advances.
+This looked like traction loss rather than a pure policy error.
+
+**Evidence from Unity source:**
+- Unity repo path: `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
+- `sdsim/Assets/Scripts/WheelPhys.cs` scales wheel friction by the hit collider's
+  physics material static friction:
+  - `hit.collider.material.staticFriction * originalForwardStiffness`
+- `sdsim/Assets/Scenes/mountain_track.unity` contains 4 explicit `Slippery`
+  material assignments on colliders from the imported `long_road` FBX instance
+- physics material values:
+  - `Slippery.staticFriction = 0.1`
+  - `Road.staticFriction = 0.5`
+  - `Grippy.staticFriction = 0.66`
+
+**Decision:** Treat mountain wheelspin / poor uphill progress as a likely real
+sim-physics issue, not just an RL reward or hyperparameter issue.
+
+**Consequence:** Before trusting further mountain finetuning results, investigate
+and likely patch the Unity scene on branch:
+- `investigate-mountain-friction`
+
+This should be prioritized over adding more reward heuristics.
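The friction expression quoted in ADR-023 supports a quick worked comparison. The sketch below is hypothetical and is not code from `sdsandbox`. It assumes the hit collider's `staticFriction` is the only per-surface factor in that expression. It also takes `originalForwardStiffness` as 1.0, so each result reads as a fraction of the unscaled value. Under those assumptions, a wheel touching a `Slippery` collider keeps one fifth of the forward stiffness it would get on `Road`.

```python
# Hypothetical helpers mirroring the WheelPhys.cs expression quoted in ADR-023:
#   hit.collider.material.staticFriction * originalForwardStiffness
# Material values come from ADR-023; the helper names are illustrative only.
MATERIAL_STATIC_FRICTION = {
    "Slippery": 0.1,
    "Road": 0.5,
    "Grippy": 0.66,
}


def scaled_forward_stiffness(material: str, original_forward_stiffness: float = 1.0) -> float:
    """Forward stiffness after scaling by the hit collider's static friction."""
    return MATERIAL_STATIC_FRICTION[material] * original_forward_stiffness


def ratio_to_road(material: str) -> float:
    """Fraction of the Road material's scaled stiffness that a material keeps."""
    # originalForwardStiffness cancels out of the ratio, so the default 1.0 suffices.
    return scaled_forward_stiffness(material) / scaled_forward_stiffness("Road")
```

With these values, `ratio_to_road("Slippery")` is 0.2 and `ratio_to_road("Grippy")` is 1.32. If the scaled value is what reaches the wheel, the four `Slippery` assignments cut forward grip on those colliders to 20% of normal road.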
diff --git a/agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip b/agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip new file mode 100644 index 0000000..48b0d8f Binary files /dev/null and b/agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip differ diff --git a/agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl b/agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl new file mode 100644 index 0000000..c47e551 --- /dev/null +++ b/agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl @@ -0,0 +1,63 @@ +{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 1, "steps": 814, "laps": 1, "lap_times": [31.09375], "reward": 132.39361070096493} +{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 2, "steps": 2000, "laps": 3, "lap_times": [29.046875, 28.390625, 28.71875], "reward": 345.59998378881755} +{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 3, "steps": 1859, "laps": 3, "lap_times": [30.109375, 27.875, 27.015625], "reward": 323.1350573108066} +{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 4, "steps": 2000, "laps": 3, "lap_times": [30.484375, 27.953125, 28.890625], "reward": 338.7155503882095} +{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 5, "steps": 190, "laps": 0, "lap_times": [], "reward": 34.7614578341836} +{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 6, "steps": 689, "laps": 0, "lap_times": [], "reward": 89.68411724013276} +{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 7, "steps": 1164, "laps": 1, "lap_times": 
[29.34375], "reward": 193.3484125709192} +{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 8, "steps": 2000, "laps": 3, "lap_times": [29.125, 36.109375, 28.0], "reward": 325.9044730809983} +{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 9, "steps": 1275, "laps": 2, "lap_times": [28.484375, 27.140625], "reward": 230.20822774680255} +{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 1, "steps": 177, "laps": 0, "lap_times": [], "reward": 36.94930865149945} +{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 2, "steps": 126, "laps": 0, "lap_times": [], "reward": 27.089253417694636} +{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 3, "steps": 161, "laps": 0, "lap_times": [], "reward": 35.617936324328184} +{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 4, "steps": 372, "laps": 0, "lap_times": [], "reward": 81.32034226437099} +{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 5, "steps": 348, "laps": 0, "lap_times": [], "reward": 80.90182103098687} +{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 6, "steps": 351, "laps": 0, "lap_times": [], "reward": 83.38630702279897} +{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 7, "steps": 769, "laps": 1, "lap_times": [21.359375], "reward": 181.13429199228995} +{"label": "ft_006k", "model_path": 
"models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 8, "steps": 356, "laps": 0, "lap_times": [], "reward": 82.55838803868119} +{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 9, "steps": 357, "laps": 0, "lap_times": [], "reward": 83.45234980319492} +{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 1, "steps": 395, "laps": 0, "lap_times": [], "reward": 92.56352121913824} +{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 2, "steps": 813, "laps": 1, "lap_times": [22.5625], "reward": 174.79000809043646} +{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 3, "steps": 776, "laps": 1, "lap_times": [21.9375], "reward": 168.13511967613886} +{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 4, "steps": 1248, "laps": 2, "lap_times": [20.53125, 21.21875], "reward": 268.3056740048578} +{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 5, "steps": 382, "laps": 0, "lap_times": [], "reward": 80.95213605418394} +{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 6, "steps": 555, "laps": 1, "lap_times": [21.640625], "reward": 127.2610695830399} +{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 7, "steps": 149, "laps": 0, "lap_times": [], "reward": 29.529473824529305} +{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 8, "steps": 438, "laps": 
0, "lap_times": [], "reward": 91.48110114145402} +{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 9, "steps": 416, "laps": 0, "lap_times": [], "reward": 91.06411632476375} +{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 1, "steps": 72, "laps": 0, "lap_times": [], "reward": 10.501809980045437} +{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 2, "steps": 451, "laps": 0, "lap_times": [], "reward": 87.36528500499116} +{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 3, "steps": 1012, "laps": 2, "lap_times": [22.34375, 20.71875], "reward": 224.14087196643231} +{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 4, "steps": 157, "laps": 0, "lap_times": [], "reward": 36.810340652579725} +{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 5, "steps": 167, "laps": 0, "lap_times": [], "reward": 38.044355134905345} +{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 6, "steps": 167, "laps": 0, "lap_times": [], "reward": 34.502623422992656} +{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 7, "steps": 314, "laps": 0, "lap_times": [], "reward": 64.56299563837547} +{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 8, "steps": 183, "laps": 0, "lap_times": [], "reward": 39.53914854944924} +{"label": "ft_030k", "model_path": 
"models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 9, "steps": 329, "laps": 0, "lap_times": [], "reward": 65.17448510392569} +{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 1, "steps": 811, "laps": 1, "lap_times": [29.921875], "reward": 134.79952276336735} +{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 2, "steps": 2000, "laps": 3, "lap_times": [27.890625, 26.390625, 26.78125], "reward": 355.31346766664956} +{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 3, "steps": 2000, "laps": 3, "lap_times": [28.765625, 27.765625, 27.46875], "reward": 344.61911652631443} +{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 4, "steps": 1833, "laps": 3, "lap_times": [27.5625, 27.375, 26.15625], "reward": 330.4355698036379} +{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 5, "steps": 1928, "laps": 3, "lap_times": [29.125, 28.109375, 27.6875], "reward": 332.7582264354696} +{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 6, "steps": 2000, "laps": 3, "lap_times": [29.46875, 26.8125, 28.5], "reward": 342.77918509633764} +{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 7, "steps": 2000, "laps": 3, "lap_times": [28.515625, 26.4375, 28.046875], "reward": 356.1632144622272} +{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 8, "steps": 2000, "laps": 3, "lap_times": [29.4375, 28.140625, 26.546875], "reward": 
351.6208579884842} +{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 9, "steps": 2000, "laps": 3, "lap_times": [29.484375, 27.078125, 28.71875], "reward": 346.25121597342695} +{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 1, "steps": 727, "laps": 1, "lap_times": [27.65625], "reward": 127.74538866006424} +{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 2, "steps": 1082, "laps": 1, "lap_times": [29.53125], "reward": 180.6854192542378} +{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 3, "steps": 2000, "laps": 3, "lap_times": [28.796875, 27.171875, 31.359375], "reward": 316.97187187187683} +{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 4, "steps": 436, "laps": 0, "lap_times": [], "reward": 76.15429569082335} +{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 5, "steps": 1255, "laps": 2, "lap_times": [29.375, 27.84375], "reward": 214.58350544204313} +{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 6, "steps": 2000, "laps": 3, "lap_times": [29.265625, 27.09375, 28.046875], "reward": 346.9328897984469} +{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 7, "steps": 2000, "laps": 3, "lap_times": [29.609375, 28.25, 29.0625], "reward": 325.0631527120613} +{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 8, "steps": 1139, "laps": 1, "lap_times": [36.640625], 
"reward": 174.01025118848793} +{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 9, "steps": 2000, "laps": 3, "lap_times": [28.421875, 31.546875, 27.59375], "reward": 317.8546572496798} +{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 1, "steps": 2000, "laps": 2, "lap_times": [41.96875, 29.703125], "reward": 295.66797001392115} +{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 2, "steps": 209, "laps": 0, "lap_times": [], "reward": 39.03441974380985} +{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 3, "steps": 1353, "laps": 2, "lap_times": [31.84375, 28.46875], "reward": 227.3901946817232} +{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 4, "steps": 2000, "laps": 3, "lap_times": [30.5, 31.171875, 29.15625], "reward": 332.7062042630514} +{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 5, "steps": 201, "laps": 0, "lap_times": [], "reward": 38.25372101787252} +{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 6, "steps": 2000, "laps": 3, "lap_times": [29.875, 28.90625, 28.3125], "reward": 332.71756463419115} +{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 7, "steps": 1365, "laps": 2, "lap_times": [30.859375, 29.921875], "reward": 235.64415748882038} +{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 8, "steps": 184, "laps": 0, "lap_times": [], "reward": 
32.21486316375649}
+{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 9, "steps": 830, "laps": 1, "lap_times": [34.25], "reward": 131.9192705488067}
diff --git a/agent/outerloop-results/mountain_candidate_eval_2026-04-19.md b/agent/outerloop-results/mountain_candidate_eval_2026-04-19.md
new file mode 100644
index 0000000..fdecfe6
--- /dev/null
+++ b/agent/outerloop-results/mountain_candidate_eval_2026-04-19.md
@@ -0,0 +1,29 @@
+# Mountain candidate checkpoint evaluation — 2026-04-19
+
+Deterministic-policy eval (`deterministic=True`) on `mountain_track`, 9 episodes per model, max 2000 steps/episode. "Success" = at least one lap completed; "full 2k" = reached the 2000-step cap.
+
+| model | throttle floor | success eps | full 2k eps | avg laps/ep | total laps | mean lap, all laps (s) | best lap (s) | avg steps | notes |
+|---|---:|---:|---:|---:|---:|---:|---:|---:|---|
+| exp14_base | 0.2 | 7/9 | 3/9 | 1.78 | 16 | 29.24 | 27.02 | 1332 | original champion |
+| ft_006k | 0.4 | 1/9 | 0/9 | 0.11 | 1 | 21.36 | 21.36 | 335 | very fast when it works, extremely fragile |
+| ft_024k | 0.4 | 4/9 | 0/9 | 0.56 | 5 | 21.58 | 20.53 | 575 | fast but fragile |
+| ft_030k | 0.4 | 1/9 | 0/9 | 0.22 | 2 | 21.53 | 20.72 | 317 | very fast but extremely fragile |
+| ft_036k | 0.2 | 9/9 | 6/9 | 2.78 | 25 | 27.93 | 26.16 | 1841 | best balance: fastest robust candidate |
+| ft_042k | 0.2 | 8/9 | 4/9 | 1.89 | 17 | 29.25 | 27.09 | 1404 | decent, but worse than 36k |
+| ft_048k | 0.2 | 6/9 | 3/9 | 1.44 | 13 | 31.15 | 28.31 | 1127 | degraded |
+
+## Recommendation
+
+Best overall candidate:
+- `models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`
+
+Reason:
+- 9/9 successful episodes
+- 25 total laps across 9 episodes
+- mean lap 27.93s
+- best lap 26.16s
+- clearly more robust than the original exp14 champion and the later finetune checkpoints, and faster than all of them on mean lap
+
+## Raw result file
+
+- `agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl`
diff --git a/agent/tmp_eval_mountain_candidates.py b/agent/tmp_eval_mountain_candidates.py
new file mode
100644
index 0000000..eec2f33
--- /dev/null
+++ b/agent/tmp_eval_mountain_candidates.py
@@ -0,0 +1,149 @@
+import os, json, time
+from datetime import datetime
+
+from stable_baselines3 import PPO
+from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
+import gymnasium as gym
+import numpy as np
+
+from donkeycar_sb3_runner import ThrottleClampWrapper
+
+HOST='10.0.0.55'
+PORT=9091
+TRACK_ID='donkey-mountain-track-v0'
+MAX_STEPS=2000
+EPISODES=9
+OUT='outerloop-results/mountain_candidate_eval_2026-04-19.jsonl'
+
+CANDIDATES = [
+    ('exp14_base', 'models/exp14-mountain-v5/best_model.zip', 0.2),
+    ('ft_006k', 'models/exp14-mountain-v5-finetune/checkpoint_0006000.zip', 0.4),
+    ('ft_024k', 'models/exp14-mountain-v5-finetune/checkpoint_0024000.zip', 0.4),
+    ('ft_030k', 'models/exp14-mountain-v5-finetune/checkpoint_0030000.zip', 0.4),
+    ('ft_036k', 'models/exp14-mountain-v5-finetune/checkpoint_0036000.zip', 0.2),
+    ('ft_042k', 'models/exp14-mountain-v5-finetune/checkpoint_0042000.zip', 0.2),
+    ('ft_048k', 'models/exp14-mountain-v5-finetune/checkpoint_0048000.zip', 0.2),
+]
+
+class V5RewardWrapper(gym.Wrapper):
+    def __init__(self, env, max_cte=8.0, min_lap_time=5.0):
+        super().__init__(env)
+        self.max_cte = max_cte
+        self.min_lap_time = min_lap_time
+        self._last_lc = 0
+    def reset(self, **kwargs):
+        self._last_lc = 0
+        return self.env.reset(**kwargs)
+    def step(self, action):
+        result = self.env.step(action)
+        if len(result) == 5:
+            obs, _sim, terminated, truncated, info = result
+            done = terminated or truncated
+        else:
+            obs, _sim, done, info = result
+            terminated, truncated = done, False
+        try:
+            cte = float(info.get('cte', 0.0) or 0.0)
+        except Exception:
+            cte = 0.0
+        cte_quality = 1.0 - min(abs(cte) / self.max_cte, 1.0)
+        try:
+            speed = max(0.0, float(info.get('speed', 0.0) or 0.0))
+        except Exception:
+            speed = 0.0
+        speed_norm = min(speed / 10.0, 1.0)
+        reward = cte_quality * speed_norm
+        try:
+            current_lc = int(info.get('lap_count', 0) or 0)
+        except Exception:
+            current_lc = self._last_lc
+        force_terminate = False
+        if current_lc > self._last_lc:
+            self._last_lc = current_lc
+            try:
+                lap_time = float(info.get('last_lap_time', 999.0) or 999.0)
+            except Exception:
+                lap_time = 999.0
+            if lap_time < self.min_lap_time:
+                reward = -10.0 * (self.min_lap_time / max(lap_time, 0.1))
+                force_terminate = True
+        if len(result) == 5:
+            return obs, reward, terminated or force_terminate, truncated, info
+        return obs, reward, terminated or force_terminate, info
+
+
+def make_env(base_throttle=0.2, throttle_floor=None):
+    def _init():
+        raw = gym.make(TRACK_ID, conf={'host': HOST, 'port': PORT})
+        env = ThrottleClampWrapper(raw, throttle_min=base_throttle)
+        if throttle_floor is not None:
+            class ThrottleFloorWrapper(gym.Wrapper):
+                def __init__(self, env, floor):
+                    super().__init__(env)
+                    self.floor = floor
+                def step(self, action):
+                    act = np.array(action)
+                    try:
+                        act[1] = max(act[1], self.floor)
+                    except Exception:
+                        pass
+                    return self.env.step(act)
+                def reset(self, **kwargs):
+                    return self.env.reset(**kwargs)
+            env = ThrottleFloorWrapper(env, throttle_floor)
+        env = V5RewardWrapper(env)
+        return env
+    return _init
+
+os.makedirs(os.path.dirname(OUT), exist_ok=True)
+
+all_rows = []
+for label, model_path, floor in CANDIDATES:
+    print(f'\n=== Evaluating {label} floor={floor} path={model_path}', flush=True)
+    env = VecTransposeImage(DummyVecEnv([make_env(0.2, floor)]))
+    model = PPO.load(model_path, device='cpu')
+    model.set_env(env)
+    episodes = []
+    for ep in range(EPISODES):
+        obs = env.reset()
+        steps = 0
+        laps = 0
+        prev_lc = 0
+        lap_times = []
+        total_reward = 0.0
+        while steps < MAX_STEPS:
+            action, _ = model.predict(obs, deterministic=True)
+            obs, r, d, info = env.step(action)
+            inf = info[0] if isinstance(info, (list, tuple)) else info
+            total_reward += float(r[0])
+            steps += 1
+            lc = int(inf.get('lap_count', 0) or 0)
+            if lc > prev_lc:
+                try:
+                    lap_times.append(float(inf.get('last_lap_time', 0) or 0))
+                except Exception:
+                    lap_times.append(0.0)
+                prev_lc = lc
+                laps = lc
+            if bool(d[0]):
+                break
+        row = {
+            'label': label,
+            'model_path': model_path,
+            'throttle_floor': floor,
+            'episode': ep + 1,
+            'steps': steps,
+            'laps': laps,
+            'lap_times': lap_times,
+            'reward': total_reward,
+        }
+        episodes.append(row)
+        all_rows.append(row)
+        print(f" ep{ep+1}: steps={steps} laps={laps} lap_times={lap_times}", flush=True)
+    env.close()
+    time.sleep(2)
+
+with open(OUT, 'w') as f:
+    for row in all_rows:
+        f.write(json.dumps(row) + '\n')
+print('\nSaved to', OUT)
diff --git a/docs/SESSION_LOG_2026-04-19.md b/docs/SESSION_LOG_2026-04-19.md
index 0913dd0..fc34c6f 100644
--- a/docs/SESSION_LOG_2026-04-19.md
+++ b/docs/SESSION_LOG_2026-04-19.md
@@ -120,6 +120,59 @@ parallel envs are working.
 - Exp 11c (v6 reward, 250k): aborted — grass exploit found on generated_track
 - Exp 11d: pending fixes before re-run
 
+## Mountain Track Finetune + Physics Investigation (2026-04-19 late session)
+
+### Finetune outcome summary
+- Created and ran `agent/experiments/exp14_finetune_v5.py`
+- Warm-start source: `agent/models/exp14-mountain-v5/best_model.zip`
+- Schedule used:
+  - phase 1: runtime throttle floor `0.4`
+  - phase 2: runtime throttle floor `0.2`
+- Training later degraded badly; later checkpoints became poor / unstable
+- Best usable finetune checkpoint was **not** the final model
+
+### Robust checkpoint comparison on mountain_track
+We ran a deterministic mountain-only comparison over 9 episodes per candidate.
+Results saved to:
+- `agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl`
+- `agent/outerloop-results/mountain_candidate_eval_2026-04-19.md`
+
+Winner:
+- `agent/models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`
+- promoted copy:
+  - `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
+
+Key result:
+- **ft_036k** achieved:
+  - 9/9 successful episodes (at least one lap completed in each)
+  - 25 total laps across 9 episodes
+  - mean lap **27.93s**
+  - best lap **26.16s**
+- This beat:
+  - original mountain champion for both robustness and lap time
+  - earlier `0.4`-floor checkpoints for robustness
+  - later finetune checkpoints, which had degraded badly
+
+### Mountain physics discovery in Unity sim
+Unity source path confirmed:
+- `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
+
+We found a likely real root cause for hill wheelspin:
+- `sdsim/Assets/Scripts/WheelPhys.cs` scales wheel friction by the hit collider's physics material:
+  - `hit.collider.material.staticFriction * originalForwardStiffness`
+- `mountain_track.unity` contains **4 explicit `Slippery` physics-material assignments** on the imported `long_road` FBX instance
+- `Slippery.staticFriction = 0.1`
+- `Road.staticFriction = 0.5`
+- `Grippy.staticFriction = 0.66`
+
+Interpretation:
+- on `Slippery` colliders the friction scale is 0.1 instead of `Road`'s 0.5 (one fifth), so mountain road traction is likely much lower than on normal road tracks
+- this matches observed wheelspin / poor uphill progress / getting stuck on hills
+
+We created a dedicated Unity investigation branch before changing anything:
+- repo: `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
+- branch: `investigate-mountain-friction`
+
 ## Critical Known Facts (DO NOT LOSE)
 
 ### throttle_min history (from Exp 1-9)
diff --git a/docs/TEST_HISTORY.md b/docs/TEST_HISTORY.md
index fcb6658..724fc3c 100644
--- a/docs/TEST_HISTORY.md
+++ b/docs/TEST_HISTORY.md
@@ -1,6 +1,6 @@
 # Test History — DonkeyCar RL Autoresearch
 
-Last updated: 2026-04-18
+Last updated: 2026-04-19
 
 This document records every significant training experiment, what was
 changed, what was observed, and what was learned. Use this to make
@@ -400,3 +400,48 @@ camera angle) to achieve generalization instead?
 - Try v5 reward with parallel envs but longer training (accept some circling)
 - Check if efficiency gate triggers too aggressively during normal cornering
 
+---
+
+## Exp 14b — Mountain finetune from exp14 champion (2026-04-19)
+
+- **Script:** `agent/experiments/exp14_finetune_v5.py`
+- **Warm start:** `agent/models/exp14-mountain-v5/best_model.zip`
+- **Schedule:**
+  - phase 1: runtime throttle floor `0.4`
+  - phase 2: runtime throttle floor `0.2`
+- **Goal:** improve hill climbing, robustness, and lap time on `mountain_track`
+
+### Important outcome
+The finetune run **did not improve monotonically**. It briefly improved, then later degraded badly.
+This means the final/latest checkpoint is **not** the model we want to keep.
+
+### Candidate checkpoint comparison
+We ran a fresh deterministic-policy comparison on `mountain_track` only:
+- **9 episodes per model**
+- **2000 step cap**
+- Results saved to:
+  - `agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl`
+  - `agent/outerloop-results/mountain_candidate_eval_2026-04-19.md`
+
+| Model | Throttle floor | Success eps | Full 2k eps | Avg laps/ep | Total laps | Mean lap | Best lap | Avg steps | Verdict |
+|---|---:|---:|---:|---:|---:|---:|---:|---:|---|
+| exp14_base | 0.2 | 7/9 | 3/9 | 1.78 | 16 | 29.24s | 27.02s | 1332 | Original champion |
+| ft_006k | 0.4 | 1/9 | 0/9 | 0.11 | 1 | 21.36s | 21.36s | 335 | Very fast but unusably fragile |
+| ft_024k | 0.4 | 4/9 | 0/9 | 0.56 | 5 | 21.58s | 20.53s | 575 | Fast but fragile |
+| ft_030k | 0.4 | 1/9 | 0/9 | 0.22 | 2 | 21.53s | 20.72s | 317 | Very fast but unusably fragile |
+| **ft_036k** | **0.2** | **9/9** | **6/9** | **2.78** | **25** | **27.93s** | **26.16s** | **1841** | **Best overall balance** |
+| ft_042k | 0.2 | 8/9 | 4/9 | 1.89 | 17 | 29.25s | 27.09s | 1404 | Decent, but worse than 36k |
+| ft_048k | 0.2 | 6/9 | 3/9 | 1.44 | 13 | 31.15s | 28.31s | 1127 | Degraded |
+
+### Best model captured
+Best overall checkpoint from the finetune:
+- `agent/models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`
+
+Promoted copy saved as:
+- `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
+
+### Key learning
+- Early `0.4`-floor checkpoints can produce very fast laps (roughly 20.5s to 22.6s), but completed a lap in at most 4/9 episodes, which is too fragile to trust.
+- The best mountain finetune model is the **36k checkpoint after switching back to 0.2 floor**, not the later checkpoints.
+- Later finetune checkpoints (42k, 48k) degraded in both robustness and lap time (48k: 6/9 successful episodes, mean lap 31.15s), consistent with the user's visual observation of wheelspin / poor driving.
+
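The summary tables in this change can be recomputed from the raw JSONL rows. The sketch below is a hypothetical helper, not one of the committed scripts. It derives each column from the row fields that `tmp_eval_mountain_candidates.py` writes (`label`, `steps`, `laps`, `lap_times`). It then ranks candidates using one possible encoding of ADR-022's "speed + robustness" rule:

1. success rate,
2. total laps,
3. mean lap time, as a tie-breaker only.

ADR-022 gives no exact formula, so the ordering inside `rank_key` is an assumption. `FULL_EPISODE_STEPS` is assumed to equal the script's `MAX_STEPS`.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed to match MAX_STEPS in tmp_eval_mountain_candidates.py.
FULL_EPISODE_STEPS = 2000


@dataclass
class CandidateSummary:
    label: str
    episodes: int
    success_eps: int            # episodes with at least one completed lap
    full_eps: int               # episodes that reached the step cap
    total_laps: int
    mean_lap: Optional[float]   # mean over all completed laps, in seconds
    best_lap: Optional[float]
    avg_steps: float


def load_jsonl(path: str) -> List[dict]:
    """Read one JSON object per non-empty line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize(rows: Iterable[dict]) -> Dict[str, CandidateSummary]:
    """Group episode rows by label and compute the summary-table columns."""
    by_label: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    summaries = {}
    for label, eps in by_label.items():
        laps = [t for r in eps for t in r["lap_times"]]
        summaries[label] = CandidateSummary(
            label=label,
            episodes=len(eps),
            success_eps=sum(1 for r in eps if r["laps"] > 0),
            full_eps=sum(1 for r in eps if r["steps"] >= FULL_EPISODE_STEPS),
            total_laps=sum(r["laps"] for r in eps),
            mean_lap=sum(laps) / len(laps) if laps else None,
            best_lap=min(laps) if laps else None,
            avg_steps=sum(r["steps"] for r in eps) / len(eps),
        )
    return summaries


def rank_key(s: CandidateSummary) -> Tuple[float, int, float]:
    """Sort key (assumed ordering): success rate, then total laps, then mean lap."""
    success_rate = s.success_eps / s.episodes
    mean_lap = s.mean_lap if s.mean_lap is not None else float("inf")
    return (-success_rate, -s.total_laps, mean_lap)
```

For example, apply `sorted(summarize(load_jsonl(path)).values(), key=rank_key)` to `agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl`. It should reproduce the success, lap and step columns above and place `ft_036k` first. Raw lap time enters only as the last tie-breaker, so a fast but fragile checkpoint such as `ft_024k` cannot outrank a candidate that finishes laps more reliably.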