docs: capture robust mountain finetune winner at 36k and preserve eval comparison
parent 2b90de2fba
commit 0da04327ef

67  DECISIONS.md
@ -480,3 +480,70 @@ The car exits through the gap, CTE quickly exceeds 16m, hits `pass` — episode
- Resets counter when car returns to within 4m (brief excursions allowed)

**Note:** We cannot fix the Unity sim code directly.

---

## ADR-022: Promote Mountain Finetune Checkpoint by Robustness, Not Raw Lap Time

**Date:** 2026-04-19
**Status:** Accepted

**Context:** The mountain finetune (`exp14_finetune_v5.py`) produced several early
checkpoints with very fast lap times under a temporary `throttle_floor=0.4`, but
those checkpoints were extremely fragile in repeated deterministic evaluation.
Later checkpoints degraded badly and often failed to complete any laps.

**Decision:** Select the best mountain finetune checkpoint using a combined
speed + robustness criterion, not raw best lap alone.

**Chosen model:**
- canonical checkpoint: `agent/models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`
- promoted copy: `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`

**Why:** In a fresh 9-episode deterministic mountain evaluation, the 36k
checkpoint achieved:
- 9/9 successful episodes
- 25 total laps
- mean lap 27.93s
- best lap 26.16s

This beat:
- the original exp14 mountain champion in robustness
- the 6k/24k/30k finetune checkpoints in robustness
- the later 42k and 48k checkpoints in both robustness and mean lap time

**Consequence:** Future mountain work should warm-start from the promoted 36k
robust checkpoint, not from the final finetune model and not from the fastest
but fragile 0.4-floor checkpoints.
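
A minimal sketch of one way to make the combined criterion concrete, assuming a lexicographic ordering: successful episodes first, then total laps, then mean lap time. The ADR states the criterion only qualitatively, so the ordering and the `robustness_key` helper are illustrative assumptions. The numbers are copied from the 2026-04-19 evaluation table.

```python
# Hypothetical ranking helper; the ADR does not define an exact formula.
# Tuples: (label, successful episodes out of 9, total laps, mean lap in seconds)
CANDIDATES = [
    ("exp14_base", 7, 16, 29.24),
    ("ft_006k", 1, 1, 21.36),
    ("ft_024k", 4, 5, 21.58),
    ("ft_030k", 1, 2, 21.53),
    ("ft_036k", 9, 25, 27.93),
    ("ft_042k", 8, 17, 29.25),
    ("ft_048k", 6, 13, 31.15),
]


def robustness_key(candidate):
    """Sort key: more successes, then more laps, then lower mean lap."""
    _label, successes, total_laps, mean_lap = candidate
    return (-successes, -total_laps, mean_lap)


ranked = sorted(CANDIDATES, key=robustness_key)
best_label = ranked[0][0]

# For contrast: the pick a pure mean-lap ranking would make.
fastest_label = min(CANDIDATES, key=lambda c: c[3])[0]

print("robustness-first pick:", best_label)
print("mean-lap-only pick:", fastest_label)
```

Under this assumed ordering `ft_036k` ranks first, whereas a ranking on mean lap alone would pick `ft_006k`, which completed a lap in only 1 of 9 episodes.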

---

## ADR-023: Mountain Track Likely Has a Real Unity Physics / Traction Problem

**Date:** 2026-04-19
**Status:** Accepted

**Context:** The user observed that on mountain hills, the DonkeyCar often tries
to move forward while the wheels visibly spin and the car barely advances.
This looked like traction loss rather than a pure policy error.

**Evidence from Unity source:**
- Unity repo path: `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
- `sdsim/Assets/Scripts/WheelPhys.cs` scales wheel friction by the hit collider's
  physics material static friction:
  - `hit.collider.material.staticFriction * originalForwardStiffness`
- `sdsim/Assets/Scenes/mountain_track.unity` contains 4 explicit `Slippery`
  material assignments on colliders from the imported `long_road` FBX instance
- physics material values:
  - `Slippery.staticFriction = 0.1`
  - `Road.staticFriction = 0.5`
  - `Grippy.staticFriction = 0.66`

**Decision:** Treat mountain wheelspin / poor uphill progress as a likely real
sim-physics issue, not just an RL reward or hyperparameter issue.

**Consequence:** Before trusting further mountain finetuning results, investigate
and likely patch the Unity scene on branch:
- `investigate-mountain-friction`

This should be prioritized over adding more reward heuristics.
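
To put the material values in proportion, here is a small sketch. It assumes the scaling is exactly the product quoted from `WheelPhys.cs` and that nothing else in the sim modifies forward stiffness. The `original_forward_stiffness` default of 1.0 is a placeholder, not a value read from the Unity project, and `scaled_forward_stiffness` is a hypothetical helper name.

```python
# Static friction values as listed in the evidence section.
STATIC_FRICTION = {"Slippery": 0.1, "Road": 0.5, "Grippy": 0.66}


def scaled_forward_stiffness(material, original_forward_stiffness=1.0):
    """Mirror of the quoted expression: staticFriction * originalForwardStiffness."""
    return STATIC_FRICTION[material] * original_forward_stiffness


road = scaled_forward_stiffness("Road")
ratios = {name: scaled_forward_stiffness(name) / road for name in STATIC_FRICTION}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the Road stiffness")
```

Under that assumption a `Slippery` collider yields one fifth of the forward stiffness of a `Road` collider, which is in line with the observed wheelspin on hills.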
Binary file not shown.
@ -0,0 +1,63 @@
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 1, "steps": 814, "laps": 1, "lap_times": [31.09375], "reward": 132.39361070096493}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 2, "steps": 2000, "laps": 3, "lap_times": [29.046875, 28.390625, 28.71875], "reward": 345.59998378881755}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 3, "steps": 1859, "laps": 3, "lap_times": [30.109375, 27.875, 27.015625], "reward": 323.1350573108066}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 4, "steps": 2000, "laps": 3, "lap_times": [30.484375, 27.953125, 28.890625], "reward": 338.7155503882095}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 5, "steps": 190, "laps": 0, "lap_times": [], "reward": 34.7614578341836}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 6, "steps": 689, "laps": 0, "lap_times": [], "reward": 89.68411724013276}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 7, "steps": 1164, "laps": 1, "lap_times": [29.34375], "reward": 193.3484125709192}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 8, "steps": 2000, "laps": 3, "lap_times": [29.125, 36.109375, 28.0], "reward": 325.9044730809983}
{"label": "exp14_base", "model_path": "models/exp14-mountain-v5/best_model.zip", "throttle_floor": 0.2, "episode": 9, "steps": 1275, "laps": 2, "lap_times": [28.484375, 27.140625], "reward": 230.20822774680255}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 1, "steps": 177, "laps": 0, "lap_times": [], "reward": 36.94930865149945}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 2, "steps": 126, "laps": 0, "lap_times": [], "reward": 27.089253417694636}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 3, "steps": 161, "laps": 0, "lap_times": [], "reward": 35.617936324328184}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 4, "steps": 372, "laps": 0, "lap_times": [], "reward": 81.32034226437099}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 5, "steps": 348, "laps": 0, "lap_times": [], "reward": 80.90182103098687}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 6, "steps": 351, "laps": 0, "lap_times": [], "reward": 83.38630702279897}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 7, "steps": 769, "laps": 1, "lap_times": [21.359375], "reward": 181.13429199228995}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 8, "steps": 356, "laps": 0, "lap_times": [], "reward": 82.55838803868119}
{"label": "ft_006k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0006000.zip", "throttle_floor": 0.4, "episode": 9, "steps": 357, "laps": 0, "lap_times": [], "reward": 83.45234980319492}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 1, "steps": 395, "laps": 0, "lap_times": [], "reward": 92.56352121913824}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 2, "steps": 813, "laps": 1, "lap_times": [22.5625], "reward": 174.79000809043646}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 3, "steps": 776, "laps": 1, "lap_times": [21.9375], "reward": 168.13511967613886}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 4, "steps": 1248, "laps": 2, "lap_times": [20.53125, 21.21875], "reward": 268.3056740048578}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 5, "steps": 382, "laps": 0, "lap_times": [], "reward": 80.95213605418394}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 6, "steps": 555, "laps": 1, "lap_times": [21.640625], "reward": 127.2610695830399}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 7, "steps": 149, "laps": 0, "lap_times": [], "reward": 29.529473824529305}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 8, "steps": 438, "laps": 0, "lap_times": [], "reward": 91.48110114145402}
{"label": "ft_024k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0024000.zip", "throttle_floor": 0.4, "episode": 9, "steps": 416, "laps": 0, "lap_times": [], "reward": 91.06411632476375}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 1, "steps": 72, "laps": 0, "lap_times": [], "reward": 10.501809980045437}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 2, "steps": 451, "laps": 0, "lap_times": [], "reward": 87.36528500499116}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 3, "steps": 1012, "laps": 2, "lap_times": [22.34375, 20.71875], "reward": 224.14087196643231}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 4, "steps": 157, "laps": 0, "lap_times": [], "reward": 36.810340652579725}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 5, "steps": 167, "laps": 0, "lap_times": [], "reward": 38.044355134905345}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 6, "steps": 167, "laps": 0, "lap_times": [], "reward": 34.502623422992656}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 7, "steps": 314, "laps": 0, "lap_times": [], "reward": 64.56299563837547}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 8, "steps": 183, "laps": 0, "lap_times": [], "reward": 39.53914854944924}
{"label": "ft_030k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0030000.zip", "throttle_floor": 0.4, "episode": 9, "steps": 329, "laps": 0, "lap_times": [], "reward": 65.17448510392569}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 1, "steps": 811, "laps": 1, "lap_times": [29.921875], "reward": 134.79952276336735}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 2, "steps": 2000, "laps": 3, "lap_times": [27.890625, 26.390625, 26.78125], "reward": 355.31346766664956}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 3, "steps": 2000, "laps": 3, "lap_times": [28.765625, 27.765625, 27.46875], "reward": 344.61911652631443}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 4, "steps": 1833, "laps": 3, "lap_times": [27.5625, 27.375, 26.15625], "reward": 330.4355698036379}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 5, "steps": 1928, "laps": 3, "lap_times": [29.125, 28.109375, 27.6875], "reward": 332.7582264354696}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 6, "steps": 2000, "laps": 3, "lap_times": [29.46875, 26.8125, 28.5], "reward": 342.77918509633764}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 7, "steps": 2000, "laps": 3, "lap_times": [28.515625, 26.4375, 28.046875], "reward": 356.1632144622272}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 8, "steps": 2000, "laps": 3, "lap_times": [29.4375, 28.140625, 26.546875], "reward": 351.6208579884842}
{"label": "ft_036k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0036000.zip", "throttle_floor": 0.2, "episode": 9, "steps": 2000, "laps": 3, "lap_times": [29.484375, 27.078125, 28.71875], "reward": 346.25121597342695}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 1, "steps": 727, "laps": 1, "lap_times": [27.65625], "reward": 127.74538866006424}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 2, "steps": 1082, "laps": 1, "lap_times": [29.53125], "reward": 180.6854192542378}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 3, "steps": 2000, "laps": 3, "lap_times": [28.796875, 27.171875, 31.359375], "reward": 316.97187187187683}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 4, "steps": 436, "laps": 0, "lap_times": [], "reward": 76.15429569082335}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 5, "steps": 1255, "laps": 2, "lap_times": [29.375, 27.84375], "reward": 214.58350544204313}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 6, "steps": 2000, "laps": 3, "lap_times": [29.265625, 27.09375, 28.046875], "reward": 346.9328897984469}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 7, "steps": 2000, "laps": 3, "lap_times": [29.609375, 28.25, 29.0625], "reward": 325.0631527120613}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 8, "steps": 1139, "laps": 1, "lap_times": [36.640625], "reward": 174.01025118848793}
{"label": "ft_042k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0042000.zip", "throttle_floor": 0.2, "episode": 9, "steps": 2000, "laps": 3, "lap_times": [28.421875, 31.546875, 27.59375], "reward": 317.8546572496798}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 1, "steps": 2000, "laps": 2, "lap_times": [41.96875, 29.703125], "reward": 295.66797001392115}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 2, "steps": 209, "laps": 0, "lap_times": [], "reward": 39.03441974380985}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 3, "steps": 1353, "laps": 2, "lap_times": [31.84375, 28.46875], "reward": 227.3901946817232}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 4, "steps": 2000, "laps": 3, "lap_times": [30.5, 31.171875, 29.15625], "reward": 332.7062042630514}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 5, "steps": 201, "laps": 0, "lap_times": [], "reward": 38.25372101787252}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 6, "steps": 2000, "laps": 3, "lap_times": [29.875, 28.90625, 28.3125], "reward": 332.71756463419115}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 7, "steps": 1365, "laps": 2, "lap_times": [30.859375, 29.921875], "reward": 235.64415748882038}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 8, "steps": 184, "laps": 0, "lap_times": [], "reward": 32.21486316375649}
{"label": "ft_048k", "model_path": "models/exp14-mountain-v5-finetune/checkpoint_0048000.zip", "throttle_floor": 0.2, "episode": 9, "steps": 830, "laps": 1, "lap_times": [34.25], "reward": 131.9192705488067}
@ -0,0 +1,29 @@
# Mountain candidate checkpoint evaluation - 2026-04-19

Deterministic eval on `mountain_track`, 9 episodes per model, max 2000 steps/episode.

| model | floor | success eps | full 2k eps | avg laps/ep | total laps | mean lap, all laps (s) | best lap (s) | avg steps | notes |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---|
| exp14_base | 0.2 | 7/9 | 3/9 | 1.78 | 16 | 29.24 | 27.02 | 1332 | original champion |
| ft_006k | 0.4 | 1/9 | 0/9 | 0.11 | 1 | 21.36 | 21.36 | 335 | very fast when it works, extremely fragile |
| ft_024k | 0.4 | 4/9 | 0/9 | 0.56 | 5 | 21.58 | 20.53 | 575 | fast but fragile |
| ft_030k | 0.4 | 1/9 | 0/9 | 0.22 | 2 | 21.53 | 20.72 | 317 | very fast but extremely fragile |
| ft_036k | 0.2 | 9/9 | 6/9 | 2.78 | 25 | 27.93 | 26.16 | 1841 | best balance: fastest robust candidate |
| ft_042k | 0.2 | 8/9 | 4/9 | 1.89 | 17 | 29.25 | 27.09 | 1404 | decent, but worse than 36k |
| ft_048k | 0.2 | 6/9 | 3/9 | 1.44 | 13 | 31.15 | 28.31 | 1127 | degraded |

## Recommendation

Best overall candidate:
- `models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`

Reason:
- 9/9 successful episodes
- 25 total laps across 9 episodes
- mean lap 27.93s
- best lap 26.16s
- clearly more robust than the original exp14 champion and later finetune checkpoints

## Raw result file

- `agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl`
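
The summary columns can be recomputed from the raw JSONL. The sketch below is one way to do that, assuming a "success" episode is one with at least one completed lap and a "full 2k" episode is one that reached the 2000-step cap; these definitions are inferred from the data, not stated in the report. `load_rows` and `summarize` are hypothetical helper names, and the demo rows are the `ft_036k` episodes trimmed to the fields used here.

```python
import json
from collections import defaultdict

MAX_STEPS = 2000  # step cap stated in the report


def load_rows(path):
    """Read one JSON object per line, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize(rows):
    """Group episode rows by label and compute the table columns."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    summary = {}
    for label, eps in by_label.items():
        laps = [t for ep in eps for t in ep["lap_times"]]
        summary[label] = {
            "episodes": len(eps),
            "success_eps": sum(1 for ep in eps if ep["laps"] > 0),
            "full_eps": sum(1 for ep in eps if ep["steps"] >= MAX_STEPS),
            "total_laps": len(laps),
            "mean_lap": sum(laps) / len(laps) if laps else None,
            "best_lap": min(laps) if laps else None,
            "avg_steps": sum(ep["steps"] for ep in eps) / len(eps),
        }
    return summary


# ft_036k episodes copied from the raw results (other fields omitted).
demo_rows = [
    {"label": "ft_036k", "steps": 811, "laps": 1, "lap_times": [29.921875]},
    {"label": "ft_036k", "steps": 2000, "laps": 3, "lap_times": [27.890625, 26.390625, 26.78125]},
    {"label": "ft_036k", "steps": 2000, "laps": 3, "lap_times": [28.765625, 27.765625, 27.46875]},
    {"label": "ft_036k", "steps": 1833, "laps": 3, "lap_times": [27.5625, 27.375, 26.15625]},
    {"label": "ft_036k", "steps": 1928, "laps": 3, "lap_times": [29.125, 28.109375, 27.6875]},
    {"label": "ft_036k", "steps": 2000, "laps": 3, "lap_times": [29.46875, 26.8125, 28.5]},
    {"label": "ft_036k", "steps": 2000, "laps": 3, "lap_times": [28.515625, 26.4375, 28.046875]},
    {"label": "ft_036k", "steps": 2000, "laps": 3, "lap_times": [29.4375, 28.140625, 26.546875]},
    {"label": "ft_036k", "steps": 2000, "laps": 3, "lap_times": [29.484375, 27.078125, 28.71875]},
]
demo = summarize(demo_rows)["ft_036k"]
print(demo)
```

To summarize every model, pass the rows returned by `load_rows` for the JSONL file listed above to `summarize`.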
@ -0,0 +1,149 @@
import os, json, time
from datetime import datetime

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
import gymnasium as gym
import numpy as np

from donkeycar_sb3_runner import ThrottleClampWrapper

HOST='10.0.0.55'
PORT=9091
TRACK_ID='donkey-mountain-track-v0'
MAX_STEPS=2000
EPISODES=9
OUT='outerloop-results/mountain_candidate_eval_2026-04-19.jsonl'

CANDIDATES = [
    ('exp14_base', 'models/exp14-mountain-v5/best_model.zip', 0.2),
    ('ft_006k', 'models/exp14-mountain-v5-finetune/checkpoint_0006000.zip', 0.4),
    ('ft_024k', 'models/exp14-mountain-v5-finetune/checkpoint_0024000.zip', 0.4),
    ('ft_030k', 'models/exp14-mountain-v5-finetune/checkpoint_0030000.zip', 0.4),
    ('ft_036k', 'models/exp14-mountain-v5-finetune/checkpoint_0036000.zip', 0.2),
    ('ft_042k', 'models/exp14-mountain-v5-finetune/checkpoint_0042000.zip', 0.2),
    ('ft_048k', 'models/exp14-mountain-v5-finetune/checkpoint_0048000.zip', 0.2),
]

class V5RewardWrapper(gym.Wrapper):
    def __init__(self, env, max_cte=8.0, min_lap_time=5.0):
        super().__init__(env)
        self.max_cte = max_cte
        self.min_lap_time = min_lap_time
        self._last_lc = 0
    def reset(self, **kwargs):
        self._last_lc = 0
        return self.env.reset(**kwargs)
    def step(self, action):
        result = self.env.step(action)
        if len(result) == 5:
            obs, _sim, terminated, truncated, info = result
            done = terminated or truncated
        else:
            obs, _sim, done, info = result
            terminated, truncated = done, False
        try:
            cte = float(info.get('cte', 0.0) or 0.0)
        except Exception:
            cte = 0.0
        cte_quality = 1.0 - min(abs(cte) / self.max_cte, 1.0)
        try:
            speed = max(0.0, float(info.get('speed', 0.0) or 0.0))
        except Exception:
            speed = 0.0
        speed_norm = min(speed / 10.0, 1.0)
        reward = cte_quality * speed_norm
        try:
            current_lc = int(info.get('lap_count', 0) or 0)
        except Exception:
            current_lc = self._last_lc
        force_terminate = False
        if current_lc > self._last_lc:
            self._last_lc = current_lc
            try:
                lap_time = float(info.get('last_lap_time', 999.0) or 999.0)
            except Exception:
                lap_time = 999.0
            if lap_time < self.min_lap_time:
                reward = -10.0 * (self.min_lap_time / max(lap_time, 0.1))
                force_terminate = True
        if len(result) == 5:
            return obs, reward, terminated or force_terminate, truncated, info
        return obs, reward, terminated or force_terminate, info


def make_env(base_throttle=0.2, throttle_floor=None):
    def _init():
        raw = gym.make(TRACK_ID, conf={'host': HOST, 'port': PORT})
        env = ThrottleClampWrapper(raw, throttle_min=base_throttle)
        if throttle_floor is not None:
            class ThrottleFloorWrapper(gym.Wrapper):
                def __init__(self, env, floor):
                    super().__init__(env)
                    self.floor = floor
                def step(self, action):
                    act = np.array(action)
                    try:
                        act[1] = max(act[1], self.floor)
                    except Exception:
                        pass
                    return self.env.step(act)
                def reset(self, **kwargs):
                    return self.env.reset(**kwargs)
            env = ThrottleFloorWrapper(env, throttle_floor)
        env = V5RewardWrapper(env)
        return env
    return _init

os.makedirs(os.path.dirname(OUT), exist_ok=True)

all_rows = []
for label, model_path, floor in CANDIDATES:
    print(f'\n=== Evaluating {label} floor={floor} path={model_path}', flush=True)
    env = VecTransposeImage(DummyVecEnv([make_env(0.2, floor)]))
    model = PPO.load(model_path, device='cpu')
    model.set_env(env)
    episodes = []
    for ep in range(EPISODES):
        obs = env.reset()
        steps = 0
        laps = 0
        prev_lc = 0
        lap_times = []
        total_reward = 0.0
        while steps < MAX_STEPS:
            action, _ = model.predict(obs, deterministic=True)
            obs, r, d, info = env.step(action)
            inf = info[0] if isinstance(info, (list, tuple)) else info
            total_reward += float(r[0])
            steps += 1
            lc = int(inf.get('lap_count', 0) or 0)
            if lc > prev_lc:
                try:
                    lap_times.append(float(inf.get('last_lap_time', 0) or 0))
                except Exception:
                    lap_times.append(0.0)
                prev_lc = lc
                laps = lc
            if bool(d[0]):
                break
        row = {
            'label': label,
            'model_path': model_path,
            'throttle_floor': floor,
            'episode': ep + 1,
            'steps': steps,
            'laps': laps,
            'lap_times': lap_times,
            'reward': total_reward,
        }
        episodes.append(row)
        all_rows.append(row)
        print(f" ep{ep+1}: steps={steps} laps={laps} lap_times={lap_times}", flush=True)
    env.close()
    time.sleep(2)

with open(OUT, 'w') as f:
    for row in all_rows:
        f.write(json.dumps(row) + '\n')
print('\nSaved to', OUT)
@ -120,6 +120,59 @@ parallel envs are working.
- Exp 11c (v6 reward, 250k): aborted - grass exploit found on generated_track
- Exp 11d: pending fixes before re-run

## Mountain Track Finetune + Physics Investigation (2026-04-19 late session)

### Finetune outcome summary
- Created and ran `agent/experiments/exp14_finetune_v5.py`
- Warm-start source: `agent/models/exp14-mountain-v5/best_model.zip`
- Schedule used:
  - phase 1: runtime throttle floor `0.4`
  - phase 2: runtime throttle floor `0.2`
- Training later degraded badly; later checkpoints became poor / unstable
- Best usable finetune checkpoint was **not** the final model

### Robust checkpoint comparison on mountain_track
We ran a deterministic mountain-only comparison over 9 episodes per candidate.
Results saved to:
- `agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl`
- `agent/outerloop-results/mountain_candidate_eval_2026-04-19.md`

Winner:
- `agent/models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`
- promoted copy:
  - `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`

Key result:
- **ft_036k** achieved:
  - 9/9 successful episodes
  - 25 total laps across 9 episodes
  - mean lap **27.93s**
  - best lap **26.16s**
- This beat:
  - original mountain champion for robustness
  - earlier `0.4`-floor checkpoints for robustness
  - later finetune checkpoints, which had degraded badly

### Mountain physics discovery in Unity sim
Unity source path confirmed:
- `/mnt/c/Users/Paul/Documents/projects/sdsandbox`

We found a likely real root cause for hill wheelspin:
- `sdsim/Assets/Scripts/WheelPhys.cs` scales wheel friction by the hit collider's physics material:
  - `hit.collider.material.staticFriction * originalForwardStiffness`
- `mountain_track.unity` contains **4 explicit `Slippery` physics-material assignments** on the imported `long_road` FBX instance
  - `Slippery.staticFriction = 0.1`
  - `Road.staticFriction = 0.5`
  - `Grippy.staticFriction = 0.66`

Interpretation:
- mountain road traction is likely much lower than normal road tracks
- this matches observed wheelspin / poor uphill progress / getting stuck on hills

We created a dedicated Unity investigation branch before changing anything:
- repo: `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
- branch: `investigate-mountain-friction`

## Critical Known Facts (DO NOT LOSE)

### throttle_min history (from Exp 1-9)
@ -1,6 +1,6 @@
# Test History - DonkeyCar RL Autoresearch

-Last updated: 2026-04-18
+Last updated: 2026-04-19

This document records every significant training experiment, what was
changed, what was observed, and what was learned. Use this to make
@ -400,3 +400,48 @@ camera angle) to achieve generalization instead?
- Try v5 reward with parallel envs but longer training (accept some circling)
- Check if efficiency gate triggers too aggressively during normal cornering

---

## Exp 14b - Mountain finetune from exp14 champion (2026-04-19)

- **Script:** `agent/experiments/exp14_finetune_v5.py`
- **Warm start:** `agent/models/exp14-mountain-v5/best_model.zip`
- **Schedule:**
  - phase 1: runtime throttle floor `0.4`
  - phase 2: runtime throttle floor `0.2`
- **Goal:** improve hill climbing, robustness, and lap time on `mountain_track`
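
As an illustration of the two-phase schedule, a minimal sketch follows. The training step at which the floor switches is not recorded in this entry, so `switch_step` is a required parameter and the `33_000` used in the demo is only a placeholder. `throttle_floor_for_step` and `apply_floor` are hypothetical helpers, and the `(steering, throttle)` action layout is assumed from the eval script, which clamps index 1.

```python
def throttle_floor_for_step(step, switch_step, phase1_floor=0.4, phase2_floor=0.2):
    """Return the runtime throttle floor in effect at a given training step."""
    return phase1_floor if step < switch_step else phase2_floor


def apply_floor(action, floor):
    """Clamp the throttle component of a (steering, throttle) action from below."""
    steering, throttle = action
    return (steering, max(throttle, floor))


SWITCH_STEP = 33_000  # placeholder only; the real switch point is not stated here

early = apply_floor((0.1, 0.25), throttle_floor_for_step(6_000, SWITCH_STEP))
late = apply_floor((0.1, 0.25), throttle_floor_for_step(36_000, SWITCH_STEP))
print(early, late)
```

The same policy output is pushed to a higher throttle in phase 1 than in phase 2, which is consistent with the early checkpoints being faster but harder to keep on the track.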

### Important outcome
The finetune run **did not improve monotonically**. It briefly improved, then later degraded badly.
This means the final/latest checkpoint is **not** the model we want to keep.

### Candidate checkpoint comparison
We ran a fresh deterministic comparison on mountain only:
- **9 episodes per model**
- **2000 step cap**
- Results saved to:
  - `agent/outerloop-results/mountain_candidate_eval_2026-04-19.jsonl`
  - `agent/outerloop-results/mountain_candidate_eval_2026-04-19.md`

| Model | Floor | Success eps | Full 2k eps | Avg laps/ep | Total laps | Mean lap | Best lap | Avg steps | Verdict |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---|
| exp14_base | 0.2 | 7/9 | 3/9 | 1.78 | 16 | 29.24s | 27.02s | 1332 | Original champion |
| ft_006k | 0.4 | 1/9 | 0/9 | 0.11 | 1 | 21.36s | 21.36s | 335 | Very fast but unusably fragile |
| ft_024k | 0.4 | 4/9 | 0/9 | 0.56 | 5 | 21.58s | 20.53s | 575 | Fast but fragile |
| ft_030k | 0.4 | 1/9 | 0/9 | 0.22 | 2 | 21.53s | 20.72s | 317 | Very fast but unusably fragile |
| **ft_036k** | **0.2** | **9/9** | **6/9** | **2.78** | **25** | **27.93s** | **26.16s** | **1841** | **Best overall balance** |
| ft_042k | 0.2 | 8/9 | 4/9 | 1.89 | 17 | 29.25s | 27.09s | 1404 | Decent, but worse than 36k |
| ft_048k | 0.2 | 6/9 | 3/9 | 1.44 | 13 | 31.15s | 28.31s | 1127 | Degraded |

### Best model captured
Best overall checkpoint from the finetune:
- `agent/models/exp14-mountain-v5-finetune/checkpoint_0036000.zip`

Promoted copy saved as:
- `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
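
How the promoted copy was actually created is not recorded. One plausible way, shown here as a sketch, is a plain file copy next to the original checkpoint. `promote_checkpoint` is a hypothetical helper, and the demo runs against a throwaway temporary directory with placeholder bytes instead of a real model file.

```python
import shutil
import tempfile
from pathlib import Path


def promote_checkpoint(checkpoint, promoted_name):
    """Copy a checkpoint to a new name in the same directory and return the new path."""
    src = Path(checkpoint)
    if not src.is_file():
        raise FileNotFoundError(src)
    dst = src.with_name(promoted_name)
    shutil.copy2(src, dst)  # copy2 also preserves the file's timestamps
    return dst


# Demo on a temporary directory; the file content is a stand-in, not a model.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint_0036000.zip"
    ckpt.write_bytes(b"placeholder")
    promoted = promote_checkpoint(ckpt, "best_robust_model_0036000.zip")
    demo_name = promoted.name
    demo_same_content = promoted.read_bytes() == ckpt.read_bytes()
    demo_original_kept = ckpt.is_file()

print(demo_name, demo_same_content, demo_original_kept)
```

Copying, as opposed to renaming, keeps the canonical `checkpoint_0036000.zip` in place, which matches the two paths listed above.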

### Key learning
- Early `0.4`-floor checkpoints can produce very fast laps, but are too fragile to trust.
- The best mountain finetune model is the **36k checkpoint after switching back to 0.2 floor**, not the later checkpoints.
- Later finetune checkpoints collapsed badly, matching the user's visual observation of wheelspin / poor driving.