# Project State — April 27, 2026

## The Goal

Train a DonkeyCar model that generalises to any road-surface track (outdoor, asphalt, lane markings), demonstrated by driving a never-seen track without crashing.

---
## Current Champion Models

### ✅ exp13-gentrack-v4 — generated_track specialist

- **Path:** `models/exp13-gentrack-v4/best_model.zip`
- **Trained on:** generated_track only, ~30k steps (stopped early), lr=0.000725, throttle_min=0.2
- **Reward:** v4 (base × efficiency × speed_bonus)
- **Performance:** Drives generated_track reliably, with clean laps
- **Zero-shot:** Fails on mountain_track (expected for a single-track specialist)

### ✅ exp14-mountain-v5-finetune ft_036k — mountain specialist

- **Path:** `models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
- **Trained on:** mountain_track, fine-tuned from the exp14 base model; checkpoint at 36k steps
- **Reward:** v5 (speed × CTE-quality), throttle_floor=0.2 (switched from 0.4 at 30k steps)
- **Performance:** 9/9 successful episodes, 25 laps in total, mean lap 27.93 s, best lap 26.16 s
- **Zero-shot:** Fails on generated_track (expected for a single-track specialist)

### ⭐ Wave 4 Trial 9 — best generalising model (but not reproducible)

- **Path:** `models/wave4-trial-0009/model.zip`
- **Trained on:** generated_track + mountain_track, ~90k steps, lr=0.000725, switch=6,851
- **Performance:** generated_track 2000/2000, mini_monaco 2000/2000 (zero-shot)
- **Problem:** Several reruns with the same hyperparameters all failed, so this result was most likely a lucky random seed.
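One way to tell a lucky seed from a reproducible configuration is to rerun the Trial 9 settings under several seeds and track the success rate. The following is a minimal sketch. `seed_sweep` and the `run(seed)` hook are hypothetical names, standing in for "train with this seed, evaluate, return True on success". With Stable-Baselines3, the seed would typically be passed as `PPO(..., seed=seed)`. Simulator timing can still make runs non-deterministic even with a fixed seed.

```python
from typing import Callable


def seed_sweep(run: Callable[[int], bool], seeds: list[int]) -> tuple[dict[int, bool], float]:
    """Run one configuration under several seeds and report how often it succeeds.

    `run(seed)` is a hypothetical hook: train with `seed`, evaluate, and
    return True if the model meets the success criterion.
    """
    if not seeds:
        raise ValueError("need at least one seed")
    results = {seed: run(seed) for seed in seeds}
    success_rate = sum(results.values()) / len(results)
    return results, success_rate
```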
---

## What We Know (cumulative)

### Reward functions

- **v4** (base × efficiency × speed_bonus): works for generated_track; gives zero gradient on mountain hills
- **v5** (speed × CTE-quality): works for mountain_track; a circular-driving exploit is possible on the flat track
- **v6** (v5 + efficiency gate ≥ 0.15): prevents the circular exploit; may suppress early exploration
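The following is a minimal Python sketch of how v5 and v6 compose. It also shows the CTE patience terminator that the Exp 17 plan pairs with v6. Only the 0.15 gate comes from this document. Everything else here is an assumption for illustration:

- the `StepInfo` fields;
- `MAX_CTE`;
- the linear CTE-quality shape;
- "efficiency" defined as net progress divided by distance driven;
- the terminator's `limit`/`patience` defaults.

v4 is omitted because its `base` and `speed_bonus` terms are not defined here.

```python
from dataclasses import dataclass

MAX_CTE = 2.0           # assumed off-track cross-track error (m)
EFFICIENCY_GATE = 0.15  # v6 gate threshold from this document


@dataclass
class StepInfo:
    """Hypothetical per-step signals used by the reward."""
    speed: float       # forward speed (m/s)
    cte: float         # cross-track error (m)
    efficiency: float  # assumed: net track progress / distance driven, in [0, 1]


def cte_quality(cte: float, max_cte: float = MAX_CTE) -> float:
    """1.0 on the centre line, falling linearly to 0.0 at max_cte (assumed shape)."""
    return max(0.0, 1.0 - abs(cte) / max_cte)


def reward_v5(s: StepInfo) -> float:
    # v5: speed × CTE-quality
    return s.speed * cte_quality(s.cte)


def reward_v6(s: StepInfo, gate: float = EFFICIENCY_GATE) -> float:
    # v6: v5 reward, zeroed when efficiency is below the gate.
    # Driving in circles makes little net progress, so it earns nothing.
    if s.efficiency < gate:
        return 0.0
    return reward_v5(s)


class CtePatienceTerminator:
    """Ends an episode after `patience` consecutive steps with |cte| > `limit`.

    Assumed behaviour; the project's terminator may differ.
    """

    def __init__(self, limit: float = 1.5, patience: int = 20):
        self.limit = limit
        self.patience = patience
        self.count = 0

    def reset(self) -> None:
        self.count = 0

    def should_terminate(self, cte: float) -> bool:
        # Count consecutive off-centre steps; any good step resets the count.
        self.count = self.count + 1 if abs(cte) > self.limit else 0
        return self.count >= self.patience
```

Gating the whole v5 term, rather than subtracting a penalty, is one way to make circling strictly unrewarding. It also illustrates the exploration concern above: early, wobbly driving can fall under the gate and receive no signal at all.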
### Training approaches tried and their outcomes

| Approach | Result |
|---|---|
| Single-track PPO (Exp 9, 13) | ✅ Reliable. Best per-track performance. |
| Round-robin close-and-switch (Wave 4, Exp 10) | ❌ 80% failure rate. Disrupts PPO rollout buffer. |
| Parallel DummyVecEnv 90k steps (Exp 11b) | ⚠️ Infrastructure works; 90k too few steps (194 steps on all tracks). |
| Cross-track warm start both directions (Exp 15, 16) | ❌ Both failed. Single-track policies too specialised for naive transfer. |
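One plausible reading of "disrupts PPO rollout buffer" is that sequential switching feeds each PPO update data from mostly one track. A parallel DummyVecEnv, by contrast, puts every track into every update. The sketch below counts which tracks land in each rollout under both schemes. The function names are hypothetical. It further assumes two things: that `switch=6,851` means steps between track switches, and that PPO used Stable-Baselines3's default `n_steps=2048`. Under those assumptions, each switch interval spans about three rollouts.

```python
def rollout_tracks_round_robin(total_steps: int, n_steps: int, switch_every: int,
                               tracks: list[str]) -> list[set[str]]:
    """Tracks seen by each PPO rollout when a single env moves to the
    next track every `switch_every` steps (round-robin close-and-switch)."""
    rollouts = []
    for start in range(0, total_steps, n_steps):
        end = min(start + n_steps, total_steps)
        # Track active at each step: advance one track per switch interval.
        seen = {tracks[(step // switch_every) % len(tracks)] for step in range(start, end)}
        rollouts.append(seen)
    return rollouts


def rollout_tracks_parallel(total_steps: int, n_steps: int,
                            tracks: list[str]) -> list[set[str]]:
    """With one env per track in a DummyVecEnv, each rollout collects
    n_steps from every env, so every update sees every track."""
    per_rollout = n_steps * len(tracks)
    n_rollouts = -(-total_steps // per_rollout)  # ceiling division
    return [set(tracks) for _ in range(n_rollouts)]
```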
### Mountain track physics (scene fixed 2026-04-27; binary not yet rebuilt)

The mountain_track.unity scene assigned the Slippery physics material (staticFriction=0.1) to 4 track-surface colliders. WheelPhys.cs scales wheel grip by the surface's staticFriction, so the car had 1/5 of normal grip on the hill. This caused visible wheelspin.

The fix assigns the Road material (staticFriction=0.5) to those 4 colliders in `sdsim/Assets/Scenes/mountain_track.unity`. However, the project uses a pre-built Windows executable (`DonkeySimWin/donkey_sim.exe`). The fix therefore won't take effect until the sim is rebuilt from source in the Unity Editor. Proceed with Exp 17 using the existing binary.
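The 1/5 figure follows directly if grip scales linearly with staticFriction. Linearity is an assumption about WheelPhys.cs; the note above says only that grip is scaled by it.

$$
\frac{\text{grip}_{\text{Slippery}}}{\text{grip}_{\text{Road}}}
  = \frac{\mu_{s,\text{Slippery}}}{\mu_{s,\text{Road}}}
  = \frac{0.1}{0.5}
  = 0.2
  = \tfrac{1}{5}
$$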
### Key parameter knowledge

- **lr:** 0.000725 (from Trial 9 and Exp 9; consistent with good results)
- **throttle_min:** 0.2 (the v5/v6 reward gives a non-zero gradient on hills even at 0.2)
- **n_steer/n_throttle:** Relevant only for a discrete action space; our PPO runs use a continuous action space
- **Per-env throttle_min in DummyVecEnv:** Feasible, since each env is wrapped independently
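Per-env throttle floors work because DummyVecEnv takes a list of env-building callables, and each callable can apply its own wrapper. The following is a minimal sketch of that idea. `ThrottleFloorWrapper` and `make_env_fns` are hypothetical names. The sketch assumes three things:

- actions are `[steer, throttle]`;
- the floor is applied by clamping (rescaling the throttle range is an equally plausible design);
- the wrapper is a plain class mimicking the `action()` hook of `gymnasium.ActionWrapper`, which is what a real setup would subclass.

In practice, the returned thunks would be passed as `DummyVecEnv(make_env_fns(...))`, for example with 0.2 for generated_track and 0.5 for mountain_track.

```python
from typing import Any, Callable, Sequence


class ThrottleFloorWrapper:
    """Clamps the throttle of a continuous [steer, throttle] action to
    [throttle_min, throttle_max]. Hypothetical stand-in for a
    gymnasium.ActionWrapper subclass."""

    def __init__(self, env: Any, throttle_min: float, throttle_max: float = 1.0):
        self.env = env
        self.throttle_min = throttle_min
        self.throttle_max = throttle_max

    def action(self, action: Sequence[float]) -> list[float]:
        steer, throttle = action[0], action[1]
        # Enforce this env's own floor (and a ceiling) on throttle only.
        throttle = min(max(throttle, self.throttle_min), self.throttle_max)
        return [steer, throttle]

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action: Sequence[float]):
        return self.env.step(self.action(action))


def make_env_fns(envs_and_floors: list[tuple[Callable[[], Any], float]]) -> list[Callable[[], Any]]:
    """One thunk per env, each with its own throttle floor, in the
    list-of-callables form a DummyVecEnv expects."""
    fns = []
    for make_env, floor in envs_and_floors:
        # Bind loop variables as defaults so each thunk keeps its own values.
        fns.append(lambda make_env=make_env, floor=floor: ThrottleFloorWrapper(make_env(), floor))
    return fns
```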
---

## Open Strategy (as of April 27)

The goal is reliable multi-track generalisation. The planned path forward is below; the full Exp 17 design is in `docs/TEST_HISTORY.md`.

1. **Exp 17:** Parallel DummyVecEnv with 400k–500k steps
   - Two sim instances: generated_track:9091, mountain_track:9093
   - v6 reward on both (efficiency gate + CTE patience terminator)
   - throttle_min=0.2 for both envs (or optionally 0.5 on mountain, 0.2 on generated)
   - lr=0.000725, checkpoint every 20k steps, best_model tracked throughout
   - Eval mini_monaco zero-shot at every checkpoint
2. **If Exp 17 plateaus:** Try a curriculum (generated_track only for 150k steps, then add mountain)
3. **If still stuck:** Tune the v6 efficiency gate threshold (check the % of steps gated in early training)
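The following is a minimal Python sketch of the step arithmetic behind this plan. It assumes the runs use Stable-Baselines3, which the `.zip` models and DummyVecEnv suggest. `Exp17Config`, `curriculum_tracks` and `fraction_gated` are hypothetical names; the numbers come from the plan.

Two SB3 behaviours matter here:

- **`total_timesteps` is shared.** It counts steps across all envs in the VecEnv, so 400k–500k gives each track 200k–250k steps.
- **`save_freq` counts `env.step()` calls.** `CheckpointCallback`'s `save_freq` counts calls, not env steps. A 20k-step checkpoint interval with two envs is therefore `save_freq=10_000`.

```python
from dataclasses import dataclass, field


@dataclass
class Exp17Config:
    """Hypothetical container for the Exp 17 settings listed above."""
    total_timesteps: int = 400_000  # plan: 400k–500k
    ports: dict[str, int] = field(default_factory=lambda: {
        "generated_track": 9091, "mountain_track": 9093})
    throttle_min: dict[str, float] = field(default_factory=lambda: {
        "generated_track": 0.2, "mountain_track": 0.2})  # option: 0.5 on mountain
    learning_rate: float = 0.000725
    checkpoint_every: int = 20_000  # total env steps across all envs

    @property
    def n_envs(self) -> int:
        return len(self.ports)

    def steps_per_track(self) -> int:
        # total_timesteps is summed over all envs in the VecEnv,
        # so each track gets an equal share.
        return self.total_timesteps // self.n_envs

    def callback_save_freq(self) -> int:
        # Each env.step() call advances n_envs steps, so divide to
        # checkpoint every `checkpoint_every` total steps.
        return max(self.checkpoint_every // self.n_envs, 1)


def curriculum_tracks(step: int, warmup: int = 150_000) -> list[str]:
    """Fallback 2: generated_track only until `warmup` steps, then add mountain."""
    if step < warmup:
        return ["generated_track"]
    return ["generated_track", "mountain_track"]


def fraction_gated(efficiencies: list[float], gate: float = 0.15) -> float:
    """Fallback 3: share of logged steps whose efficiency is below the v6 gate."""
    if not efficiencies:
        return 0.0
    return sum(e < gate for e in efficiencies) / len(efficiencies)
```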
---

## Models Available

| Model | Path | Status |
|---|---|---|
| exp13-gentrack-v4 | `models/exp13-gentrack-v4/best_model.zip` | ✅ generated_track specialist |
| exp14-mountain-v5-finetune ft_036k | `models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip` | ✅ Mountain specialist (best overall mountain model) |
| exp14-mountain-v5 | `models/exp14-mountain-v5/best_model.zip` | ✅ Mountain base (good, slightly worse than ft_036k) |
| Wave 4 Trial 9 | `models/wave4-trial-0009/model.zip` | ✅ Best generalising model; unreproducible |
| Phase 2 champion | `models/champion/model.zip` | ✅ generated_road specialist only |
| Wave 4 other trials | `models/wave4-trial-XXXX/` | Mostly crash on all tracks |