# Project State — April 27, 2026
## The Goal
Train a DonkeyCar model that generalises to any road-surface track
(outdoor, asphalt, lane markings) — demonstrated by driving a
never-seen track without crashing.

---
## Current Champion Models
### ✅ exp13-gentrack-v4 — generated_track specialist
- **Path:** `models/exp13-gentrack-v4/best_model.zip`
- **Trained on:** generated_track only, ~30k steps (stopped early), lr=0.000725, throttle_min=0.2
- **Reward:** v4 (base × efficiency × speed_bonus)
- **Performance:** Drives generated_track reliably, clean laps
- **Zero-shot:** Fails on mountain_track (expected — single-track specialist)
### ✅ exp14-mountain-v5-finetune ft_036k — mountain specialist
- **Path:** `models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
- **Trained on:** mountain_track, fine-tuned from exp14 base, checkpoint at 36k steps
- **Reward:** v5 (speed × CTE-quality), throttle_floor=0.2 (switched from 0.4 at 30k)
- **Performance:** 9/9 successful episodes, 25 total laps, mean lap 27.93s, best lap 26.16s
- **Zero-shot:** Fails on generated_track (expected — single-track specialist)
### ⭐ Wave 4 Trial 9 — best generalising model (but not reproducible)
- **Path:** `models/wave4-trial-0009/model.zip`
- **Trained on:** generated_track + mountain_track, ~90k steps, lr=0.000725, switch=6,851
- **Performance:** generated_track 2000/2000, mini_monaco 2000/2000 (zero-shot)
- **Problem:** Re-running with the same hyperparameters failed every time, so this result came from a lucky random seed.
---
## What We Know (cumulative)
### Reward functions
- **v4** (base × efficiency × speed_bonus): works for generated_track; gives zero gradient on mountain hills
- **v5** (speed × CTE-quality): works for mountain; circular driving exploit possible on flat track
- **v6** (v5 + efficiency gate ≥ 0.15): prevents circular exploit; may suppress early exploration
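
The three reward shapes above can be sketched as plain functions. This file does not say how `base`, `efficiency`, `speed_bonus`, `speed`, or CTE-quality are computed, so the sketch takes them as precomputed inputs. The function names are hypothetical, and returning 0.0 for a gated step is an assumption about how the v6 gate behaves.

```python
def reward_v4(base: float, efficiency: float, speed_bonus: float) -> float:
    """v4: product of three terms; any zero factor zeroes the whole reward."""
    return base * efficiency * speed_bonus


def reward_v5(speed: float, cte_quality: float) -> float:
    """v5: speed scaled by how well the car holds the centre line."""
    return speed * cte_quality


def reward_v6(speed: float, cte_quality: float, efficiency: float,
              gate: float = 0.15) -> float:
    """v6: v5, paid out only when efficiency is at or above the gate.

    Assumption: a gated step earns 0.0; the real code may behave differently.
    """
    if efficiency < gate:
        return 0.0
    return reward_v5(speed, cte_quality)
```

Under these assumptions, a car circling quickly near the centre line (high speed and CTE-quality, efficiency below 0.15) still scores under v5 but earns nothing under v6, which is the exploit the gate is meant to close.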
### Training approaches tried and their outcomes
| Approach | Result |
|---|---|
| Single-track PPO (Exp 9, 13) | ✅ Reliable. Best per-track performance. |
| Round-robin close-and-switch (Wave 4, Exp 10) | ❌ 80% failure rate. Disrupts PPO rollout buffer. |
| Parallel DummyVecEnv 90k steps (Exp 11b) | ⚠️ Infrastructure works; 90k too few steps (194 steps on all tracks). |
| Cross-track warm start both directions (Exp 15, 16) | ❌ Both failed. Single-track policies too specialised for naive transfer. |
### Mountain track physics (fixed in source 2026-04-27; not yet in the binary)
The `mountain_track.unity` scene assigned the Slippery physics material (staticFriction=0.1)
to 4 track surface colliders. `WheelPhys.cs` scales wheel grip by surface staticFriction,
so the car had 1/5 of normal grip on the hill, which caused visible wheelspin.
The fix assigns the Road material (staticFriction=0.5) to those 4 colliders in
`sdsim/Assets/Scenes/mountain_track.unity`. Because the project runs a pre-built Windows
executable (`DonkeySimWin/donkey_sim.exe`), the fix does not take effect until the sim
is rebuilt from source in the Unity Editor. Exp 17 proceeds with the existing binary.
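
The "1/5 normal grip" figure follows directly from the two friction values. A minimal check, assuming grip scales linearly with staticFriction (the file says `WheelPhys.cs` scales grip by it but does not give the exact formula); the names below are hypothetical:

```python
SLIPPERY_STATIC_FRICTION = 0.1  # material on the 4 affected colliders
ROAD_STATIC_FRICTION = 0.5      # material assigned by the fix


def relative_grip(surface_friction: float, reference_friction: float) -> float:
    """Grip relative to a reference surface, assuming linear scaling."""
    return surface_friction / reference_friction


ratio = relative_grip(SLIPPERY_STATIC_FRICTION, ROAD_STATIC_FRICTION)
```

Here `ratio` is 0.2, i.e. one fifth of the grip available on the Road material.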
### Key parameter knowledge
- **lr:** 0.000725 (from Trial 9 and Exp 9 — consistent with good results)
- **throttle_min:** 0.2 (v5/v6 reward gives non-zero gradient on hills even at 0.2)
- **n_steer/n_throttle:** Relevant for discrete action space only (PPO uses continuous)
- **Per-env throttle_min in DummyVecEnv:** Feasible — each env wrapped independently
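
A minimal sketch of a per-env throttle floor, written in plain Python so it does not depend on the simulator. It assumes an action is a `(steering, throttle)` pair and that `throttle_min` acts as a lower clamp on the throttle component; both are assumptions, and the project's wrapper may rescale instead. `make_clamp` binds the floor per call, so a loop that builds one factory per env for DummyVecEnv does not end up sharing the last loop value. All names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Action = Tuple[float, float]  # (steering, throttle); assumed layout

# Hypothetical per-env settings, keyed by track name.
THROTTLE_MIN: Dict[str, float] = {
    "generated_track": 0.2,
    "mountain_track": 0.2,
}


def apply_throttle_min(action: Action, throttle_min: float) -> Action:
    """Clamp the throttle into [throttle_min, 1.0]; steering is untouched."""
    steering, throttle = action
    return steering, min(1.0, max(throttle_min, throttle))


def make_clamp(track: str) -> Callable[[Action], Action]:
    """Return a clamp bound to one track's floor (avoids late binding)."""
    floor = THROTTLE_MIN[track]

    def clamp(action: Action) -> Action:
        return apply_throttle_min(action, floor)

    return clamp
```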
---
## Open Strategy (as of April 27)
The goal is reliable multi-track generalisation. The validated path forward:
1. **Exp 17:** Parallel DummyVecEnv with 400k–500k steps
   - Two sim instances: generated_track:9091, mountain_track:9093
   - v6 reward on both (efficiency gate + CTE patience terminator)
   - throttle_min=0.2 on both envs (or optionally 0.5 on mountain, 0.2 on generated)
   - lr=0.000725, checkpoint every 20k, best_model tracked throughout
   - Eval mini_monaco zero-shot at every checkpoint
2. **If Exp 17 plateaus:** Try curriculum (generated_track only for 150k, then add mountain)
3. **If still stuck:** Tune v6 efficiency gate threshold (check % steps gated in early training)
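
The Exp 17 parameters listed above can be collected into one config object, which also makes the checkpoint and evaluation schedule explicit. This is a sketch only: the class and field names are hypothetical, `total_steps` defaults to the lower end of the planned range, and nothing here talks to the simulator.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Exp17Plan:
    """Hypothetical container for the Exp 17 settings listed above."""
    track_ports: Dict[str, int] = field(
        default_factory=lambda: {"generated_track": 9091,
                                 "mountain_track": 9093})
    learning_rate: float = 0.000725
    throttle_min: float = 0.2
    total_steps: int = 400_000       # lower end of the planned range
    checkpoint_every: int = 20_000
    eval_track: str = "mini_monaco"  # zero-shot eval at every checkpoint

    def checkpoint_steps(self) -> List[int]:
        """Step counts at which to save a checkpoint and run the eval."""
        return list(range(self.checkpoint_every,
                          self.total_steps + 1,
                          self.checkpoint_every))


plan = Exp17Plan()
steps = plan.checkpoint_steps()  # 20 checkpoints for a 400k-step run
```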

See `docs/TEST_HISTORY.md` for full Exp 17 design.

---
## Models Available
| Model | Path | Status |
|---|---|---|
| exp13-gentrack-v4 | models/exp13-gentrack-v4/best_model.zip | ✅ Generated_track specialist |
| exp14-mountain-v5-finetune ft_036k | models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip | ✅ Mountain specialist (best overall mountain model) |
| exp14-mountain-v5 | models/exp14-mountain-v5/best_model.zip | ✅ Mountain base (good, slightly worse than ft_036k) |
| Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | ✅ Best generalising model; unreproducible |
| Phase 2 champion | models/champion/model.zip | ✅ generated_road specialist only |
| Wave 4 other trials | models/wave4-trial-XXXX/ | Mostly crash on all tracks |