donkeycar-rl-autoresearch/docs/STATE.md

Project State — April 27, 2026

The Goal

Train a DonkeyCar model that generalises to any road-surface track (outdoor, asphalt, lane markings) — demonstrated by driving a never-seen track without crashing.


Current Champion Models

exp13-gentrack-v4 — generated_track specialist

  • Path: models/exp13-gentrack-v4/best_model.zip
  • Trained on: generated_track only, ~30k steps (stopped early), lr=0.000725, throttle_min=0.2
  • Reward: v4 (base × efficiency × speed_bonus)
  • Performance: Drives generated_track reliably, clean laps
  • Zero-shot: Fails on mountain_track (expected — single-track specialist)

exp14-mountain-v5-finetune ft_036k — mountain specialist

  • Path: models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip
  • Trained on: mountain_track, fine-tuned from exp14 base, checkpoint at 36k steps
  • Reward: v5 (speed × CTE-quality), throttle_floor=0.2 (switched from 0.4 at 30k)
  • Performance: 9/9 successful episodes, 25 total laps, mean lap 27.93s, best lap 26.16s
  • Zero-shot: Fails on generated_track (expected — single-track specialist)

Wave 4 Trial 9 — best generalising model (but not reproducible)

  • Path: models/wave4-trial-0009/model.zip
  • Trained on: generated_track + mountain_track, ~90k steps, lr=0.000725, switch=6,851
  • Performance: generated_track 2000/2000, mini_monaco 2000/2000 (zero-shot)
  • Problem: Repeated runs with the same hyperparameters all failed, so this result was most likely a lucky random seed.

What We Know (cumulative)

Reward functions

  • v4 (base × efficiency × speed_bonus): works for generated_track; gives zero gradient on mountain hills
  • v5 (speed × CTE-quality): works for mountain; circular driving exploit possible on flat track
  • v6 (v5 + efficiency gate ≥ 0.15): prevents circular exploit; may suppress early exploration
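To make the difference between the three versions concrete, here is a minimal sketch of how they could be composed. Only the overall structure (the product terms and the 0.15 gate) comes from the notes above. The `StepInfo` fields, the normalisation constants, the linear shape of `cte_quality`, and the choice of `base` in v4 are all assumptions for illustration, not the project's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Per-step signals. All field names here are hypothetical."""
    speed: float       # forward speed
    cte: float         # cross-track error (distance from the centreline)
    efficiency: float  # net track progress / distance driven, in [0, 1]


MAX_SPEED = 10.0  # assumed normalisation constant
MAX_CTE = 2.0     # assumed CTE at which the quality term reaches zero
GATE = 0.15       # efficiency gate threshold from the v6 description


def cte_quality(cte: float) -> float:
    """1.0 on the centreline, falling linearly to 0.0 at MAX_CTE (assumed shape)."""
    return max(0.0, 1.0 - abs(cte) / MAX_CTE)


def reward_v4(s: StepInfo) -> float:
    """base x efficiency x speed_bonus; base is assumed to be the CTE quality."""
    speed_bonus = 1.0 + min(s.speed, MAX_SPEED) / MAX_SPEED
    return cte_quality(s.cte) * s.efficiency * speed_bonus


def reward_v5(s: StepInfo) -> float:
    """speed x CTE-quality, with speed normalised to [0, 1]."""
    return (min(s.speed, MAX_SPEED) / MAX_SPEED) * cte_quality(s.cte)


def reward_v6(s: StepInfo) -> float:
    """v5, but zeroed whenever efficiency falls below the gate."""
    return reward_v5(s) if s.efficiency >= GATE else 0.0


def gated_fraction(steps: list[StepInfo]) -> float:
    """Share of steps that the v6 gate zeroes out."""
    return sum(s.efficiency < GATE for s in steps) / len(steps)


# A car circling in place: fast and centred, but making almost no net progress.
circling = StepInfo(speed=5.0, cte=0.0, efficiency=0.05)
# A car driving the track normally.
driving = StepInfo(speed=5.0, cte=0.5, efficiency=0.9)

print(reward_v5(circling), reward_v6(circling))  # v5 still pays, v6 does not
print(reward_v5(driving), reward_v6(driving))    # identical when not gated
```

Under these assumptions, v5 rewards the circling car while v6 returns zero for it, which is the exploit the gate is meant to close. `gated_fraction` is one way to measure how often the gate fires in early training, which bears on the concern that it may suppress exploration.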

Training approaches tried and their outcomes

| Approach | Result |
|---|---|
| Single-track PPO (Exp 9, 13) | Reliable. Best per-track performance. |
| Round-robin close-and-switch (Wave 4, Exp 10) | 80% failure rate. Disrupts PPO rollout buffer. |
| Parallel DummyVecEnv 90k steps (Exp 11b) | ⚠️ Infrastructure works; 90k too few steps (194 steps on all tracks). |
| Cross-track warm start both directions (Exp 15, 16) | Both failed. Single-track policies too specialised for naive transfer. |

Mountain track physics (fixed 2026-04-27)

The mountain_track.unity scene assigned the Slippery physics material (staticFriction=0.1) to 4 track surface colliders. WheelPhys.cs scales wheel grip by the surface's staticFriction, so the car had 1/5 of normal grip on the hill (0.1 versus 0.5 for Road), which caused visible wheelspin. The fix assigns the Road material (staticFriction=0.5) to those 4 colliders in sdsim/Assets/Scenes/mountain_track.unity. Because the project runs a pre-built Windows executable (DonkeySimWin/donkey_sim.exe), the fix takes effect only once the sim is rebuilt from source in the Unity Editor. Until then, proceed with Exp 17 using the existing binary.
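The 1/5 figure follows directly from the two friction values. A small worked check, assuming grip scales linearly with staticFriction and that the Road material is the "normal" reference (the function name is illustrative only):

```python
SLIPPERY_STATIC_FRICTION = 0.1  # value reported for the Slippery material
ROAD_STATIC_FRICTION = 0.5      # value reported for the Road material


def relative_grip(surface_friction: float,
                  reference_friction: float = ROAD_STATIC_FRICTION) -> float:
    """Grip relative to the reference surface, assuming linear scaling."""
    return surface_friction / reference_friction


ratio = relative_grip(SLIPPERY_STATIC_FRICTION)
print(f"grip on Slippery relative to Road: {ratio:.2f}")
```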

Key parameter knowledge

  • lr: 0.000725 (from Trial 9 and Exp 9 — consistent with good results)
  • throttle_min: 0.2 (v5/v6 reward gives non-zero gradient on hills even at 0.2)
  • n_steer/n_throttle: Relevant for discrete action space only (PPO uses continuous)
  • Per-env throttle_min in DummyVecEnv: Feasible — each env wrapped independently
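One way a per-env throttle minimum could be applied is to wrap each environment independently before the environments are combined into the vectorised env. The sketch below is duck-typed so that it runs without the simulator. It assumes actions are `[steering, throttle]` pairs and that the minimum acts as a simple clamp, and both `ThrottleFloor` and `RecordingEnv` are hypothetical names. In the real project this would more likely be a gym action wrapper around the DonkeyCar env.

```python
class ThrottleFloor:
    """Raises the throttle component of every action to at least throttle_min.

    Assumes each action is a [steering, throttle] pair.
    """

    def __init__(self, env, throttle_min: float):
        self.env = env
        self.throttle_min = throttle_min

    def step(self, action):
        steering, throttle = action
        return self.env.step([steering, max(throttle, self.throttle_min)])

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def close(self):
        return self.env.close()


class RecordingEnv:
    """Stand-in for a simulator env: it only records the last action it received."""

    def __init__(self):
        self.last_action = None

    def step(self, action):
        self.last_action = action
        return None

    def reset(self, **kwargs):
        return None

    def close(self):
        return None


# Each env gets its own floor, e.g. 0.2 on generated_track and 0.5 on mountain_track.
generated_inner, mountain_inner = RecordingEnv(), RecordingEnv()
generated = ThrottleFloor(generated_inner, throttle_min=0.2)
mountain = ThrottleFloor(mountain_inner, throttle_min=0.5)

generated.step([0.1, 0.0])
mountain.step([0.1, 0.0])
print(generated_inner.last_action, mountain_inner.last_action)
```

Because the wrapper holds its own `throttle_min`, the same policy output is clamped differently per track, and actions already above the floor pass through unchanged.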

Open Strategy (as of April 27)

The goal is reliable multi-track generalisation. The validated path forward:

  1. Exp 17: Parallel DummyVecEnv with 400k to 500k steps
    • Two sim instances: generated_track:9091, mountain_track:9093
    • v6 reward on both (efficiency gate + CTE patience terminator)
    • throttle_min=0.2 both envs (or optionally 0.5 on mountain, 0.2 on generated)
    • lr=0.000725, checkpoint every 20k, best_model tracked throughout
    • Eval mini_monaco zero-shot at every checkpoint
  2. If Exp 17 plateaus: Try curriculum (generated_track only for 150k, then add mountain)
  3. If still stuck: Tune v6 efficiency gate threshold (check % steps gated in early training)
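The plan above can be written down as a small configuration sketch. The track names, ports, learning rate, checkpoint interval, and the 150k curriculum switch come from the list. The class and function names are hypothetical, and `total_steps` is set to 500k as one possible value for the step budget. This is not the project's actual config format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackEnv:
    track: str
    port: int
    throttle_min: float = 0.2


@dataclass(frozen=True)
class Exp17Plan:
    envs: tuple = (
        TrackEnv("generated_track", 9091),
        TrackEnv("mountain_track", 9093),
    )
    learning_rate: float = 0.000725
    total_steps: int = 500_000       # assumed value for the step budget
    checkpoint_every: int = 20_000
    zero_shot_eval_track: str = "mini_monaco"

    def checkpoint_steps(self) -> list[int]:
        """Steps at which to save a checkpoint and run the zero-shot eval."""
        return list(range(self.checkpoint_every,
                          self.total_steps + 1,
                          self.checkpoint_every))


def curriculum_tracks(step: int, mountain_after: int = 150_000) -> tuple:
    """Fallback curriculum: generated_track only at first, then both tracks."""
    if step < mountain_after:
        return ("generated_track",)
    return ("generated_track", "mountain_track")


plan = Exp17Plan()
steps = plan.checkpoint_steps()
print(len(steps), steps[0], steps[-1])
print(curriculum_tracks(100_000), curriculum_tracks(150_000))
```

With these values the plan yields 25 checkpoints, each of which would trigger one zero-shot evaluation on mini_monaco.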

See docs/TEST_HISTORY.md for full Exp 17 design.


Models Available

| Model | Path | Status |
|---|---|---|
| exp13-gentrack-v4 | models/exp13-gentrack-v4/best_model.zip | Generated_track specialist |
| exp14-mountain-v5-finetune ft_036k | models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip | Mountain specialist (best overall mountain model) |
| exp14-mountain-v5 | models/exp14-mountain-v5/best_model.zip | Mountain base (good, slightly worse than ft_036k) |
| Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | Best generalising model; unreproducible |
| Phase 2 champion | models/champion/model.zip | generated_road specialist only |
| Wave 4 other trials | models/wave4-trial-XXXX/ | Mostly crash on all tracks |