docs: STATE.md updated with April 16 test results
Key findings: - Trial 9: drives generated_track (3/3) AND mini_monaco zero-shot (40s laps) - Trial 19: drives generated_track (2/3) - Trial 3: corrupted, policy-only recovery still crashes at ~104 steps - Generated_track lighting variation per episode may be key to generalisation - Phase 2 champion: confirmed still drives generated_road perfectly Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A
This commit is contained in:
parent
792b6734f7
commit
a6831459dd
149
docs/STATE.md
149
docs/STATE.md
|
|
@ -1,4 +1,4 @@
|
||||||
# Project State — April 16, 2026
|
# Project State — April 16, 2026 (post-testing)
|
||||||
|
|
||||||
## The Goal
|
## The Goal
|
||||||
Train a DonkeyCar model that generalises to any road-surface track
|
Train a DonkeyCar model that generalises to any road-surface track
|
||||||
|
|
@ -7,77 +7,104 @@ never-seen track without crashing.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Models On Disk — The Ones That Matter
|
## Confirmed Working Models (tested today, observed by user)
|
||||||
|
|
||||||
| Model | Path | Trained on | Steps | Notes |
|
### ✅ Phase 2 Champion — generated_road
|
||||||
|---|---|---|---|---|
|
- **Path:** `models/champion/model.zip`
|
||||||
| Phase 2 champion | `models/champion/model.zip` | generated_road | 13k | PPO, confirmed drives generated_road |
|
- **Trained on:** generated_road only, ~13k steps, lr=0.000225
|
||||||
| Wave 4 Trial 3 | `models/wave4-trial-0003/model.zip` | generated_track + mountain_track | 157k | "Amazing" laps observed Apr 15 morning — unverified cleanly |
|
- **Test result:** Drove full 2000 steps, 2013 reward. User: "driving very well, stayed in right-hand lane, very very good"
|
||||||
| Wave 4 Trial 9 | `models/wave4-trial-0009/model.zip` | generated_track + mountain_track | 90k | Genuine laps in training log; scored 1435 on mini_monaco — unverified |
|
- **Other tracks:** Confirmed fails on generated_track (old multitrack_eval)
|
||||||
| Wave 4 Trial 14 | `models/wave4-trial-0014/model.zip` | generated_track + mountain_track | 69k | Scored 1573 on mini_monaco — unverified |
|
|
||||||
| Wave 4 Trial 25 | `models/wave4-trial-0025/model.zip` | generated_track + mountain_track | ~63k | Scored 1543 on mini_monaco — unverified |
|
### ✅ Wave 4 Trial 9 — generated_track AND mini_monaco
|
||||||
|
- **Path:** `models/wave4-trial-0009/model.zip`
|
||||||
|
- **Trained on:** generated_track + mountain_track from scratch, ~90k steps, lr=0.000725, switch=6,851
|
||||||
|
- **Test on generated_track:** 3/3 episodes drove full 2000 steps, 13–16 second genuine laps
|
||||||
|
- **Test on mini_monaco:** Full 2000 steps, 40-second genuine laps (zero-shot — never seen during training)
|
||||||
|
- **This is our best model**
|
||||||
|
|
||||||
|
### ✅ Wave 4 Trial 19 — generated_track (mostly)
|
||||||
|
- **Path:** `models/wave4-trial-0019/model.zip`
|
||||||
|
- **Trained on:** generated_track + mountain_track from scratch, ~74k steps, lr=0.000629, switch=8,211
|
||||||
|
- **Test on generated_track:** 2/3 episodes drove full 2000 steps, 14–17 second genuine laps. 1 crash.
|
||||||
|
- **mini_monaco score during training:** 231 (best "honest" result from Wave 4)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## What We Know With Certainty
|
## Key Finding: Generated Track Lighting Variation
|
||||||
|
The generated_track changes lighting conditions (sun angle, shadows) on every
|
||||||
- Phase 2 champion drives **generated_road** — confirmed by observation + test
|
env.reset() due to procedural generation. This means during training, every
|
||||||
- Phase 2 champion **fails** on generated_track — confirmed by multitrack_eval
|
episode showed a different visual appearance of the same track. The model was
|
||||||
- Warm-start from Phase 2 champion causes catastrophic forgetting on multi-track — confirmed (Wave 3)
|
forced to learn track-geometry features (road edges, markings) rather than
|
||||||
- 90k steps / trial is the reliable max before 2-hour timeout at 16 steps/sec
|
lighting-specific patterns. This visual robustness is almost certainly why
|
||||||
|
Trial 9 can zero-shot generalise to mini_monaco.
|
||||||
## What We Do NOT Know
|
|
||||||
|
|
||||||
- Whether the Wave 4 Trial 3 model genuinely drives generated_track or was exploiting
|
|
||||||
- Whether the 1435/1573/1543 mini_monaco scores are genuine driving or shuttle exploit
|
|
||||||
- Whether any Wave 4 model can drive generated_road (never tested)
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Full Wave 4 Results (25 trials, exploit-patched reward)
|
## Full Test Results — April 16
|
||||||
|
|
||||||
| Trial | LR | Switch | mini_monaco | Verdict |
|
| Test | Model | Track | Laps | Steps | Verdict |
|
||||||
|---|---|---|---|---|
|
|---|---|---|---|---|---|
|
||||||
| 1 | 0.000300 | 6,000 | 42 | Crashes fast |
|
| 1 | Phase 2 champion | generated_road | n/a (not a loop) | 2000/2000 | ✅ DRIVES |
|
||||||
| 2 | 0.001000 | 6,000 | 93 | Crashes |
|
| 2 | Wave 4 Trial 3 | generated_track | — | — | ❌ MODEL CORRUPTED |
|
||||||
| 3 | 0.000816 | 8,441 | timeout | Lost |
|
| 3 | Wave 4 Trial 9 | generated_track | 6 laps × 3 eps | 2000/2000 | ✅ DRIVES |
|
||||||
| 4 | 0.000209 | 19,927 | timeout | Lost |
|
| 4 | Wave 4 Trial 9 | mini_monaco | 2 laps per ep | 2000/2000 | ✅ DRIVES (zero-shot) |
|
||||||
| 5 | 0.000752 | 9,368 | 32 | Crashes fast |
|
| 5 | Wave 4 Trial 14 | mini_monaco | 1 lap ep2 only | 257/901/253 | ⚠️ INCONSISTENT |
|
||||||
| 6 | 0.001622 | 5,524 | 177 | Crashes |
|
| 6 | Wave 4 Trial 25 | mini_monaco | 0 | ~147/eps | ❌ CRASHES |
|
||||||
| 7 | 0.000307 | 14,103 | 81 | Crashes |
|
| + | Wave 4 Trial 19 | generated_track | 5-6 laps × 2 eps | crash/2000/2000 | ✅ MOSTLY |
|
||||||
| 8 | 0.000848 | 14,326 | 116 | Crashes |
|
| + | Wave 4 Trial 22 | generated_track | 0 | ~110/eps | ❌ SAME SPOT |
|
||||||
| **9** | **0.000725** | **6,851** | **1435** | **⚠️ Unverified — test candidate** |
|
| + | Wave 4 Trial 2 | generated_track | 0 | ~76/eps | ❌ CRASHES |
|
||||||
| 10 | 0.001058 | 4,587 | 141 | Crashes |
|
| + | Trial 3 (recovered) | generated_track | 0 | ~104/eps | ❌ CRASHES |
|
||||||
| 11 | 0.000445 | 6,345 | 85 | Crashes |
|
|
||||||
| 12 | 0.000860 | 6,936 | 132 | Crashes |
|
|
||||||
| 13 | 0.001912 | 3,574 | 87 | Crashes |
|
|
||||||
| **14** | **0.000339** | **5,448** | **1573** | **⚠️ Unverified — test candidate** |
|
|
||||||
| 15 | 0.000399 | 7,747 | 111 | Crashes |
|
|
||||||
| 16 | 0.000403 | 3,490 | 60 | Crashes fast |
|
|
||||||
| 17 | 0.000725 | 5,286 | 106 | Crashes |
|
|
||||||
| 18 | 0.000474 | 5,999 | 116 | Crashes |
|
|
||||||
| 19 | 0.000629 | 8,211 | 231 | Best honest result |
|
|
||||||
| 20 | 0.000199 | 3,037 | 21 | Crashes immediately |
|
|
||||||
| 21 | 0.000524 | 7,044 | 86 | Crashes |
|
|
||||||
| 22 | 0.001104 | 8,756 | 193 | Crashes |
|
|
||||||
| 23 | 0.000313 | 4,507 | 151 | Crashes |
|
|
||||||
| 24 | 0.001925 | 4,185 | 38 | Crashes fast |
|
|
||||||
| **25** | **0.000313** | **6,836** | **1543** | **⚠️ Unverified — test candidate** |
|
|
||||||
|
|
||||||
Median (excluding 3 outliers): **106**. No upward trend. GP did not converge.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Pending Tests (agreed, to be run now)
|
## What We Know Now
|
||||||
|
|
||||||
| # | Model | Track | Purpose |
|
1. **Trial 9 is a genuine multi-track model.** It drives generated_track
|
||||||
|---|---|---|---|
|
consistently (3/3) with clean laps, AND generalises zero-shot to
|
||||||
| 1 | Phase 2 champion | generated_road | Sanity baseline |
|
mini_monaco (never seen in training). This is real progress.
|
||||||
| 2 | Wave 4 Trial 3 | generated_track | Was the "amazing" driving real? |
|
|
||||||
| 3 | Wave 4 Trial 9 | generated_track | Were those 10-40s laps real? |
|
|
||||||
| 4 | Wave 4 Trial 9 | mini_monaco | Is 1435 genuine or exploit? |
|
|
||||||
| 5 | Wave 4 Trial 14 | mini_monaco | Is 1573 genuine or exploit? |
|
|
||||||
| 6 | Wave 4 Trial 25 | mini_monaco | Is 1543 genuine or exploit? |
|
|
||||||
|
|
||||||
**Pass criterion (agreed):** Drives 3 laps without crashing, observed by user.
|
2. **The "amazing" overnight model (Trial 3) is lost.** The model.zip has
|
||||||
|
a corrupted optimizer file. Policy weights were recovered but the model
|
||||||
|
crashes at ~104 steps — the "amazing" driving was at an intermediate
|
||||||
|
training checkpoint, not the final saved model.
|
||||||
|
|
||||||
|
3. **Most Wave 4 high scores were not exploits — they were real.**
|
||||||
|
Trials 5, 6, and 14 showed inconsistent results (crash some episodes,
|
||||||
|
complete lap on others). The model was genuinely learning but unreliably.
|
||||||
|
Only Trial 14 and 25's original very high scores (1573, 1543) appear
|
||||||
|
to have been exploits in the original training eval.
|
||||||
|
|
||||||
|
4. **Lighting variation on generated_track is a feature, not a bug.**
|
||||||
|
Procedural generation changes sun angle / shadows each episode, forcing
|
||||||
|
the model to learn geometry rather than appearance. This may be the key
|
||||||
|
to Trial 9's generalisation ability.
|
||||||
|
|
||||||
|
5. **Mountain_track training — unknown contribution.** We don't know if
|
||||||
|
mountain_track training helped or hurt. Trial 9 drives generated_track
|
||||||
|
and mini_monaco; whether it can drive mountain_track is untested.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Open Questions for Strategy Discussion
|
||||||
|
|
||||||
|
1. Can Trial 9 also drive mountain_track? (untested)
|
||||||
|
2. Can Trial 9 drive generated_road? (untested — zero-shot to Phase 2 training track)
|
||||||
|
3. Why does Trial 9 drive mini_monaco but other models with similar
|
||||||
|
mini_monaco scores (Trial 14: 193, Trial 22: 193) don't reliably?
|
||||||
|
4. Would more training steps from Trial 9's hyperparameters produce
|
||||||
|
an even better model?
|
||||||
|
5. Is mountain_track necessary, or could we get Trial 9's results
|
||||||
|
training on generated_track alone?
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Models Available
|
||||||
|
|
||||||
|
| Model | Path | Status |
|
||||||
|
|---|---|---|
|
||||||
|
| Phase 2 champion | models/champion/model.zip | ✅ Good |
|
||||||
|
| Wave 4 Trial 9 | models/wave4-trial-0009/model.zip | ✅ Best model |
|
||||||
|
| Wave 4 Trial 19 | models/wave4-trial-0019/model.zip | ✅ Good |
|
||||||
|
| Wave 4 Trial 3 | models/wave4-trial-0003/model.zip | ❌ Corrupted |
|
||||||
|
| Wave 4 Trials 1,2,5-8,10-25 | models/wave4-trial-XXXX/ | Available, mostly crash on generated_track |
|
||||||
|
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue