From 792b6734f7aa009640c920b7f30f5f6ef0a5b0ab Mon Sep 17 00:00:00 2001
From: Paul Huliganga
Date: Thu, 16 Apr 2026 20:17:41 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20STATE.md=20=E2=80=94=20full=20project?=
 =?UTF-8?q?=20state=20as=20of=20April=2016=20end=20of=20Wave=204?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Documents all 25 trial results, known models, what is confirmed vs
unknown, and the 6 pending verification tests agreed with user.

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
---
 docs/STATE.md | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 docs/STATE.md

diff --git a/docs/STATE.md b/docs/STATE.md
new file mode 100644
index 0000000..17cb558
--- /dev/null
+++ b/docs/STATE.md
@@ -0,0 +1,83 @@
+# Project State — April 16, 2026
+
+## The Goal
+Train a DonkeyCar model that generalises to any road-surface track
+(outdoor, asphalt, lane markings) — demonstrated by driving a
+never-seen track without crashing.
+
+---
+
+## Models On Disk — The Ones That Matter
+
+| Model | Path | Trained on | Steps | Notes |
+|---|---|---|---|---|
+| Phase 2 champion | `models/champion/model.zip` | generated_road | 13k | PPO, confirmed drives generated_road |
+| Wave 4 Trial 3 | `models/wave4-trial-0003/model.zip` | generated_track + mountain_track | 157k | "Amazing" laps observed Apr 15 morning — unverified cleanly |
+| Wave 4 Trial 9 | `models/wave4-trial-0009/model.zip` | generated_track + mountain_track | 90k | Genuine laps in training log; scored 1435 on mini_monaco — unverified |
+| Wave 4 Trial 14 | `models/wave4-trial-0014/model.zip` | generated_track + mountain_track | 69k | Scored 1573 on mini_monaco — unverified |
+| Wave 4 Trial 25 | `models/wave4-trial-0025/model.zip` | generated_track + mountain_track | ~63k | Scored 1543 on mini_monaco — unverified |
+
+---
+
+## What We Know With Certainty
+
+- Phase 2 champion drives **generated_road** — confirmed by observation + test
+- Phase 2 champion **fails** on generated_track — confirmed by multitrack_eval
+- Warm-start from Phase 2 champion causes catastrophic forgetting on multi-track — confirmed (Wave 3)
+- 90k steps / trial is the reliable max before 2-hour timeout at 16 steps/sec
+
+## What We Do NOT Know
+
+- Whether the Wave 4 Trial 3 model genuinely drives generated_track or was exploiting
+- Whether the 1435/1573/1543 mini_monaco scores are genuine driving or shuttle exploit
+- Whether any Wave 4 model can drive generated_road (never tested)
+
+---
+
+## Full Wave 4 Results (25 trials, exploit-patched reward)
+
+| Trial | LR | Switch | mini_monaco | Verdict |
+|---|---|---|---|---|
+| 1 | 0.000300 | 6,000 | 42 | Crashes fast |
+| 2 | 0.001000 | 6,000 | 93 | Crashes |
+| 3 | 0.000816 | 8,441 | timeout | Lost |
+| 4 | 0.000209 | 19,927 | timeout | Lost |
+| 5 | 0.000752 | 9,368 | 32 | Crashes fast |
+| 6 | 0.001622 | 5,524 | 177 | Crashes |
+| 7 | 0.000307 | 14,103 | 81 | Crashes |
+| 8 | 0.000848 | 14,326 | 116 | Crashes |
+| **9** | **0.000725** | **6,851** | **1435** | **⚠️ Unverified — test candidate** |
+| 10 | 0.001058 | 4,587 | 141 | Crashes |
+| 11 | 0.000445 | 6,345 | 85 | Crashes |
+| 12 | 0.000860 | 6,936 | 132 | Crashes |
+| 13 | 0.001912 | 3,574 | 87 | Crashes |
+| **14** | **0.000339** | **5,448** | **1573** | **⚠️ Unverified — test candidate** |
+| 15 | 0.000399 | 7,747 | 111 | Crashes |
+| 16 | 0.000403 | 3,490 | 60 | Crashes fast |
+| 17 | 0.000725 | 5,286 | 106 | Crashes |
+| 18 | 0.000474 | 5,999 | 116 | Crashes |
+| 19 | 0.000629 | 8,211 | 231 | Best honest result |
+| 20 | 0.000199 | 3,037 | 21 | Crashes immediately |
+| 21 | 0.000524 | 7,044 | 86 | Crashes |
+| 22 | 0.001104 | 8,756 | 193 | Crashes |
+| 23 | 0.000313 | 4,507 | 151 | Crashes |
+| 24 | 0.001925 | 4,185 | 38 | Crashes fast |
+| **25** | **0.000313** | **6,836** | **1543** | **⚠️ Unverified — test candidate** |
+
+Median (excluding 3 outliers): **106**. No upward trend. GP did not converge.
+
+---
+
+## Pending Tests (agreed, to be run now)
+
+| # | Model | Track | Purpose |
+|---|---|---|---|
+| 1 | Phase 2 champion | generated_road | Sanity baseline |
+| 2 | Wave 4 Trial 3 | generated_track | Was the "amazing" driving real? |
+| 3 | Wave 4 Trial 9 | generated_track | Were those 10-40s laps real? |
+| 4 | Wave 4 Trial 9 | mini_monaco | Is 1435 genuine or exploit? |
+| 5 | Wave 4 Trial 14 | mini_monaco | Is 1573 genuine or exploit? |
+| 6 | Wave 4 Trial 25 | mini_monaco | Is 1543 genuine or exploit? |
+
+**Pass criterion (agreed):** Drives 3 laps without crashing, observed by user.
+
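The median and the step budget quoted in STATE.md can be cross-checked against the numbers in its own tables. Below is a minimal Python sketch of that check. It assumes the two timed-out trials (3 and 4) are dropped along with the three unverified scores, since the document does not say how timeouts were treated. The variable names are illustrative and do not come from the project.

```python
import statistics

# mini_monaco scores copied from the Wave 4 results table, keyed by trial.
# Trials 3 and 4 timed out and have no score, so they are omitted (assumption).
scores = {
    1: 42, 2: 93, 5: 32, 6: 177, 7: 81, 8: 116, 9: 1435, 10: 141,
    11: 85, 12: 132, 13: 87, 14: 1573, 15: 111, 16: 60, 17: 106,
    18: 116, 19: 231, 20: 21, 21: 86, 22: 193, 23: 151, 24: 38, 25: 1543,
}
UNVERIFIED = {9, 14, 25}  # the three outliers flagged as test candidates

# Scores that remain once the unverified outliers are excluded.
honest = sorted(v for trial, v in scores.items() if trial not in UNVERIFIED)

conventional_median = statistics.median(honest)   # mean of the two middle values
upper_median = statistics.median_high(honest)     # larger of the two middle values

# Step budget: steps that fit in the 2-hour timeout at 16 steps/sec,
# and how long a 90k-step trial takes at that rate.
STEPS_PER_SEC = 16
TIMEOUT_SEC = 2 * 60 * 60
max_steps_in_timeout = STEPS_PER_SEC * TIMEOUT_SEC
minutes_for_90k = 90_000 / STEPS_PER_SEC / 60

print(len(honest), conventional_median, upper_median)
print(max_steps_in_timeout, minutes_for_90k)
```

Under these assumptions 20 scores remain. The conventional median is 99.5, and the quoted 106 equals the upper of the two middle values, so it may have been computed that way. At 16 steps/sec the 2-hour timeout allows 115,200 steps, and a 90k-step trial takes about 94 minutes, which is consistent with 90k being described as a reliable maximum with some headroom.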