docs: Exp10 eval results — total failure, crashes on all tracks (massive regression from Exp9/W4T9)
parent 6e9546cd22
commit 3d04b53a86
@@ -0,0 +1,38 @@
[10:15:15] Model: models/exp10-two-tracks/best_model.zip
[10:15:15] Sets: 3
[10:15:15] Max steps:2000
[10:15:15] Log file: /home/paulh/projects/donkeycar-rl-autoresearch/agent/test-results/2026-04-19_10-15_exp10-two-tracks.log
[10:15:15]
==================================================
[10:15:15] SET 1 of 3
[10:15:15] ==================================================
[10:15:32] Set1 mountain_track : 178 steps 12.1 reward ❌ crash@178
[10:15:47] Set1 generated_track : 99 steps 7.2 reward ❌ crash@99
[10:16:03] Set1 generated_road : 135 steps 11.1 reward ❌ crash@135
[10:16:18] Set1 mini_monaco : 111 steps 5.2 reward ❌ crash@111
[10:16:20]
==================================================
[10:16:20] SET 2 of 3
[10:16:20] ==================================================
[10:16:34] Set2 mountain_track : 179 steps 11.2 reward ❌ crash@179
[10:16:49] Set2 generated_track : 82 steps 6.1 reward ❌ crash@82
[10:17:06] Set2 generated_road : 223 steps 29.8 reward ❌ crash@223
[10:17:22] Set2 mini_monaco : 133 steps 6.4 reward ❌ crash@133
[10:17:24]
==================================================
[10:17:24] SET 3 of 3
[10:17:24] ==================================================
[10:17:38] Set3 mountain_track : 179 steps 11.9 reward ❌ crash@179
[10:17:53] Set3 generated_track : 88 steps 5.6 reward ❌ crash@88
[10:18:08] Set3 generated_road : 105 steps 7.0 reward ❌ crash@105
[10:18:24] Set3 mini_monaco : 129 steps 5.9 reward ❌ crash@129
[10:18:26]
==================================================
[10:18:26] SUMMARY (3 sets, max 2000 steps per run)
[10:18:26] ==================================================
[10:18:26] ❌ mountain_track : 178/179/179 mean=179
[10:18:26] ❌ generated_track : 99/82/88 mean=90
[10:18:26] ❌ generated_road : 135/223/105 mean=154
[10:18:26] ❌ mini_monaco : 111/133/129 mean=124
[10:18:26]
Full log saved to: /home/paulh/projects/donkeycar-rl-autoresearch/agent/test-results/2026-04-19_10-15_exp10-two-tracks.log
@@ -240,3 +240,35 @@ Goal: model that is reliable on both training tracks, then test generalisation t
generated_road improved, mini_monaco TBD
- **This is essentially Trial 9 repeated with:** v5 reward + throttle_min=0.2 + proper checkpointing + exploit fix

### Exp 10 — Evaluation Results (3-set test, 2026-04-19)

**Model tested:** `models/exp10-two-tracks/best_model.zip`
**Result: TOTAL FAILURE — crashes on every track, every set.**

| Track | Set 1 | Set 2 | Set 3 | Mean | Verdict |
|---|---|---|---|---|---|
| mountain_track (trained) | 178 | 179 | 179 | **179** | ❌ Crashes at same spot every time |
| generated_track (trained) | 99 | 82 | 88 | **90** | ❌ Crashes almost immediately |
| generated_road (zero-shot) | 135 | 223 | 105 | **154** | ❌ Crashes early |
| mini_monaco (zero-shot) | 111 | 133 | 129 | **124** | ❌ Crashes early |
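
The Mean column and the regression percentage for mountain_track can be re-derived from the per-set step counts. The following is a minimal sketch for checking that arithmetic; the helper names `mean_steps` and `regression_pct` are illustrative and not part of the project, and the baseline of 2000 steps is the max-steps cap from the eval log.

```python
# Per-set step counts copied from the results table.
steps = {
    "mountain_track": [178, 179, 179],
    "generated_track": [99, 82, 88],
    "generated_road": [135, 223, 105],
    "mini_monaco": [111, 133, 129],
}

MAX_STEPS = 2000  # eval cap; a run that reaches it counts as a full success


def mean_steps(runs):
    """Mean steps survived, rounded to the nearest integer as in the table."""
    return round(sum(runs) / len(runs))


def regression_pct(mean, baseline=MAX_STEPS):
    """Percentage drop relative to a baseline that completed every step."""
    return round(100 * (1 - mean / baseline))


means = {track: mean_steps(runs) for track, runs in steps.items()}

for track, mean in means.items():
    print(f"{track:16s} mean={mean:4d}  drop vs {MAX_STEPS}: {regression_pct(mean)}%")
```

Rounding to whole steps matches how the log reports `mean=`; unrounded means would need the raw values kept as floats.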

**Comparison to previous best models:**
- Exp 9 (mountain only): mountain_track was 2000/2000 every time → now 179. **91% regression.**
- Wave 4 Trial 9 (generated_track + mountain_track via autoresearch): generated_track 2000/2000, mini_monaco 2000/2000 → now 90 and 124 (roughly 96% and 94% regressions).

**Analysis:**
- The round-robin track switching every 6,000 steps via `multitrack_runner.train_multitrack()`
  produced a model that learned NEITHER track. This is catastrophic interference.
- Wave 4 Trial 9 used the same two tracks but via the autoresearch controller with different
  hyperparameters (switch=6,851, lr=0.000725, 90k steps). The key difference is likely in
  HOW the environment switching works: `multitrack_runner` closes and reopens envs,
  potentially disrupting PPO's rollout buffer and value function estimates.
- mountain_track crashes at step 178 or 179 in all 3 sets, which suggests the model has
  learned a fixed degenerate policy (always turn one direction) rather than responding to vision.
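
The degenerate-policy hypothesis could be checked directly if the steering commands from an eval run were recorded. The sketch below is one possible check under stated assumptions: it assumes steering values in [-1, 1] have been logged per step (the eval log above does not record them), and the function name and both thresholds are hypothetical choices, not project settings.

```python
from statistics import pstdev


def looks_degenerate(steering, std_threshold=0.05, same_sign_threshold=0.95):
    """Heuristic: flag a run whose steering barely varies or is almost all one direction.

    steering: list of per-step steering commands, assumed to lie in [-1, 1].
    Both thresholds are illustrative defaults, not tuned values.
    """
    if not steering:
        raise ValueError("need at least one steering sample")

    spread = pstdev(steering)  # near zero means a constant output

    nonzero = [s for s in steering if s != 0]
    left = sum(1 for s in nonzero if s < 0)
    right = len(nonzero) - left
    # Fraction of non-zero commands that share the dominant direction.
    same_sign = max(left, right) / len(nonzero) if nonzero else 1.0

    return spread < std_threshold or same_sign >= same_sign_threshold
```

For example, `looks_degenerate([0.8] * 100)` flags a constant hard turn, while a run that steers both ways with meaningful spread is not flagged. A flagged run would support the fixed-policy reading; an unflagged one would point toward a vision-dependent failure at that spot on the track.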

**Key question:** Why did Wave 4 Trial 9 succeed with similar parameters while Exp 10 failed?
Possible causes: (1) env close/reopen resets PPO internal state, (2) `best_model` selection
criteria differ, (3) the `multitrack_runner` wrapping chain differs from the autoresearch controller's.

**Full log:** `agent/test-results/2026-04-19_10-15_exp10-two-tracks.log`
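
To compare this run against earlier logs, the per-run step counts can be pulled out of the log text. This is a minimal sketch that assumes every per-run line follows the `[HH:MM:SS] SetN track : N steps R reward` shape seen in this log; the regex and the `parse_runs` helper are illustrative, not part of the project's tooling.

```python
import re
from collections import defaultdict

# Matches per-run lines only; SUMMARY and separator lines have no "SetN" token.
RUN_LINE = re.compile(
    r"\[\d{2}:\d{2}:\d{2}\]\s+Set(?P<set>\d+)\s+(?P<track>\w+)\s*:\s*"
    r"(?P<steps>\d+)\s+steps\s+(?P<reward>-?[\d.]+)\s+reward"
)


def parse_runs(lines):
    """Return {track: [steps for each set, in log order]}."""
    steps_by_track = defaultdict(list)
    for line in lines:
        match = RUN_LINE.search(line)
        if match:
            steps_by_track[match.group("track")].append(int(match.group("steps")))
    return dict(steps_by_track)


# Sample lines copied from the Exp 10 log.
sample = [
    "[10:15:32] Set1 mountain_track : 178 steps 12.1 reward ❌ crash@178",
    "[10:16:34] Set2 mountain_track : 179 steps 11.2 reward ❌ crash@179",
    "[10:15:47] Set1 generated_track : 99 steps 7.2 reward ❌ crash@99",
    "[10:18:26] SUMMARY (3 sets, max 2000 steps per run)",
]
runs = parse_runs(sample)
print(runs)
```

To process a whole file, pass an open file handle (opened with `encoding="utf-8"` because of the ❌ glyph) in place of `sample`.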