docs: Exp 11 + 11b results — parallel envs work, v6 prevents circles, but plateaus at ~194 steps

Exp 11 (v5 reward): aborted at 66k — circular driving returned without efficiency term
Exp 11b (v6 reward): completed 90k — no circles but plateaus at 170-195 steps
All 4 tracks eval: remarkably consistent ~194 steps (including zero-shot)
Parallel DummyVecEnv infrastructure proven stable.
Next: increase training budget (90k may be insufficient for 2 parallel envs).
parent 91ce8fc1fa
commit 0993d4f1e7
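
The commit message describes the evaluation as remarkably consistent at ~194 steps. As a quick sanity check, the sketch below recomputes the per-track means and spread from the Exp 11b evaluation table in this diff. The numbers are copied from that table; the dictionary layout and the `summarize` helper are illustrative assumptions, not code from the repository.

```python
from statistics import mean, pstdev

# Episode lengths (steps) copied from the Exp 11b evaluation table.
eval_steps = {
    "mountain_track (trained)": [195, 196, 192],
    "generated_track (trained)": [192, 194, 192],
    "generated_road (zero-shot)": [192, 196, 194],
    "mini_monaco (zero-shot)": [194, 192, 196],
}


def summarize(results):
    """Return {track: (rounded mean, population std dev)} for each track."""
    return {
        track: (round(mean(steps)), pstdev(steps))
        for track, steps in results.items()
    }


summary = summarize(eval_steps)
for track, (avg, spread) in summary.items():
    print(f"{track:28s} mean={avg} std={spread:.1f}")

# Pool every run to see the overall range across trained and zero-shot tracks.
all_steps = [s for steps in eval_steps.values() for s in steps]
print(f"overall range: {min(all_steps)}-{max(all_steps)} steps")
```

All twelve evaluation runs fall between 192 and 196 steps, and the rounded means agree with the values reported in the table (194, 193, 194, 194).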
@@ -114,7 +114,13 @@ parallel envs are working.
 | mini_monaco (zero-shot) | 111 | 133 | 129 | **124** | ❌ Crashes early |
 
 ## Next Steps
-- **Exp 11:** Test parallel DummyVecEnv with two sim instances (ports 9091 + 9093)
-  - First: verify we can connect to both sims simultaneously
-  - Then: train with both tracks in parallel, same hyperparameters as Trial 9
-  - Goal: consistent results (not lottery), measured over multiple runs
+- **Exp 11:** Tested parallel DummyVecEnv with two sim instances (ports 9091 + 9093)
+  - Exp 11 (v5 reward): aborted due to circular driving on generated_track
+  - Exp 11b (v6 reward): completed, no circles, but plateaus at ~194 steps on all tracks
+- **v6 reward confirmed:** efficiency gate prevents circles, tests pass
+- **Parallel env confirmed:** mechanically sound, stable training
+- **Open issue:** 90k steps may be insufficient for 2-env training (45k per track)
+- **Next experiment ideas:**
+  - Increase to 180k-250k total steps
+  - Test v6 on single track to isolate reward effect
+  - Check if efficiency gate fires during normal cornering (false positives)
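
The last idea in the list above, checking whether the efficiency gate fires during normal cornering, can be explored offline before spending simulator time. The commit does not show how v6 computes efficiency, so the sketch below assumes one plausible definition: net displacement divided by distance travelled over a window of recent positions. `path_efficiency`, `gated_reward`, `arc`, the centering term, and the `max_cte` parameter are all hypothetical; only the 0.15 threshold comes from the changelog.

```python
import math


def path_efficiency(points):
    """Net displacement / distance travelled (1.0 = straight line). Assumed definition."""
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0.0:
        return 0.0
    return math.dist(points[0], points[-1]) / travelled


def gated_reward(speed, cte, max_cte, points, min_efficiency=0.15):
    """Hypothetical v6-style reward: speed x centering term, zeroed when the gate fires."""
    if path_efficiency(points) < min_efficiency:
        return 0.0
    centering = max(0.0, 1.0 - abs(cte) / max_cte)
    return speed * centering


def arc(radius, angle_rad, n=200):
    """Sample n + 1 points along a circular arc that starts at angle 0."""
    return [
        (radius * math.cos(angle_rad * i / n), radius * math.sin(angle_rad * i / n))
        for i in range(n + 1)
    ]


for label, angle in (
    ("90 degree corner", math.pi / 2),
    ("hairpin", math.pi),
    ("full circle", 2 * math.pi),
):
    pts = arc(radius=5.0, angle_rad=angle)
    print(label, round(path_efficiency(pts), 2), gated_reward(2.0, 0.0, 3.0, pts))
```

Under this assumed definition, a 90° corner scores about 0.90 and a 180° hairpin about 0.64, both well above the 0.15 threshold, while a closed circle scores close to 0. If the real gate uses a different measure, these numbers will not carry over, so the actual implementation should be probed the same way.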
@@ -334,3 +334,69 @@ the track-switching was the problem. Result stored in `models/wave5-gentrack-onl
 should we focus on single-track training with domain randomization (lighting,
 camera angle) to achieve generalization instead?
 
+### Exp 11 — Parallel DummyVecEnv, v5 reward (ABORTED)
+- **Date:** 2026-04-19
+- **Change from Exp10:** Two sim instances (port 9091 + 9093), DummyVecEnv wraps both.
+  PPO sees both tracks in every rollout batch. No close_and_switch.
+- **Tracks:** generated_track (9091) + mountain_track (9093)
+- **Reward:** v5 (speed × CTE) — same as Exp 9/10
+- **Result:** ABORTED at 66k/90k steps. Circular driving observed on generated_track.
+  v5 reward has no efficiency term → circles at CTE≈0 earn positive reward.
+- **Positive:** Parallel env infrastructure works! Both sims connected, PPO trained
+  stably with no env switching issues. Consistent improvement 14.7→67.8 combined.
+- **Negative:** Circular driving exploit returned because v5 dropped efficiency.
+
+### Exp 11b — Parallel DummyVecEnv, v6 reward (anti-circle gate)
+- **Date:** 2026-04-19
+- **Change from Exp11:** Reward v6 (speed × CTE + efficiency gate ≥ 0.15).
+  Also stuck_steps 80→40 (faster stuck termination).
+- **Tracks:** generated_track (9091) + mountain_track (9093)
+- **Total steps:** 90,000 | lr=0.000725 | throttle_min=0.2
+
+**Training progress (eval at each 6k checkpoint):**
+
+| Steps | gen_track | mountain | Combined | Note |
+|---|---|---|---|---|
+| 6k | 91s | 130s | 10.7r | Early |
+| 18k | 100s | 100s | 15.9r | Improving |
+| 36k | 161s | 160s | 26.2r | ⭐ |
+| 42k | 160s | 159s | 28.9r | ⭐ |
+| 60k | 164s | 163s | — | Plateau |
+| 78k | 169s | 168s | 29.2r | ⭐ |
+| 90k | 173s | 172s | — | End |
+
+**Evaluation results (best_model, 3 sets per track):**
+
+| Track | Set 1 | Set 2 | Set 3 | Mean | Verdict |
+|---|---|---|---|---|---|
+| mountain_track (trained) | 195 | 196 | 192 | **194** | ❌ |
+| generated_track (trained) | 192 | 194 | 192 | **193** | ❌ |
+| generated_road (zero-shot) | 192 | 196 | 194 | **194** | ❌ |
+| mini_monaco (zero-shot) | 194 | 192 | 196 | **194** | ❌ |
+
+**Analysis:**
+- ✅ No circular driving (efficiency gate works)
+- ✅ Remarkably consistent: all tracks ~194 steps, very low variance
+- ✅ Parallel env infrastructure is stable and reliable
+- ❌ Model plateaus at ~170-195 steps and never improves past that
+- ❌ Much worse than Exp 9 (mountain only: 2000/2000) or Wave 4 Trial 9 (2000/2000)
+- The consistency across all 4 tracks (including zero-shot) suggests the model
+  learned a generic short-drive policy, not track-specific features
+- Possible cause: 90k steps may be insufficient for 2-env parallel training
+  (effective steps per track = 45k each), or the efficiency gate may be
+  suppressing early exploration
+
+**Key findings:**
+1. Parallel DummyVecEnv works mechanically — this is the right infrastructure
+2. v6 reward prevents circular driving
+3. But 90k steps with 2 parallel envs may not be enough training budget
+4. Compare: Exp 9 (single track, 90k steps, v5) → 2000 steps. Exp 11b
+   (2 tracks, 90k steps, v6) → 194 steps. The training budget per track
+   is halved AND the reward is harder to exploit.
+
+**Next experiments to consider:**
+- Increase total_timesteps to 180k-250k (restore per-track budget)
+- Try v6 reward on single track first to isolate reward vs multi-track effects
+- Try v5 reward with parallel envs but longer training (accept some circling)
+- Check if efficiency gate triggers too aggressively during normal cornering
+
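
The 180k-250k proposal in the list above follows from simple budget arithmetic. The sketch below assumes, as the 45k-per-track figure in the analysis implies, that `total_timesteps` counts steps summed over both wrapped environments and that each environment receives an equal share. The helper names are hypothetical.

```python
def per_env_steps(total_timesteps: int, n_envs: int) -> int:
    """Steps each environment receives when the budget is shared equally."""
    return total_timesteps // n_envs


def total_for_per_env(target_per_env: int, n_envs: int) -> int:
    """Total budget needed so that every environment gets target_per_env steps."""
    return target_per_env * n_envs


N_ENVS = 2  # generated_track + mountain_track

# Exp 11b budget, followed by the two ends of the proposed range.
for total in (90_000, 180_000, 250_000):
    print(f"{total:>7,} total -> {per_env_steps(total, N_ENVS):>7,} per track")

# Total needed to give each track the 90k steps Exp 9 spent on a single track.
print(total_for_per_env(90_000, N_ENVS))
```

With two environments, 180k total steps restores the 90k per-track budget that Exp 9 had on a single track, and 250k gives 125k per track.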