Exp 11 (v5 reward): aborted at 66k; circular driving returned without the efficiency term
Exp 11b (v6 reward): completed 90k; no circles, but plateaus at 170-195 steps
Eval on all 4 tracks: consistent ~194 steps on each (including zero-shot)
Parallel DummyVecEnv infrastructure proven stable.
Next: increase training budget (90k may be insufficient for 2 parallel envs).
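
Rough budget arithmetic behind that note, as a sketch only. It assumes the 90k figure is a total step count summed across both envs (how Stable-Baselines3 counts `total_timesteps`; not confirmed for this setup) and uses the ~194-step eval episode length as a stand-in for training episode length. The function name `per_env_budget` is hypothetical.

```python
def per_env_budget(total_timesteps: int, n_envs: int, episode_len: int) -> dict:
    """Split a total step budget across parallel envs.

    Assumption: total_timesteps is summed over all envs, so each env
    only sees total_timesteps / n_envs steps.
    """
    steps_per_env = total_timesteps // n_envs
    return {
        "steps_per_env": steps_per_env,
        # Approximate episodes each env completes at the given episode length.
        "episodes_per_env": steps_per_env / episode_len,
    }


budget = per_env_budget(total_timesteps=90_000, n_envs=2, episode_len=194)
print(budget["steps_per_env"])            # 45000
print(round(budget["episodes_per_env"]))  # 232
```

Under these assumptions each env gets 45k steps, roughly 232 episodes at the plateau length, and the Exp 11 abort point (66k) would correspond to 33k steps per env.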