Exp 11 (v5 reward): aborted at 66k; circular driving returned without the efficiency term. Exp 11b (v6 reward): completed 90k; no circles, but plateaus at 170-195 steps. All 4 tracks eval: remarkably consistent ~194 steps (including zero-shot). Parallel DummyVecEnv infrastructure proven stable. Next: increase training budget (90k may be insufficient for 2 parallel envs). |
||
|---|---|---|
| track-screenshots | ||
| ARCHITECTURE.md | ||
| RESEARCH_LOG.md | ||
| SESSION_LOG_2026-04-19.md | ||
| STATE.md | ||
| TEST_HISTORY.md | ||
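
The commit note's closing remark (90k may be insufficient for 2 parallel envs) comes down to simple arithmetic, sketched below. This is a rough illustration rather than the project's code: it assumes the 90k budget counts steps summed across all environment copies (as the `total_timesteps` argument does in Stable-Baselines3) and it takes the ~194-step evaluation figure as a typical episode length. The helper names `per_env_steps` and `episodes_per_env` are hypothetical.

```python
def per_env_steps(total_timesteps: int, n_envs: int) -> int:
    """Steps each environment copy contributes when one budget is shared.

    Assumption: the budget is the sum of steps over all parallel envs.
    """
    if n_envs < 1:
        raise ValueError("n_envs must be at least 1")
    return total_timesteps // n_envs


def episodes_per_env(total_timesteps: int, n_envs: int, mean_episode_len: float) -> float:
    """Approximate number of episodes each env copy gets to play."""
    return per_env_steps(total_timesteps, n_envs) / mean_episode_len


budget = 90_000   # Exp 11b training budget from the commit note
n_envs = 2        # parallel envs from the commit note
episode_len = 194 # assumed typical episode length (the eval figure)

print(per_env_steps(budget, n_envs))                          # 45000
print(round(episodes_per_env(budget, n_envs, episode_len)))   # 232
```

Under these assumptions each environment copy sees only about 45k steps, or roughly 230 episodes, which is one way to read the suggestion to raise the budget.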