Strategy change driven by Trial 1 data analysis:
- generated_road removed: too similar to generated_track, and Phase-2
warm-start caused catastrophic forgetting (reward 2388→37 in one rotation)
- mountain_track mean reward was only 17: the model never converged there
- mini_monaco scored 24.9 (37 steps): the model was outputting degenerate actions
Wave 4 approach:
- NO warm-start: fresh random weights every trial
- Train: generated_track + mountain_track (visually distinct backgrounds, but
both have road markings, which should force the model to learn general mark-following)
- Test (zero-shot): mini_monaco only (never seen during training)
- Wider LR search: [1e-4, 2e-3] (a from-scratch model needs a different range)
- Larger step budgets: 60k-250k total steps (a fresh model needs more time)
- Seed params: lr=0.0003 and lr=0.001 (diverse from the start)
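A minimal sketch of the Wave 4 search space as a single config object. The class and field names (Wave4SearchSpace, lr_bounds, step_budget_bounds, seed_lrs) are hypothetical; only the values come from the list above. Mapping the LR through log10 onto [0, 1] is also an assumption about how a controller might treat a range that spans more than an order of magnitude. validate() encodes the zero-shot rule: the test track must never appear in training.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Wave4SearchSpace:
    # Hypothetical container; the values mirror the Wave 4 notes.
    training_tracks: tuple = ("generated_track", "mountain_track")
    test_tracks: tuple = ("mini_monaco",)  # zero-shot: never seen in training
    lr_bounds: tuple = (1e-4, 2e-3)
    step_budget_bounds: tuple = (60_000, 250_000)  # total steps per trial
    seed_lrs: tuple = (3e-4, 1e-3)  # seed proposals (assumed: run before GP-guided ones)
    warm_start: bool = False  # fresh random weights every trial

    def validate(self) -> None:
        # Zero-shot evaluation only means something if no test track was trained on.
        overlap = set(self.training_tracks) & set(self.test_tracks)
        if overlap:
            raise ValueError(f"test tracks also used for training: {sorted(overlap)}")
        lo, hi = self.lr_bounds
        for lr in self.seed_lrs:
            if not lo <= lr <= hi:
                raise ValueError(f"seed lr {lr} outside {self.lr_bounds}")

    def _log_bounds(self) -> tuple:
        return math.log10(self.lr_bounds[0]), math.log10(self.lr_bounds[1])

    def lr_to_unit(self, lr: float) -> float:
        # Assumption: the controller works on log10(lr) rescaled to [0, 1].
        lo, hi = self._log_bounds()
        return (math.log10(lr) - lo) / (hi - lo)

    def unit_to_lr(self, u: float) -> float:
        lo, hi = self._log_bounds()
        return 10 ** (lo + u * (hi - lo))


space = Wave4SearchSpace()
space.validate()
```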
Files:
- multitrack_runner.py: 2 training tracks, no warm-start auto-detection
- wave4_controller.py: new Wave 4 GP+UCB controller
- tests updated: TRAINING_TRACKS assertion, seed param tests → wave4
- 96 tests passing
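wave4_controller.py is not shown here, so the following is only a hypothetical sketch of one GP+UCB proposal step in pure Python: a zero-mean GP with a squared-exponential kernel is fitted to (scaled params, reward) pairs, and the candidate maximizing mu + beta * sigma is proposed. The kernel length scale, noise level, beta, and the assumption that inputs are pre-scaled to [0, 1] (e.g. log10(lr) over [1e-4, 2e-3]) are illustrative choices, not the controller's actual settings.

```python
import math


def rbf(a, b, length_scale=0.2):
    # Squared-exponential kernel on points in the unit hypercube.
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(K):
    # Lower-triangular L with K = L L^T (K must be positive definite).
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution: solve L x = b.
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def solve_upper_t(L, b):
    # Back substitution: solve L^T x = b.
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, x_star, noise=1e-3):
    # Standardize rewards so a zero-mean, unit-variance prior is reasonable.
    mean_y = sum(y) / len(y)
    std_y = (sum((v - mean_y) ** 2 for v in y) / len(y)) ** 0.5 or 1.0
    z = [(v - mean_y) / std_y for v in y]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, z))  # K^-1 z
    k_star = [rbf(a, x_star) for a in X]
    mu = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_star, x_star) - sum(t * t for t in v), 1e-12)
    # Map back to the original reward scale.
    return mean_y + std_y * mu, std_y * math.sqrt(var)


def propose_ucb(X, y, candidates, beta=2.0):
    # Upper confidence bound: favor high predicted reward or high uncertainty.
    def ucb(c):
        mu, sigma = gp_posterior(X, y, c)
        return mu + beta * sigma
    return max(candidates, key=ucb)
```

beta sets the trade-off: a larger value pushes proposals toward poorly explored regions, a smaller one toward the current best observed reward.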
ADR-013 to follow.
Agent: pi
Tests: 96 passed
Tests-Added: 0
TypeScript: N/A
PPO.load() restores the saved optimizer state (lr=0.000225 from the Phase 2
champion). Setting model.learning_rate alone is insufficient because
_update_learning_rate() may not fire before the first gradient step, and
the optimizer's param_groups still hold the old value.
Fix: after PPO.load(), explicitly set lr on every optimizer param_group:
    model.learning_rate = lr
    for pg in model.policy.optimizer.param_groups:
        pg['lr'] = lr
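A minimal, duck-typed sketch of the fix as a reusable helper. force_learning_rate and the SimpleNamespace stand-in are hypothetical and only mimic the attribute layout of a stable-baselines3 model. One caveat worth checking against the installed stable-baselines3 version: PPO.train() calls _update_learning_rate(), which rewrites every param_group from model.lr_schedule, and that schedule is rebuilt on load from the saved learning rate. If that holds, the param_group override alone would be reverted at the first update, so the sketch also replaces lr_schedule. Passing custom_objects={"learning_rate": lr, "lr_schedule": lambda _: lr} to PPO.load() may be an alternative.

```python
from types import SimpleNamespace


def force_learning_rate(model, lr: float) -> None:
    """Override the LR on a freshly loaded, SB3-style model (duck-typed)."""
    model.learning_rate = lr
    # Assumption (stable-baselines3): train() resets param_groups from
    # lr_schedule, so the schedule must report the new value as well.
    if hasattr(model, "lr_schedule"):
        model.lr_schedule = lambda _progress_remaining: lr
    # The restored optimizer state still carries the saved LR in each group.
    for pg in model.policy.optimizer.param_groups:
        pg["lr"] = lr


# Stand-in for a loaded model: mimics the attribute layout, not the real class.
fake = SimpleNamespace(
    learning_rate=0.000225,
    lr_schedule=lambda _p: 0.000225,
    policy=SimpleNamespace(
        optimizer=SimpleNamespace(param_groups=[{"lr": 0.000225}, {"lr": 0.000225}])
    ),
)
force_learning_rate(fake, 1e-3)
```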
Impact: all 8 previous Wave 3 trials actually trained at LR=0.000225
regardless of the GP proposal. Results archived as:
autoresearch_results_phase3_CONTAMINATED_wrong_lr.jsonl
Phase 3 results cleared; autoresearch restarting from scratch.
Agent: pi
Tests: 83 passed
Tests-Added: 0
TypeScript: N/A
Replace subprocess.run(capture_output=True) with Popen + line-by-line
iteration so every line from multitrack_runner.py appears in the nohup
log immediately rather than only after the trial completes (~35-90 min).
- stdout/stderr merged via stderr=STDOUT
- line-buffered (bufsize=1)
- deadline-based timeout replaces the subprocess.run timeout kwarg
- output accumulated in list for parse_runner_output() as before
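A minimal sketch of the streaming loop described above, using only the standard library. run_streaming is a hypothetical name and parse_runner_output() is not reproduced. One assumption matters for the "immediately" part: the child should be a Python process started with -u (or with PYTHONUNBUFFERED=1), because Python block-buffers stdout when it is a pipe, and lines would otherwise still arrive in bursts.

```python
import subprocess
import sys
import time


def run_streaming(cmd, timeout_s: float):
    """Run cmd, echo each output line as it arrives.

    Returns (returncode, lines, timed_out).
    """
    deadline = time.monotonic() + timeout_s
    lines = []
    timed_out = False
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered on the parent side (text mode only)
    ) as proc:
        for line in proc.stdout:
            print(line, end="", flush=True)  # lands in the nohup log right away
            lines.append(line)
            if time.monotonic() > deadline:
                timed_out = True
                proc.kill()
                break
        proc.wait()
    return proc.returncode, lines, timed_out


rc, out, timed_out = run_streaming(
    [sys.executable, "-u", "-c", "import sys; print('a'); print('b', file=sys.stderr)"],
    timeout_s=30,
)
```

Because the deadline is checked only when a line arrives, a child that hangs without printing is never killed; a watchdog thread calling proc.kill() would cover that case.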
Agent: pi
Tests: 30 passed
Tests-Added: 0
TypeScript: N/A