PPO.load() restores the saved optimizer state (lr=0.000225 from the
Phase 2 champion). Setting model.learning_rate alone is insufficient
because _update_learning_rate() may not fire before the first gradient
step, and the optimizer's param_groups still hold the old value.
Fix: after PPO.load(), explicitly set lr on every optimizer param_group:

    model.learning_rate = lr
    for pg in model.policy.optimizer.param_groups:
        pg['lr'] = lr
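
A self-contained sketch of the fix in context (load_champion, path, and
env are illustrative placeholders, not the actual helper):

    # Sketch only: function name and arguments are placeholders.
    from stable_baselines3 import PPO

    def load_champion(path, env, lr):
        model = PPO.load(path, env=env)  # restores optimizer state, incl. old lr
        model.learning_rate = lr         # keep the model attribute in sync
        for pg in model.policy.optimizer.param_groups:
            pg['lr'] = lr                # override the restored value directly
        return model

SB3's load() also accepts custom_objects={'learning_rate': lr}, which
replaces the attribute at load time; the param_group loop above still
forces the already-restored optimizer onto the new value.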
Impact: all 8 previous Wave 3 trials actually trained at LR=0.000225
regardless of GP proposal. Results archived as:
autoresearch_results_phase3_CONTAMINATED_wrong_lr.jsonl
Phase 3 results cleared; autoresearch restarting from scratch.
Agent: pi
Tests: 83 passed
Tests-Added: 0
TypeScript: N/A
Replace subprocess.run(capture_output=True) with Popen + line-by-line
iteration so every line from multitrack_runner.py appears in the nohup
log immediately rather than only after the trial completes (~35-90 min).
See the sketch after this list.
- stdout/stderr merged via stderr=STDOUT
- line-buffered (bufsize=1)
- deadline-based timeout replaces subprocess.run's timeout kwarg
- output accumulated in a list for parse_runner_output(), as before
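
Sketch of the streaming loop (cmd and timeout_s are placeholders;
parse_runner_output() is the existing parser):

    import subprocess
    import time

    def run_streaming(cmd, timeout_s):
        deadline = time.monotonic() + timeout_s
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            bufsize=1,                 # line-buffered (text mode)
        )
        lines = []
        for line in proc.stdout:             # yields lines as the child flushes
            print(line, end="", flush=True)  # reaches the nohup log immediately
            lines.append(line)
            # note: this check only runs when a line arrives; a child that
            # goes quiet but keeps stdout open is not interrupted here
            if time.monotonic() > deadline:
                proc.kill()
                raise subprocess.TimeoutExpired(cmd, timeout_s)
        try:
            # covers a child that closed stdout but has not yet exited
            proc.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            proc.kill()
            raise
        return "".join(lines)                # fed to parse_runner_output()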
Agent: pi
Tests: 30 passed
Tests-Added: 0
TypeScript: N/A