PPO.load() restores the saved optimizer state (lr=0.000225 from the
Phase 2 champion). Setting model.learning_rate alone is insufficient
because _update_learning_rate() may not fire before the first gradient
step, and the optimizer's param_groups still hold the old value.
Fix: after PPO.load(), explicitly set lr on every optimizer param_group:

    model.learning_rate = lr
    for pg in model.policy.optimizer.param_groups:
        pg['lr'] = lr
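
A self-contained sketch of the fix in context (load_champion, path, and
env are illustrative placeholders, not the actual helper):

    # Sketch only: function name and arguments are placeholders.
    from stable_baselines3 import PPO

    def load_champion(path, env, lr):
        model = PPO.load(path, env=env)  # restores optimizer state, incl. old lr
        model.learning_rate = lr         # keep the model attribute in sync
        for pg in model.policy.optimizer.param_groups:
            pg['lr'] = lr                # override the restored value directly
        return model

SB3's load() also accepts custom_objects={'learning_rate': lr}, which
replaces the attribute at load time; the param_group loop above still
forces the already-restored optimizer onto the new value.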
Impact: all 8 previous Wave 3 trials actually trained at LR=0.000225
regardless of GP proposal. Results archived as:
autoresearch_results_phase3_CONTAMINATED_wrong_lr.jsonl
Phase 3 results cleared; autoresearch restarting from scratch.
Agent: pi
Tests: 83 passed
Tests-Added: 0
TypeScript: N/A
Replace subprocess.run(capture_output=True) with Popen + line-by-line
iteration so every line from multitrack_runner.py appears in the nohup
log immediately rather than only after the trial completes (~35-90 min).
See the sketch after this list.
- stdout/stderr merged via stderr=STDOUT
- line-buffered (bufsize=1)
- deadline-based timeout replaces subprocess.run's timeout kwarg
- output accumulated in a list for parse_runner_output(), as before
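
Sketch of the streaming loop (cmd and timeout_s are placeholders;
parse_runner_output() is the existing parser):

    import subprocess
    import time

    def run_streaming(cmd, timeout_s):
        deadline = time.monotonic() + timeout_s
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            bufsize=1,                 # line-buffered (text mode)
        )
        lines = []
        for line in proc.stdout:             # yields lines as the child flushes
            print(line, end="", flush=True)  # reaches the nohup log immediately
            lines.append(line)
            # note: this check only runs when a line arrives; a child that
            # goes quiet but keeps stdout open is not interrupted here
            if time.monotonic() > deadline:
                proc.kill()
                raise subprocess.TimeoutExpired(cmd, timeout_s)
        try:
            # covers a child that closed stdout but has not yet exited
            proc.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            proc.kill()
            raise
        return "".join(lines)                # fed to parse_runner_output()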
Agent: pi
Tests: 30 passed
Tests-Added: 0
TypeScript: N/A