52 lines
5.6 KiB
Plaintext
[2026-04-13 19:33:13] ============================================================
[2026-04-13 19:33:13] [AutoResearch] Phase 1 — Real PPO Training + GP+UCB Optimization
[2026-04-13 19:33:13] [AutoResearch] Max trials: 20 | kappa: 2.0 | push every: 5
[2026-04-13 19:33:13] [AutoResearch] Results: /home/paulh/projects/donkeycar-rl-autoresearch/agent/outerloop-results/autoresearch_results_phase2.jsonl
[2026-04-13 19:33:13] [AutoResearch] Champion: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion
[2026-04-13 19:33:13] ============================================================
[2026-04-13 19:33:13] [AutoResearch] Loaded 0 existing Phase 1 results.
[2026-04-13 19:33:13] [AutoResearch] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 19:33:13]
[AutoResearch] ========== Trial 1/20 ==========
[2026-04-13 19:33:13] [AutoResearch] Only 0 results — using random proposal.
[2026-04-13 19:33:13] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 19:33:15] [AutoResearch] Launching trial 1: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 20:05:03] [AutoResearch] Trial 1 finished in 1908.3s, returncode=0
[2026-04-13 20:05:03] [AutoResearch] Trial 1: mean_reward=234.5386 std_reward=3.1547
[2026-04-13 20:05:03] [AutoResearch] === Trial 1 Summary ===
[2026-04-13 20:05:03] Total Phase 1 runs: 1
[2026-04-13 20:05:03] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 20:05:03] Top 5:
[2026-04-13 20:05:03] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 20:05:05]
[AutoResearch] ========== Trial 2/20 ==========
[2026-04-13 20:05:05] [AutoResearch] Only 1 results — using random proposal.
[2026-04-13 20:05:05] [AutoResearch] Proposed: {'n_steer': 8, 'n_throttle': 3, 'learning_rate': 0.0012285179829782996, 'timesteps': 39101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 20:05:07] [AutoResearch] Launching trial 2: {'n_steer': 8, 'n_throttle': 3, 'learning_rate': 0.0012285179829782996, 'timesteps': 39101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 20:55:43] [AutoResearch] GP UCB top-5 candidates:
[2026-04-13 20:55:43] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173}
[2026-04-13 20:55:43] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198}
[2026-04-13 20:55:43] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887}
[2026-04-13 20:55:43] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199}
[2026-04-13 20:55:43] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8}
[2026-04-13 20:55:43] [AutoResearch] Only 1 results — using random proposal.
[2026-04-13 20:55:59] [AutoResearch] GP UCB top-5 candidates:
[2026-04-13 20:55:59] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173}
[2026-04-13 20:55:59] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198}
[2026-04-13 20:55:59] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887}
[2026-04-13 20:55:59] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199}
[2026-04-13 20:55:59] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8}
[2026-04-13 20:55:59] [AutoResearch] Only 1 results — using random proposal.