[2026-04-13 19:33:13] ============================================================
[2026-04-13 19:33:13] [AutoResearch] Phase 1 — Real PPO Training + GP+UCB Optimization
[2026-04-13 19:33:13] [AutoResearch] Max trials: 20 | kappa: 2.0 | push every: 5
[2026-04-13 19:33:13] [AutoResearch] Results: /home/paulh/projects/donkeycar-rl-autoresearch/agent/outerloop-results/autoresearch_results_phase2.jsonl
[2026-04-13 19:33:13] [AutoResearch] Champion: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion
[2026-04-13 19:33:13] ============================================================
[2026-04-13 19:33:13] [AutoResearch] Loaded 0 existing Phase 1 results.
[2026-04-13 19:33:13] [AutoResearch] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 19:33:13] [AutoResearch] ========== Trial 1/20 ==========
[2026-04-13 19:33:13] [AutoResearch] Only 0 results — using random proposal.
[2026-04-13 19:33:13] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 19:33:15] [AutoResearch] Launching trial 1: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 20:05:03] [AutoResearch] Trial 1 finished in 1908.3s, returncode=0
[2026-04-13 20:05:03] [AutoResearch] Trial 1: mean_reward=234.5386 std_reward=3.1547
[2026-04-13 20:05:03] [AutoResearch] === Trial 1 Summary ===
[2026-04-13 20:05:03] Total Phase 1 runs: 1
[2026-04-13 20:05:03] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 20:05:03] Top 5:
[2026-04-13 20:05:03] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 20:05:05] [AutoResearch] ========== Trial 2/20 ==========
[2026-04-13 20:05:05] [AutoResearch] Only 1 results — using random proposal.
[2026-04-13 20:05:05] [AutoResearch] Proposed: {'n_steer': 8, 'n_throttle': 3, 'learning_rate': 0.0012285179829782996, 'timesteps': 39101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 20:05:07] [AutoResearch] Launching trial 2: {'n_steer': 8, 'n_throttle': 3, 'learning_rate': 0.0012285179829782996, 'timesteps': 39101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 20:55:43] [AutoResearch] GP UCB top-5 candidates:
[2026-04-13 20:55:43] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173}
[2026-04-13 20:55:43] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198}
[2026-04-13 20:55:43] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887}
[2026-04-13 20:55:43] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199}
[2026-04-13 20:55:43] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90}
[2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8}
[2026-04-13 20:55:43] [AutoResearch] Only 1 results — using random proposal.
[2026-04-13 20:55:59] [AutoResearch] GP UCB top-5 candidates:
[2026-04-13 20:55:59] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173}
[2026-04-13 20:55:59] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198}
[2026-04-13 20:55:59] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887}
[2026-04-13 20:55:59] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199}
[2026-04-13 20:55:59] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90}
[2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8}
[2026-04-13 20:55:59] [AutoResearch] Only 1 results — using random proposal.
[2026-04-13 20:56:39] ============================================================
[2026-04-13 20:56:39] [AutoResearch] Phase 1 — Real PPO Training + GP+UCB Optimization
[2026-04-13 20:56:39] [AutoResearch] Max trials: 20 | kappa: 2.0 | push every: 5
[2026-04-13 20:56:39] [AutoResearch] Results: /home/paulh/projects/donkeycar-rl-autoresearch/agent/outerloop-results/autoresearch_results_phase2.jsonl
[2026-04-13 20:56:39] [AutoResearch] Champion: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion
[2026-04-13 20:56:39] ============================================================
[2026-04-13 20:56:39] [AutoResearch] Loaded 1 existing Phase 1 results.
[2026-04-13 20:56:39] [AutoResearch] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 20:56:39] [AutoResearch] ========== Trial 1/20 ==========
[2026-04-13 20:56:39] [AutoResearch] Only 1 results — using random proposal.
[2026-04-13 20:56:39] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.0016410214223984076, 'timesteps': 16101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 20:56:41] [AutoResearch] Launching trial 1: {'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.0016410214223984076, 'timesteps': 16101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:09:13] [AutoResearch] Trial 1 finished in 751.5s, returncode=0
[2026-04-13 21:09:13] [AutoResearch] Trial 1: mean_reward=177.7416 std_reward=142.3977
[2026-04-13 21:09:13] [AutoResearch] === Trial 1 Summary ===
[2026-04-13 21:09:13] Total Phase 1 runs: 2
[2026-04-13 21:09:13] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 21:09:13] Top 5:
[2026-04-13 21:09:13] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:09:13] mean_reward=177.7416 params={'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.0016410214223984076, 'timesteps': 16101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:09:15] [AutoResearch] ========== Trial 2/20 ==========
[2026-04-13 21:09:15] [AutoResearch] Only 2 results — using random proposal.
[2026-04-13 21:09:15] [AutoResearch] Proposed: {'n_steer': 8, 'n_throttle': 2, 'learning_rate': 0.0012716386940916763, 'timesteps': 40768, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:09:17] [AutoResearch] Launching trial 2: {'n_steer': 8, 'n_throttle': 2, 'learning_rate': 0.0012716386940916763, 'timesteps': 40768, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:32:22] [AutoResearch] Trial 2 finished in 1384.9s, returncode=0
[2026-04-13 21:32:22] [AutoResearch] Trial 2: mean_reward=38.1267 std_reward=0.3364
[2026-04-13 21:32:22] [AutoResearch] === Trial 2 Summary ===
[2026-04-13 21:32:22] Total Phase 1 runs: 3
[2026-04-13 21:32:22] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 21:32:22] Top 5:
[2026-04-13 21:32:22] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:32:22] mean_reward=177.7416 params={'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.0016410214223984076, 'timesteps': 16101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:32:22] mean_reward=38.1267 params={'n_steer': 8, 'n_throttle': 2, 'learning_rate': 0.0012716386940916763, 'timesteps': 40768, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:32:24] [AutoResearch] ========== Trial 3/20 ==========
[2026-04-13 21:32:24] [AutoResearch] GP UCB top-5 candidates:
[2026-04-13 21:32:24] UCB=2.2673 mu=0.5045 sigma=0.8814 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596}
[2026-04-13 21:32:24] UCB=2.2663 mu=0.4912 sigma=0.8876 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0012733685738093425, 'timesteps': 41802}
[2026-04-13 21:32:24] UCB=2.2632 mu=0.5326 sigma=0.8653 params={'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0003737785062265609, 'timesteps': 48369}
[2026-04-13 21:32:24] UCB=2.2622 mu=0.4884 sigma=0.8869 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0009593125016626112, 'timesteps': 41226}
[2026-04-13 21:32:24] UCB=2.2594 mu=0.4092 sigma=0.9251 params={'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0006680173697602083, 'timesteps': 33139}
[2026-04-13 21:32:24] [AutoResearch] Proposed: {'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:32:26] [AutoResearch] Launching trial 3: {'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:57:48] [AutoResearch] Trial 3 finished in 1522.1s, returncode=0
[2026-04-13 21:57:48] [AutoResearch] Trial 3: mean_reward=615.6443 std_reward=2.4555
[2026-04-13 21:57:48] [AutoResearch] === Trial 3 Summary ===
[2026-04-13 21:57:48] Total Phase 1 runs: 4
[2026-04-13 21:57:48] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 21:57:48] Top 5:
[2026-04-13 21:57:48] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:57:48] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:57:48] mean_reward=177.7416 params={'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.0016410214223984076, 'timesteps': 16101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:57:48] mean_reward=38.1267 params={'n_steer': 8, 'n_throttle': 2, 'learning_rate': 0.0012716386940916763, 'timesteps': 40768, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:57:50] [AutoResearch] ========== Trial 4/20 ==========
[2026-04-13 21:57:50] [AutoResearch] GP UCB top-5 candidates:
[2026-04-13 21:57:50] UCB=2.6247 mu=1.1138 sigma=0.7554 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0006446108743181142, 'timesteps': 25224}
[2026-04-13 21:57:50] UCB=2.6201 mu=1.0985 sigma=0.7608 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.00040076107708415066, 'timesteps': 34530}
[2026-04-13 21:57:50] UCB=2.6128 mu=0.9229 sigma=0.8449 params={'n_steer': 4, 'n_throttle': 5, 'learning_rate': 0.0007783797179569566, 'timesteps': 28443}
[2026-04-13 21:57:50] UCB=2.5877 mu=1.1874 sigma=0.7001 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.000483120120780932, 'timesteps': 32645}
[2026-04-13 21:57:50] UCB=2.5805 mu=1.1160 sigma=0.7322 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0004185314575094028, 'timesteps': 31606}
[2026-04-13 21:57:50] [AutoResearch] Proposed: {'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0006446108743181142, 'timesteps': 25224, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 21:57:52] [AutoResearch] Launching trial 4: {'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0006446108743181142, 'timesteps': 25224, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:17:49] [AutoResearch] Trial 4 finished in 1196.7s, returncode=0
[2026-04-13 22:17:49] [AutoResearch] Trial 4: mean_reward=56.9474 std_reward=0.4525
[2026-04-13 22:17:49] [AutoResearch] === Trial 4 Summary ===
[2026-04-13 22:17:49] Total Phase 1 runs: 5
[2026-04-13 22:17:49] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 22:17:49] Top 5:
[2026-04-13 22:17:49] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:17:49] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:17:49] mean_reward=177.7416 params={'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.0016410214223984076, 'timesteps': 16101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:17:49] mean_reward=56.9474 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0006446108743181142, 'timesteps': 25224, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:17:49] mean_reward=38.1267 params={'n_steer': 8, 'n_throttle': 2, 'learning_rate': 0.0012716386940916763, 'timesteps': 40768, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:17:51] [AutoResearch] ========== Trial 5/20 ==========
[2026-04-13 22:17:51] [AutoResearch] GP UCB top-5 candidates:
[2026-04-13 22:17:51] UCB=3.2705 mu=1.9137 sigma=0.6784 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721}
[2026-04-13 22:17:51] UCB=3.0915 mu=1.4459 sigma=0.8228 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0013020888853863901, 'timesteps': 44447}
[2026-04-13 22:17:51] UCB=3.0371 mu=1.3845 sigma=0.8263 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0014826204762677822, 'timesteps': 36122}
[2026-04-13 22:17:51] UCB=3.0172 mu=1.1871 sigma=0.9150 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0008990881268700197, 'timesteps': 47181}
[2026-04-13 22:17:51] UCB=3.0156 mu=1.6493 sigma=0.6832 params={'n_steer': 4, 'n_throttle': 5, 'learning_rate': 0.0009711471794993783, 'timesteps': 36714}
[2026-04-13 22:17:51] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:17:53] [AutoResearch] Launching trial 5: {'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:46:54] [AutoResearch] Trial 5 finished in 1741.0s, returncode=0
[2026-04-13 22:46:54] [AutoResearch] Trial 5: mean_reward=254.5237 std_reward=34.6249
[2026-04-13 22:46:54] [AutoResearch] === Trial 5 Summary ===
[2026-04-13 22:46:54] Total Phase 1 runs: 6
[2026-04-13 22:46:54] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 22:46:54] Top 5:
[2026-04-13 22:46:54] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:46:54] mean_reward=254.5237 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:46:54] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:46:54] mean_reward=177.7416 params={'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.0016410214223984076, 'timesteps': 16101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:46:54] mean_reward=56.9474 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0006446108743181142, 'timesteps': 25224, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:46:55] [AutoResearch] Git push complete after trial 5
[2026-04-13 22:46:57] [AutoResearch] ========== Trial 6/20 ==========
[2026-04-13 22:46:57] [AutoResearch] GP UCB top-5 candidates:
[2026-04-13 22:46:57] UCB=2.8976 mu=1.3885 sigma=0.7545 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0004916288196083273, 'timesteps': 45159}
[2026-04-13 22:46:57] UCB=2.7044 mu=1.9380 sigma=0.3832 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0008675754116606385, 'timesteps': 37769}
[2026-04-13 22:46:57] UCB=2.5483 mu=1.0014 sigma=0.7734 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0013296606512799647, 'timesteps': 32320}
[2026-04-13 22:46:57] UCB=2.3786 mu=0.5746 sigma=0.9020 params={'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0002809430632146512, 'timesteps': 38162}
[2026-04-13 22:46:57] UCB=2.3450 mu=0.6872 sigma=0.8289 params={'n_steer': 4, 'n_throttle': 5, 'learning_rate': 0.000691104912585418, 'timesteps': 43009}
[2026-04-13 22:46:57] [AutoResearch] Proposed: {'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0004916288196083273, 'timesteps': 45159, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 22:46:59] [AutoResearch] Launching trial 6: {'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0004916288196083273, 'timesteps': 45159, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:15:59] [AutoResearch] Trial 6 finished in 1740.1s, returncode=0
[2026-04-13 23:15:59] [AutoResearch] Trial 6: mean_reward=230.3458 std_reward=3.0194
[2026-04-13 23:15:59] [AutoResearch] === Trial 6 Summary ===
[2026-04-13 23:15:59] Total Phase 1 runs: 7
[2026-04-13 23:15:59] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 23:15:59] Top 5:
[2026-04-13 23:15:59] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:15:59] mean_reward=254.5237 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:15:59] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:15:59] mean_reward=230.3458 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0004916288196083273, 'timesteps': 45159, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:15:59] mean_reward=177.7416 params={'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.0016410214223984076, 'timesteps': 16101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:16:01] [AutoResearch] ========== Trial 7/20 ==========
[2026-04-13 23:16:01] [AutoResearch] GP UCB top-5 candidates:
[2026-04-13 23:16:01] UCB=2.8151 mu=0.9820 sigma=0.9165 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.001574375789043505, 'timesteps': 34055}
[2026-04-13 23:16:01] UCB=2.6240 mu=1.0426 sigma=0.7907 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.001137097715700357, 'timesteps': 30159}
[2026-04-13 23:16:01] UCB=2.3629 mu=0.6057 sigma=0.8786 params={'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0013640068427189318, 'timesteps': 36432}
[2026-04-13 23:16:01] UCB=2.3445 mu=0.4178 sigma=0.9633 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0017303620708703264, 'timesteps': 25882}
[2026-04-13 23:16:01] UCB=2.3407 mu=0.7245 sigma=0.8081 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.000811303710708658, 'timesteps': 34107}
[2026-04-13 23:16:01] [AutoResearch] Proposed: {'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.001574375789043505, 'timesteps': 34055, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:16:03] [AutoResearch] Launching trial 7: {'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.001574375789043505, 'timesteps': 34055, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:44:53] [AutoResearch] Trial 7 finished in 1729.5s, returncode=0
[2026-04-13 23:44:53] [AutoResearch] Trial 7: mean_reward=69.0259 std_reward=10.9909
[2026-04-13 23:44:53] [AutoResearch] === Trial 7 Summary ===
[2026-04-13 23:44:53] Total Phase 1 runs: 8
[2026-04-13 23:44:53] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-13 23:44:53] Top 5:
[2026-04-13 23:44:53] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:44:53] mean_reward=254.5237 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:44:53] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:44:53] mean_reward=230.3458 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0004916288196083273, 'timesteps': 45159, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:44:53] mean_reward=177.7416 params={'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.0016410214223984076, 'timesteps': 16101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:44:55] [AutoResearch] ========== Trial 8/20 ==========
[2026-04-13 23:44:55] [AutoResearch] GP UCB top-5 candidates:
[2026-04-13 23:44:55] UCB=2.6819 mu=1.1218 sigma=0.7800 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177}
[2026-04-13 23:44:55] UCB=2.5982 mu=0.9843 sigma=0.8069 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.001106072643855368, 'timesteps': 28977}
[2026-04-13 23:44:55] UCB=2.5885 mu=1.2137 sigma=0.6874 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009950286097591902, 'timesteps': 31387}
[2026-04-13 23:44:55] UCB=2.5042 mu=0.6672 sigma=0.9185 params={'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0015134650324881223, 'timesteps': 31708}
[2026-04-13 23:44:55] UCB=2.4989 mu=0.7372 sigma=0.8808 params={'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0011556578968693356, 'timesteps': 30629}
[2026-04-13 23:44:55] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-13 23:44:57] [AutoResearch] Launching trial 8: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:08:12] [AutoResearch] Trial 8 finished in 1395.2s, returncode=0
[2026-04-14 00:08:12] [AutoResearch] Trial 8: mean_reward=2296.1891 std_reward=14.0346
[2026-04-14 00:08:12] [AutoResearch] === Trial 8 Summary ===
[2026-04-14 00:08:12] Total Phase 1 runs: 9
[2026-04-14 00:08:12] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-14 00:08:12] Top 5:
[2026-04-14 00:08:12] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:08:12] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:08:12] mean_reward=254.5237 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:08:12] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:08:12] mean_reward=230.3458 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0004916288196083273, 'timesteps': 45159, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:08:14] [AutoResearch] ========== Trial 9/20 ==========
[2026-04-14 00:08:14] [AutoResearch] GP UCB top-5 candidates:
[2026-04-14 00:08:14] UCB=3.4249 mu=2.1949 sigma=0.6150 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0010038571924825726, 'timesteps': 29380}
[2026-04-14 00:08:14] UCB=3.4098 mu=1.9348 sigma=0.7375 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0016876197465002791, 'timesteps': 29946}
[2026-04-14 00:08:14] UCB=3.3195 mu=1.8981 sigma=0.7107 params={'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0015133213029551393, 'timesteps': 29545}
[2026-04-14 00:08:14] UCB=3.3002 mu=1.7970 sigma=0.7516 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0009092257531208215, 'timesteps': 26759}
[2026-04-14 00:08:14] UCB=3.2755 mu=1.8436 sigma=0.7159 params={'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.001032563222262004, 'timesteps': 29035}
[2026-04-14 00:08:14] [AutoResearch] Proposed: {'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0010038571924825726, 'timesteps': 29380, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:08:16] [AutoResearch] Launching trial 9: {'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0010038571924825726, 'timesteps': 29380, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:32:30] [AutoResearch] Trial 9 finished in 1454.2s, returncode=0
[2026-04-14 00:32:30] [AutoResearch] Trial 9: mean_reward=62.5084 std_reward=9.1358
[2026-04-14 00:32:30] [AutoResearch] === Trial 9 Summary ===
[2026-04-14 00:32:30] Total Phase 1 runs: 10
[2026-04-14 00:32:30] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-14 00:32:30] Top 5:
[2026-04-14 00:32:30] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:32:30] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:32:30] mean_reward=254.5237 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:32:30] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:32:30] mean_reward=230.3458 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0004916288196083273, 'timesteps': 45159, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:32:32] [AutoResearch] ========== Trial 10/20 ==========
[2026-04-14 00:32:32] [AutoResearch] GP UCB top-5 candidates:
[2026-04-14 00:32:32] UCB=4.3821 mu=3.5544 sigma=0.4138 params={'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0011311496831886009, 'timesteps': 35197}
[2026-04-14 00:32:32] UCB=3.9901 mu=2.2172 sigma=0.8864 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0019647588641987608, 'timesteps': 35583}
[2026-04-14 00:32:32] UCB=3.9576 mu=3.1158 sigma=0.4209 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0013877163147355273, 'timesteps': 34726}
[2026-04-14 00:32:32] UCB=3.9250 mu=2.5046 sigma=0.7102 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0016362435570891763, 'timesteps': 38090}
[2026-04-14 00:32:32] UCB=3.8792 mu=2.3746 sigma=0.7523 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.001709600322137922, 'timesteps': 32114}
[2026-04-14 00:32:32] [AutoResearch] Proposed: {'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0011311496831886009, 'timesteps': 35197, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:32:34] [AutoResearch] Launching trial 10: {'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0011311496831886009, 'timesteps': 35197, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:56:14] [AutoResearch] Trial 10 finished in 1420.1s, returncode=0
[2026-04-14 00:56:14] [AutoResearch] Trial 10: mean_reward=144.7129 std_reward=26.0347
[2026-04-14 00:56:14] [AutoResearch] === Trial 10 Summary ===
[2026-04-14 00:56:14] Total Phase 1 runs: 11
[2026-04-14 00:56:14] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True}
[2026-04-14 00:56:14] Top 5:
[2026-04-14 00:56:14] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:56:14] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:56:14] mean_reward=254.5237 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:56:14] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True}
[2026-04-14 00:56:14] mean_reward=230.3458 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0004916288196083273, 'timesteps': 45159, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 00:56:16] [AutoResearch] Git push complete after trial 10 [2026-04-14 00:56:18] [AutoResearch] ========== Trial 11/20 ========== [2026-04-14 00:56:18] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 00:56:18] UCB=5.7586 mu=4.7912 sigma=0.4837 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0014246268134911666, 'timesteps': 38210} [2026-04-14 00:56:18] UCB=5.4431 mu=3.8017 sigma=0.8207 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0017002774887490608, 'timesteps': 35955} [2026-04-14 00:56:18] UCB=5.4032 mu=3.7564 sigma=0.8234 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0018388176704805855, 'timesteps': 33318} [2026-04-14 00:56:18] UCB=5.2470 mu=4.1580 sigma=0.5445 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0013978686256657003, 'timesteps': 36838} [2026-04-14 00:56:18] UCB=5.0468 mu=4.1879 sigma=0.4294 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0012681926634959078, 'timesteps': 39254} [2026-04-14 00:56:18] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0014246268134911666, 'timesteps': 38210, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 00:56:20] [AutoResearch] Launching trial 11: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0014246268134911666, 'timesteps': 38210, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:26:38] [AutoResearch] Trial 11 finished in 1818.2s, returncode=0 [2026-04-14 01:26:38] [AutoResearch] Trial 11: mean_reward=114.5364 std_reward=4.0149 [2026-04-14 01:26:38] [AutoResearch] === Trial 11 Summary === [2026-04-14 01:26:38] Total Phase 1 runs: 12 [2026-04-14 01:26:38] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 
0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 01:26:38] Top 5: [2026-04-14 01:26:38] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:26:38] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:26:38] mean_reward=254.5237 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:26:38] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:26:38] mean_reward=230.3458 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.0004916288196083273, 'timesteps': 45159, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:26:40] [AutoResearch] ========== Trial 12/20 ========== [2026-04-14 01:26:40] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 01:26:40] UCB=3.8375 mu=3.2333 sigma=0.3021 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234} [2026-04-14 01:26:40] UCB=3.5203 mu=2.1732 sigma=0.6735 params={'n_steer': 4, 'n_throttle': 2, 'learning_rate': 0.0008846780992589506, 'timesteps': 32580} [2026-04-14 01:26:40] UCB=3.4985 mu=2.4208 sigma=0.5388 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0006733740843935935, 'timesteps': 39503} [2026-04-14 01:26:40] UCB=3.3453 mu=2.7481 sigma=0.2986 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0010790723670313863, 'timesteps': 38707} [2026-04-14 01:26:40] UCB=3.2998 mu=2.1993 sigma=0.5503 params={'n_steer': 4, 'n_throttle': 3, 
'learning_rate': 0.0013192003620922743, 'timesteps': 27318} [2026-04-14 01:26:40] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:26:42] [AutoResearch] Launching trial 12: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:51:56] [AutoResearch] Trial 12 finished in 1514.2s, returncode=0 [2026-04-14 01:51:56] [AutoResearch] Trial 12: mean_reward=1382.4461 std_reward=8.1109 [2026-04-14 01:51:56] [AutoResearch] === Trial 12 Summary === [2026-04-14 01:51:56] Total Phase 1 runs: 13 [2026-04-14 01:51:56] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 01:51:56] Top 5: [2026-04-14 01:51:56] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:51:56] mean_reward=1382.4461 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:51:56] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:51:56] mean_reward=254.5237 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:51:56] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': 
True} [2026-04-14 01:51:58] [AutoResearch] ========== Trial 13/20 ========== [2026-04-14 01:51:58] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 01:51:58] UCB=5.1239 mu=3.3605 sigma=0.8817 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0016222563549258791, 'timesteps': 22612} [2026-04-14 01:51:58] UCB=4.8639 mu=3.0971 sigma=0.8834 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0015280548232000533, 'timesteps': 21561} [2026-04-14 01:51:58] UCB=4.7060 mu=3.0790 sigma=0.8135 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0013615010429186987, 'timesteps': 21438} [2026-04-14 01:51:58] UCB=4.3756 mu=2.6326 sigma=0.8715 params={'n_steer': 5, 'n_throttle': 4, 'learning_rate': 0.0015217125608335401, 'timesteps': 25040} [2026-04-14 01:51:58] UCB=4.2939 mu=3.0496 sigma=0.6221 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.001654421510937643, 'timesteps': 25309} [2026-04-14 01:51:58] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0016222563549258791, 'timesteps': 22612, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 01:52:00] [AutoResearch] Launching trial 13: {'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0016222563549258791, 'timesteps': 22612, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:11:39] [AutoResearch] Trial 13 finished in 1178.6s, returncode=0 [2026-04-14 02:11:39] [AutoResearch] Trial 13: mean_reward=554.1497 std_reward=0.6798 [2026-04-14 02:11:39] [AutoResearch] === Trial 13 Summary === [2026-04-14 02:11:39] Total Phase 1 runs: 14 [2026-04-14 02:11:39] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 02:11:39] Top 5: [2026-04-14 02:11:39] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 
'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:11:39] mean_reward=1382.4461 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:11:39] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:11:39] mean_reward=554.1497 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0016222563549258791, 'timesteps': 22612, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:11:39] mean_reward=254.5237 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0010511079430656864, 'timesteps': 43721, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:11:41] [AutoResearch] ========== Trial 14/20 ========== [2026-04-14 02:11:41] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 02:11:41] UCB=3.8565 mu=3.3618 sigma=0.2474 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363} [2026-04-14 02:11:41] UCB=3.8260 mu=2.9123 sigma=0.4569 params={'n_steer': 5, 'n_throttle': 4, 'learning_rate': 0.0013832798966787621, 'timesteps': 31597} [2026-04-14 02:11:41] UCB=3.7375 mu=2.3587 sigma=0.6894 params={'n_steer': 5, 'n_throttle': 2, 'learning_rate': 0.0015349955513377042, 'timesteps': 30143} [2026-04-14 02:11:41] UCB=3.5605 mu=2.1382 sigma=0.7112 params={'n_steer': 5, 'n_throttle': 2, 'learning_rate': 0.0016075091237935828, 'timesteps': 28899} [2026-04-14 02:11:41] UCB=3.4807 mu=2.0449 sigma=0.7179 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0016759018515103408, 'timesteps': 24723} [2026-04-14 02:11:41] [AutoResearch] Proposed: {'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:11:43] 
[AutoResearch] Launching trial 14: {'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:37:37] [AutoResearch] Trial 14 finished in 1554.4s, returncode=0 [2026-04-14 02:37:37] [AutoResearch] Trial 14: mean_reward=1097.1248 std_reward=7.4952 [2026-04-14 02:37:37] [AutoResearch] === Trial 14 Summary === [2026-04-14 02:37:37] Total Phase 1 runs: 15 [2026-04-14 02:37:37] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 02:37:37] Top 5: [2026-04-14 02:37:37] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:37:37] mean_reward=1382.4461 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:37:37] mean_reward=1097.1248 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:37:37] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:37:37] mean_reward=554.1497 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0016222563549258791, 'timesteps': 22612, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:37:39] [AutoResearch] ========== Trial 15/20 ========== [2026-04-14 02:37:39] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 02:37:39] UCB=3.2403 mu=2.2411 sigma=0.4996 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0012917341170092288, 
'timesteps': 26533} [2026-04-14 02:37:39] UCB=3.1868 mu=2.3620 sigma=0.4124 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0012706749484726841, 'timesteps': 27506} [2026-04-14 02:37:39] UCB=2.9618 mu=1.6583 sigma=0.6517 params={'n_steer': 3, 'n_throttle': 3, 'learning_rate': 0.0013973166077409632, 'timesteps': 22341} [2026-04-14 02:37:39] UCB=2.9498 mu=2.1386 sigma=0.4056 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0013031109426836762, 'timesteps': 32608} [2026-04-14 02:37:39] UCB=2.8763 mu=2.3716 sigma=0.2524 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011925266986504122, 'timesteps': 31551} [2026-04-14 02:37:39] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0012917341170092288, 'timesteps': 26533, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:37:41] [AutoResearch] Launching trial 15: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0012917341170092288, 'timesteps': 26533, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:56:38] [AutoResearch] Trial 15 finished in 1136.8s, returncode=0 [2026-04-14 02:56:38] [AutoResearch] Trial 15: mean_reward=109.7097 std_reward=1.6652 [2026-04-14 02:56:38] [AutoResearch] === Trial 15 Summary === [2026-04-14 02:56:38] Total Phase 1 runs: 16 [2026-04-14 02:56:38] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 02:56:38] Top 5: [2026-04-14 02:56:38] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:56:38] mean_reward=1382.4461 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:56:38] 
mean_reward=1097.1248 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:56:38] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:56:38] mean_reward=554.1497 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0016222563549258791, 'timesteps': 22612, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:56:39] [AutoResearch] Git push complete after trial 15 [2026-04-14 02:56:41] [AutoResearch] ========== Trial 16/20 ========== [2026-04-14 02:56:41] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 02:56:41] UCB=3.9383 mu=2.6946 sigma=0.6219 params={'n_steer': 5, 'n_throttle': 2, 'learning_rate': 0.0011227360194223832, 'timesteps': 37093} [2026-04-14 02:56:41] UCB=3.2192 mu=2.0021 sigma=0.6086 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0007358066017054203, 'timesteps': 24297} [2026-04-14 02:56:41] UCB=3.0999 mu=1.9951 sigma=0.5524 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0009505146272118057, 'timesteps': 39697} [2026-04-14 02:56:41] UCB=3.0210 mu=1.1783 sigma=0.9214 params={'n_steer': 4, 'n_throttle': 5, 'learning_rate': 0.00027137746549538573, 'timesteps': 16162} [2026-04-14 02:56:41] UCB=2.8375 mu=1.5535 sigma=0.6420 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0008195673228116497, 'timesteps': 41274} [2026-04-14 02:56:41] [AutoResearch] Proposed: {'n_steer': 5, 'n_throttle': 2, 'learning_rate': 0.0011227360194223832, 'timesteps': 37093, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 02:56:44] [AutoResearch] Launching trial 16: {'n_steer': 5, 'n_throttle': 2, 'learning_rate': 0.0011227360194223832, 'timesteps': 37093, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:27:11] 
[AutoResearch] Trial 16 finished in 1827.7s, returncode=0 [2026-04-14 03:27:11] [AutoResearch] Trial 16: mean_reward=39.12 std_reward=0.7297 [2026-04-14 03:27:11] [AutoResearch] === Trial 16 Summary === [2026-04-14 03:27:11] Total Phase 1 runs: 17 [2026-04-14 03:27:11] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 03:27:11] Top 5: [2026-04-14 03:27:11] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:27:11] mean_reward=1382.4461 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:27:11] mean_reward=1097.1248 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:27:11] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:27:11] mean_reward=554.1497 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0016222563549258791, 'timesteps': 22612, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:27:13] [AutoResearch] ========== Trial 17/20 ========== [2026-04-14 03:27:13] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 03:27:13] UCB=3.5336 mu=2.8374 sigma=0.3481 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0009690818044583388, 'timesteps': 36863} [2026-04-14 03:27:13] UCB=3.3778 mu=2.4691 sigma=0.4544 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0011060386861867358, 'timesteps': 38344} [2026-04-14 03:27:13] UCB=3.1627 mu=2.3423 
sigma=0.4102 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0009600413926159611, 'timesteps': 41941} [2026-04-14 03:27:13] UCB=3.1314 mu=1.5375 sigma=0.7970 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.00021015763680829732, 'timesteps': 29106} [2026-04-14 03:27:13] UCB=3.0196 mu=1.4338 sigma=0.7929 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.00022824908978925457, 'timesteps': 30026} [2026-04-14 03:27:13] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0009690818044583388, 'timesteps': 36863, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:27:15] [AutoResearch] Launching trial 17: {'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0009690818044583388, 'timesteps': 36863, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:59:44] [AutoResearch] Trial 17 finished in 1949.2s, returncode=0 [2026-04-14 03:59:44] [AutoResearch] Trial 17: mean_reward=176.0936 std_reward=10.7529 [2026-04-14 03:59:44] [AutoResearch] === Trial 17 Summary === [2026-04-14 03:59:44] Total Phase 1 runs: 18 [2026-04-14 03:59:44] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 03:59:44] Top 5: [2026-04-14 03:59:44] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:59:44] mean_reward=1382.4461 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:59:44] mean_reward=1097.1248 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:59:44] 
mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:59:44] mean_reward=554.1497 params={'n_steer': 4, 'n_throttle': 4, 'learning_rate': 0.0016222563549258791, 'timesteps': 22612, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:59:46] [AutoResearch] ========== Trial 18/20 ========== [2026-04-14 03:59:46] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 03:59:46] UCB=3.2224 mu=1.4545 sigma=0.8839 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0002881292103575585, 'timesteps': 15876} [2026-04-14 03:59:46] UCB=3.2135 mu=2.2932 sigma=0.4602 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.0010258036258562022, 'timesteps': 40185} [2026-04-14 03:59:46] UCB=2.8605 mu=2.4589 sigma=0.2008 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009986646332647185, 'timesteps': 40641} [2026-04-14 03:59:46] UCB=2.8267 mu=0.9938 sigma=0.9164 params={'n_steer': 4, 'n_throttle': 5, 'learning_rate': 0.0007054754747880616, 'timesteps': 12184} [2026-04-14 03:59:46] UCB=2.8001 mu=1.2702 sigma=0.7650 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0016048842351304125, 'timesteps': 16378} [2026-04-14 03:59:46] [AutoResearch] Proposed: {'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0002881292103575585, 'timesteps': 15876, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 03:59:48] [AutoResearch] Launching trial 18: {'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0002881292103575585, 'timesteps': 15876, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:12:20] [AutoResearch] Trial 18 finished in 751.7s, returncode=0 [2026-04-14 04:12:20] [AutoResearch] Trial 18: mean_reward=2073.7372 std_reward=1.3899 [2026-04-14 04:12:20] [AutoResearch] === Trial 18 Summary === [2026-04-14 04:12:20] Total Phase 1 runs: 19 [2026-04-14 04:12:20] 
Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 04:12:20] Top 5: [2026-04-14 04:12:20] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:12:20] mean_reward=2073.7372 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0002881292103575585, 'timesteps': 15876, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:12:20] mean_reward=1382.4461 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:12:20] mean_reward=1097.1248 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:12:20] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:12:22] [AutoResearch] ========== Trial 19/20 ========== [2026-04-14 04:12:22] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 04:12:22] UCB=3.1791 mu=1.9916 sigma=0.5937 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0007311720966729557, 'timesteps': 16351} [2026-04-14 04:12:22] UCB=2.8897 mu=2.3059 sigma=0.2919 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001142626249677311, 'timesteps': 39501} [2026-04-14 04:12:22] UCB=2.8240 mu=1.3340 sigma=0.7450 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 9.635993722889297e-05, 'timesteps': 26883} [2026-04-14 04:12:22] UCB=2.8024 mu=1.1331 sigma=0.8346 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0011656394037404603, 'timesteps': 14676} 
[2026-04-14 04:12:22] UCB=2.7897 mu=1.9228 sigma=0.4335 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 5.105016526526456e-05, 'timesteps': 14097} [2026-04-14 04:12:22] [AutoResearch] Proposed: {'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0007311720966729557, 'timesteps': 16351, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:12:24] [AutoResearch] Launching trial 19: {'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0007311720966729557, 'timesteps': 16351, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:23:50] [AutoResearch] Trial 19 finished in 685.5s, returncode=0 [2026-04-14 04:23:50] [AutoResearch] Trial 19: mean_reward=261.0141 std_reward=43.9044 [2026-04-14 04:23:50] [AutoResearch] === Trial 19 Summary === [2026-04-14 04:23:50] Total Phase 1 runs: 20 [2026-04-14 04:23:50] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 04:23:50] Top 5: [2026-04-14 04:23:50] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:23:50] mean_reward=2073.7372 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0002881292103575585, 'timesteps': 15876, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:23:50] mean_reward=1382.4461 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:23:50] mean_reward=1097.1248 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:23:50] mean_reward=615.6443 params={'n_steer': 3, 'n_throttle': 4, 'learning_rate': 
0.000840799681375933, 'timesteps': 35596, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:23:52] [AutoResearch] ========== Trial 20/20 ========== [2026-04-14 04:23:52] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 04:23:52] UCB=3.2972 mu=2.4863 sigma=0.4054 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.00022474333387549633, 'timesteps': 13328} [2026-04-14 04:23:52] UCB=2.6260 mu=1.5672 sigma=0.5294 params={'n_steer': 4, 'n_throttle': 5, 'learning_rate': 0.0003413271441769394, 'timesteps': 15873} [2026-04-14 04:23:52] UCB=2.3718 mu=0.6154 sigma=0.8782 params={'n_steer': 4, 'n_throttle': 5, 'learning_rate': 5.171688067013589e-05, 'timesteps': 28148} [2026-04-14 04:23:52] UCB=2.2392 mu=0.4877 sigma=0.8758 params={'n_steer': 5, 'n_throttle': 4, 'learning_rate': 0.001889491481388905, 'timesteps': 10354} [2026-04-14 04:23:52] UCB=2.2106 mu=0.3061 sigma=0.9522 params={'n_steer': 6, 'n_throttle': 5, 'learning_rate': 0.0018766239559755721, 'timesteps': 15866} [2026-04-14 04:23:52] [AutoResearch] Proposed: {'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.00022474333387549633, 'timesteps': 13328, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:23:54] [AutoResearch] Launching trial 20: {'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.00022474333387549633, 'timesteps': 13328, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:45] [AutoResearch] Trial 20 finished in 711.5s, returncode=0 [2026-04-14 04:35:45] [AutoResearch] Trial 20: mean_reward=2469.2835 std_reward=1.1918 [2026-04-14 04:35:45] [AutoResearch] === Trial 20 Summary === [2026-04-14 04:35:45] Total Phase 1 runs: 21 [2026-04-14 04:35:45] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 04:35:45] Top 5: [2026-04-14 04:35:45] mean_reward=2469.2835 
params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.00022474333387549633, 'timesteps': 13328, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:45] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:45] mean_reward=2073.7372 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0002881292103575585, 'timesteps': 15876, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:45] mean_reward=1382.4461 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:45] mean_reward=1097.1248 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:47] [AutoResearch] Git push complete after trial 20 [2026-04-14 04:35:49] [AutoResearch] All trials complete! 
[2026-04-14 04:35:49] [AutoResearch] === Trial 20 Summary === [2026-04-14 04:35:49] Total Phase 1 runs: 21 [2026-04-14 04:35:49] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-14 04:35:49] Top 5: [2026-04-14 04:35:49] mean_reward=2469.2835 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.00022474333387549633, 'timesteps': 13328, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:49] mean_reward=2296.1891 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0011680072988353367, 'timesteps': 34177, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:49] mean_reward=2073.7372 params={'n_steer': 3, 'n_throttle': 5, 'learning_rate': 0.0002881292103575585, 'timesteps': 15876, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:49] mean_reward=1382.4461 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0010723485700433605, 'timesteps': 33234, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:49] mean_reward=1097.1248 params={'n_steer': 5, 'n_throttle': 3, 'learning_rate': 0.001421177467065464, 'timesteps': 33363, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-14 04:35:50] [AutoResearch] Git push complete after trial 20 [2026-04-14 09:28:23] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 09:28:23] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-14 09:28:23] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-14 09:28:23] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-14 09:28:23] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199}
[2026-04-14 09:28:23] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-14 09:28:23] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-14 09:28:23] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-14 09:28:23] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-14 09:28:23] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-14 09:28:23] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-14 09:28:23] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-14 09:28:23] [AutoResearch] Only 1 results — using random proposal. [2026-04-14 12:45:34] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 12:45:34] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-14 12:45:34] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-14 12:45:34] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-14 12:45:34] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-14 12:45:34] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-14 12:45:34] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-14 12:45:34] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-14 12:45:34] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50}
[2026-04-14 12:45:34] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-14 12:45:34] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-14 12:45:34] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-14 12:45:34] [AutoResearch] Only 1 results — using random proposal. [2026-04-14 13:28:43] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 13:28:43] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-14 13:28:43] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-14 13:28:43] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-14 13:28:43] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-14 13:28:43] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-14 13:28:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-14 13:28:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-14 13:28:43] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-14 13:28:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-14 13:28:43] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-14 13:28:43] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-14 13:28:43] [AutoResearch] Only 1 results — using random proposal.
[2026-04-14 13:29:04] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 13:29:04] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-14 13:29:04] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-14 13:29:04] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-14 13:29:04] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-14 13:29:04] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-14 13:29:04] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-14 13:29:04] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-14 13:29:04] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-14 13:29:04] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-14 13:29:04] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-14 13:29:04] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-14 13:29:04] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-14 13:29:30] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 13:29:30] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-14 13:29:30] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-14 13:29:30] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-14 13:29:30] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-14 13:29:30] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-14 13:29:30] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-14 13:29:30] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-14 13:29:30] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-14 13:29:30] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-14 13:29:30] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-14 13:29:30] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-14 13:29:30] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-14 13:47:13] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 13:47:13] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-14 13:47:13] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-14 13:47:13] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-14 13:47:13] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-14 13:47:13] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-14 13:47:13] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-14 13:47:13] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-14 13:47:13] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-14 13:47:13] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-14 13:47:13] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-14 13:47:13] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-14 13:47:13] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-14 20:37:35] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 20:37:35] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-14 20:37:35] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-14 20:37:35] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-14 20:37:35] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-14 20:37:35] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-14 20:37:35] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-14 20:37:35] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-14 20:37:35] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-14 20:37:35] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-14 20:37:35] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-14 20:37:35] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-14 20:37:35] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-14 21:27:08] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 21:27:08] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-14 21:27:08] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-14 21:27:08] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-14 21:27:08] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-14 21:27:08] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-14 21:27:08] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-14 21:27:08] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-14 21:27:08] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-14 21:27:08] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-14 21:27:08] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-14 21:27:08] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-14 21:27:08] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-14 22:40:11] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 22:40:11] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-14 22:40:11] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-14 22:40:11] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-14 22:40:11] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-14 22:40:11] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-14 22:40:11] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-14 22:40:11] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-14 22:40:11] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-14 22:40:11] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-14 22:40:11] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-14 22:40:11] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-14 22:40:11] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-14 22:43:59] [AutoResearch] GP UCB top-5 candidates: [2026-04-14 22:43:59] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-14 22:43:59] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-14 22:43:59] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-14 22:43:59] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-14 22:43:59] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-14 22:43:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-14 22:43:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-14 22:43:59] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-14 22:43:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-14 22:43:59] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-14 22:43:59] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-14 22:43:59] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-15 09:03:29] [AutoResearch] GP UCB top-5 candidates: [2026-04-15 09:03:29] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-15 09:03:29] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-15 09:03:29] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-15 09:03:29] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-15 09:03:29] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-15 09:03:29] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-15 09:03:29] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-15 09:03:29] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-15 09:03:29] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-15 09:03:29] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-15 09:03:29] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-15 09:03:29] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-15 09:04:15] [AutoResearch] GP UCB top-5 candidates: [2026-04-15 09:04:15] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-15 09:04:15] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-15 09:04:15] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-15 09:04:15] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-15 09:04:15] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-15 09:04:15] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-15 09:04:15] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-15 09:04:15] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-15 09:04:15] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-15 09:04:15] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-15 09:04:15] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-15 09:04:15] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-15 09:05:43] [AutoResearch] GP UCB top-5 candidates: [2026-04-15 09:05:43] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-15 09:05:43] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-15 09:05:43] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-15 09:05:43] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-15 09:05:43] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-15 09:05:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-15 09:05:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-15 09:05:43] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-15 09:05:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-15 09:05:43] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-15 09:05:43] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-15 09:05:43] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-15 09:14:59] [AutoResearch] GP UCB top-5 candidates: [2026-04-15 09:14:59] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-15 09:14:59] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-15 09:14:59] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-15 09:14:59] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-15 09:14:59] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-15 09:14:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-15 09:14:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-15 09:14:59] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-15 09:14:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-15 09:14:59] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-15 09:14:59] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-15 09:14:59] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-15 09:16:53] [AutoResearch] GP UCB top-5 candidates: [2026-04-15 09:16:53] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-15 09:16:53] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-15 09:16:53] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-15 09:16:53] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-15 09:16:53] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-15 09:16:53] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-15 09:16:53] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-15 09:16:53] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-15 09:16:53] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-15 09:16:53] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-15 09:16:53] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-15 09:16:53] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-15 21:54:16] [AutoResearch] GP UCB top-5 candidates: [2026-04-15 21:54:16] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-15 21:54:16] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-15 21:54:16] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-15 21:54:16] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-15 21:54:16] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-15 21:54:16] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-15 21:54:16] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-15 21:54:16] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-15 21:54:16] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-15 21:54:16] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-15 21:54:16] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-15 21:54:16] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-15 22:26:26] [AutoResearch] GP UCB top-5 candidates: [2026-04-15 22:26:26] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-15 22:26:26] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-15 22:26:26] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-15 22:26:26] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-15 22:26:26] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-15 22:26:26] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-15 22:26:26] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-15 22:26:26] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-15 22:26:26] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-15 22:26:26] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-15 22:26:26] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-15 22:26:26] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-15 22:47:03] [AutoResearch] GP UCB top-5 candidates: [2026-04-15 22:47:03] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-15 22:47:03] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-15 22:47:03] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-15 22:47:03] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-15 22:47:03] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-15 22:47:03] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-15 22:47:03] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-15 22:47:03] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-15 22:47:03] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-15 22:47:03] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-15 22:47:03] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-15 22:47:03] [AutoResearch] Only 1 results — using random proposal. 
[2026-04-16 17:28:47] [AutoResearch] GP UCB top-5 candidates: [2026-04-16 17:28:47] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-16 17:28:47] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-16 17:28:47] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-16 17:28:47] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-16 17:28:47] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-16 17:28:47] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-16 17:28:47] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-16 17:28:47] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-16 17:28:47] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-16 17:28:47] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-16 17:28:47] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-16 17:28:47] [AutoResearch] Only 1 results — using random proposal.