[2026-04-13 19:33:13] ============================================================ [2026-04-13 19:33:13] [AutoResearch] Phase 1 — Real PPO Training + GP+UCB Optimization [2026-04-13 19:33:13] [AutoResearch] Max trials: 20 | kappa: 2.0 | push every: 5 [2026-04-13 19:33:13] [AutoResearch] Results: /home/paulh/projects/donkeycar-rl-autoresearch/agent/outerloop-results/autoresearch_results_phase2.jsonl [2026-04-13 19:33:13] [AutoResearch] Champion: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion [2026-04-13 19:33:13] ============================================================ [2026-04-13 19:33:13] [AutoResearch] Loaded 0 existing Phase 1 results. [2026-04-13 19:33:13] [AutoResearch] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-13 19:33:13] [AutoResearch] ========== Trial 1/20 ========== [2026-04-13 19:33:13] [AutoResearch] Only 0 results — using random proposal. [2026-04-13 19:33:13] [AutoResearch] Proposed: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-13 19:33:15] [AutoResearch] Launching trial 1: {'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-13 20:05:03] [AutoResearch] Trial 1 finished in 1908.3s, returncode=0 [2026-04-13 20:05:03] [AutoResearch] Trial 1: mean_reward=234.5386 std_reward=3.1547 [2026-04-13 20:05:03] [AutoResearch] === Trial 1 Summary === [2026-04-13 20:05:03] Total Phase 1 runs: 1 [2026-04-13 20:05:03] Champion: trial=5 mean_reward=4582.7984 params={'n_steer': 7, 'n_throttle': 3, 'learning_rate': 0.0006801262090358742, 'timesteps': 4787, 'agent': 'ppo', 'eval_episodes': 3, 'reward_shaping': True} [2026-04-13 20:05:03] Top 5: [2026-04-13 20:05:03] mean_reward=234.5386 params={'n_steer': 4, 'n_throttle': 3, 'learning_rate': 0.0009737963906394612, 'timesteps': 47325, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-13 20:05:05] [AutoResearch] ========== Trial 2/20 ========== [2026-04-13 20:05:05] [AutoResearch] Only 1 results — using random proposal. [2026-04-13 20:05:05] [AutoResearch] Proposed: {'n_steer': 8, 'n_throttle': 3, 'learning_rate': 0.0012285179829782996, 'timesteps': 39101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-13 20:05:07] [AutoResearch] Launching trial 2: {'n_steer': 8, 'n_throttle': 3, 'learning_rate': 0.0012285179829782996, 'timesteps': 39101, 'agent': 'ppo', 'eval_episodes': 5, 'reward_shaping': True} [2026-04-13 20:55:43] [AutoResearch] GP UCB top-5 candidates: [2026-04-13 20:55:43] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-13 20:55:43] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-13 20:55:43] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-13 20:55:43] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-13 20:55:43] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-13 20:55:43] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-13 20:55:43] [AutoResearch] Only 1 results — using random proposal. [2026-04-13 20:55:59] [AutoResearch] GP UCB top-5 candidates: [2026-04-13 20:55:59] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173} [2026-04-13 20:55:59] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198} [2026-04-13 20:55:59] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887} [2026-04-13 20:55:59] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199} [2026-04-13 20:55:59] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035} [2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5} [2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7} [2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50} [2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80} [2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90} [2026-04-13 20:55:59] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8} [2026-04-13 20:55:59] [AutoResearch] Only 1 results — using random proposal.