d25bc71008 autoresearch: phase1 trial 10 results
Paul Huliganga
2026-04-13 13:11:06 -0400
5e93dae316 fix: hack-proof reward shaping + reward hacking detection + research log
Paul Huliganga
2026-04-13 12:27:48 -0400
0c6263352b autoresearch: phase1 trial 10 results
Paul Huliganga
2026-04-13 12:01:17 -0400
8c9fd76c68 fix: reduce timesteps to 1k-5k for Phase 1 CPU training; add sim health/stuck detection; fix PPO throttle clamp
Paul Huliganga
2026-04-13 11:17:08 -0400
c804189dd0 feat: Wave 1 complete - real PPO training, model save, GP+UCB autoresearch, 37 tests passing
Paul Huliganga
2026-04-13 10:03:15 -0400
083326a497 AUTORESEARCH: 300 total trials complete - best mean_reward=141.85 at n_steer=8, n_throttle=5, lr=0.00202
Paul Huliganga
2026-04-13 01:56:06 -0400
3446e5f7c1 AUTORESEARCH: 100 trials complete - best mean_reward=114.56 at n_steer=8, n_throttle=4, lr=0.00208
Paul Huliganga
2026-04-13 01:13:20 -0400
bb9e6d9105 AUTORESEARCH: Full Karpathy-style GP+UCB meta-controller, clean base data, fixed all paths, ready to run
Paul Huliganga
2026-04-13 00:52:00 -0400
4a4e61d463 CLEAN: Robust multi-episode RL runner, no legacy save/model logic; outer loop points to project dir runner.
Paul Huliganga
2026-04-13 00:28:45 -0400
c98bc7ef38 Initial commit
Paul Huliganga
2026-04-12 23:44:36 -0400
2cadd1a78a Initial commit: stable RL sweep runner, legacy and new scripts, full docs included
Paul Huliganga
2026-04-12 22:57:50 -0400