donkeycar-rl-autoresearch/agent/outerloop-results
Paul Huliganga c804189dd0 feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing
- Rebuilt donkeycar_sb3_runner.py: real PPO/DQN model.learn() + evaluate_policy() + model.save()
- Added SpeedRewardWrapper: reward = speed * (1 - |cte|/max_cte)
- Added ChampionTracker: tracks best model across all trials, writes manifest.json
- Rebuilt autoresearch_controller.py: Phase 1 results separated from random-policy data
- Added timesteps to GP search space
- Added --push-every N for automatic git push
- Added 37 passing tests: discretize_action, reward_wrapper, autoresearch_controller, runner_integration
- Scaffolded project with agent harness (large mode): PROJECT-SPEC, DECISIONS, IMPLEMENTATION_PLAN, EXECUTION_MASTER
- Fixed: model.save() never called before model is defined (was root cause of all prior NameError crashes)
- Fixed: random policy replaced with real trained policy evaluation

Agent: pi/claude-sonnet
Tests: 37/37 passing
Tests-Added: +37
TypeScript: N/A
2026-04-13 10:03:15 -04:00
..
model-000 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
model-001 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
model-002 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
model-003 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
autoresearch_log.txt AUTORESEARCH: 300 total trials complete - best mean_reward=141.85 at n_steer=8, n_throttle=5, lr=0.00202 2026-04-13 01:56:06 -04:00
autoresearch_phase1_log.txt feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing 2026-04-13 10:03:15 -04:00
autoresearch_results.jsonl AUTORESEARCH: 300 total trials complete - best mean_reward=141.85 at n_steer=8, n_throttle=5, lr=0.00202 2026-04-13 01:56:06 -04:00
clean_sweep_results.jsonl AUTORESEARCH: Full Karpathy-style GP+UCB meta-controller, clean base data, fixed all paths, ready to run 2026-04-13 00:52:00 -04:00
nohup_outerloop.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
outer_monitor.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
sweep_results.jsonl Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00