donkeycar-rl-autoresearch

Commit Graph

Author	SHA1	Message	Date
Paul Huliganga	0c6263352b	autoresearch: phase1 trial 10 results Agent: pi Tests: N/A Tests-Added: 0 TypeScript: N/A	2026-04-13 12:01:17 -04:00
Paul Huliganga	8c9fd76c68	fix: reduce timesteps to 1k-5k for Phase 1 CPU training; add sim health/stuck detection; fix PPO throttle clamp Problems fixed: - Timesteps 5k-30k caused all trials to timeout (PPO+CNN+CPU needs ~0.1s/step) - New range: 1000-5000 steps fits well within 480s timeout - PPO random init policy outputs throttle~0 -> car sits still -> fix with ThrottleClampWrapper (min 0.2) - Sim stuck detection: if speed<0.02 for 100 consecutive steps, stop training and report error - Sim frozen detection: if observation unchanged for 30 steps, stop training (connection lost) - eval_episodes reduced to 3 to speed up evaluation phase Agent: pi/claude-sonnet Tests: 37/37 passing Tests-Added: 0 (behaviour change only) TypeScript: N/A	2026-04-13 11:17:08 -04:00
Paul Huliganga	c804189dd0	feat: Wave 1 complete: real PPO training, model save, GP+UCB autoresearch, 37 tests passing - Rebuilt donkeycar_sb3_runner.py: real PPO/DQN model.learn() + evaluate_policy() + model.save() - Added SpeedRewardWrapper: reward = speed * (1 - |cte|/max_cte) - Added ChampionTracker: tracks best model across all trials, writes manifest.json - Rebuilt autoresearch_controller.py: Phase 1 results separated from random-policy data - Added timesteps to GP search space - Added --push-every N for automatic git push - Added 37 passing tests: discretize_action, reward_wrapper, autoresearch_controller, runner_integration - Scaffolded project with agent harness (large mode): PROJECT-SPEC, DECISIONS, IMPLEMENTATION_PLAN, EXECUTION_MASTER - Fixed: model.save() never called before model is defined (was root cause of all prior NameError crashes) - Fixed: random policy replaced with real trained policy evaluation Agent: pi/claude-sonnet Tests: 37/37 passing Tests-Added: +37 TypeScript: N/A	2026-04-13 10:03:15 -04:00
Paul Huliganga	083326a497	AUTORESEARCH: 300 total trials complete - best mean_reward=141.85 at n_steer=8, n_throttle=5, lr=0.00202	2026-04-13 01:56:06 -04:00
Paul Huliganga	3446e5f7c1	AUTORESEARCH: 100 trials complete - best mean_reward=114.56 at n_steer=8, n_throttle=4, lr=0.00208	2026-04-13 01:13:20 -04:00
Paul Huliganga	bb9e6d9105	AUTORESEARCH: Full Karpathy-style GP+UCB meta-controller, clean base data, fixed all paths, ready to run	2026-04-13 00:52:00 -04:00
Paul Huliganga	2cadd1a78a	Initial commit: stable RL sweep runner, legacy and new scripts, full docs included	2026-04-12 22:57:50 -04:00
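
The reward shaping, throttle clamp, and stuck detection named in commits c804189dd0 and 8c9fd76c68 can be sketched as plain Python functions. This is a minimal illustration built only from the numbers in the commit messages (reward = speed * (1 - |cte|/max_cte), minimum throttle 0.2, speed < 0.02 for 100 consecutive steps). The names `speed_reward`, `clamp_throttle`, and `StuckDetector` are hypothetical and are not the repository's `SpeedRewardWrapper` or `ThrottleClampWrapper` classes. Whether the real wrappers also clip the reward or cap the throttle from above is not stated, so neither is done here.

```python
from dataclasses import dataclass


def speed_reward(speed: float, cte: float, max_cte: float) -> float:
    """Reward from the commit message: speed * (1 - |cte| / max_cte).

    cte is the cross-track error; max_cte is the value at which the
    reward reaches zero. No clipping is applied (assumption).
    """
    return speed * (1.0 - abs(cte) / max_cte)


def clamp_throttle(throttle: float, min_throttle: float = 0.2) -> float:
    """Raise throttle to at least min_throttle so a freshly
    initialised policy that outputs throttle near 0 still moves the car."""
    return max(throttle, min_throttle)


@dataclass
class StuckDetector:
    """Flags the sim as stuck after max_slow_steps consecutive slow steps."""

    speed_threshold: float = 0.02   # from the commit message
    max_slow_steps: int = 100       # from the commit message
    slow_steps: int = 0             # running count of consecutive slow steps

    def update(self, speed: float) -> bool:
        """Record one step; return True once the car counts as stuck."""
        if speed < self.speed_threshold:
            self.slow_steps += 1
        else:
            self.slow_steps = 0     # any fast step resets the counter
        return self.slow_steps >= self.max_slow_steps
```

With a hypothetical `max_cte` of 8.0, a car at speed 2.0 earns the full 2.0 on the centre line and 1.0 when it is 4.0 units off it. In a training loop, `StuckDetector.update` would be called once per environment step and training stopped when it returns `True`.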

57 Commits