Commit Graph

2 Commits

Author SHA1 Message Date
Paul Huliganga 138c65270f feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments
- reward_wrapper: detect barrier/wall/tree solid hits, terminate on head-on impact
  or 4 sustained solid-hit frames; prevents car wedging against invisible barriers
- reward_wrapper: add low-speed/wedge termination — kills episode when car is pinned
  motionless (below threshold, no displacement) after grace period
- reward_wrapper: high-CTE exploit fix — return -0.25 immediately when CTE >
  max_cte_terminate (not after patience), so PPO cannot collect positive speed
  rewards while driving the large outside-road circle
- tests: 23 passing unit tests covering all new termination paths
- exp20/21/22: add parallel DummyVecEnv experiments on generated_road+generated_track
  with warm-start from champion model; exp22 is current active run
- SESSION_HANDOFF.md: live handoff doc for next session continuity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 14:46:13 -04:00
Paul Huliganga c804189dd0 feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing
- Rebuilt donkeycar_sb3_runner.py: real PPO/DQN model.learn() + evaluate_policy() + model.save()
- Added SpeedRewardWrapper: reward = speed * (1 - |cte|/max_cte)
- Added ChampionTracker: tracks best model across all trials, writes manifest.json
- Rebuilt autoresearch_controller.py: Phase 1 results separated from random-policy data
- Added timesteps to GP search space
- Added --push-every N for automatic git push
- Added 37 passing tests: discretize_action, reward_wrapper, autoresearch_controller, runner_integration
- Scaffolded project with agent harness (large mode): PROJECT-SPEC, DECISIONS, IMPLEMENTATION_PLAN, EXECUTION_MASTER
- Fixed: model.save() never called before model is defined (was root cause of all prior NameError crashes)
- Fixed: random policy replaced with real trained policy evaluation

Agent: pi/claude-sonnet
Tests: 37/37 passing
Tests-Added: +37
TypeScript: N/A
2026-04-13 10:03:15 -04:00