Commit Graph

4 Commits

Author SHA1 Message Date
Paul Huliganga fc01057c14 docs: ADR-017 — always save best model, never just latest
Documents the root cause of losing the mountain_track model that was
doing 20-second laps at step 30k but crashed at the step-90k final eval.

In Phase 2 (13k steps, simple track) the final model was also the best
model, and that assumption was carried forward incorrectly into Wave 4
(90k steps, where the policy can drift).

Mandatory rule: every training script uses train_multitrack() best_model
tracking OR SB3 EvalCallback. No exceptions.
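
For reference, a minimal sketch of the EvalCallback form of the rule, assuming an
SB3 PPO run; the env id, paths, and intervals below are illustrative, not the
project's actual settings:

  import gym
  from stable_baselines3 import PPO
  from stable_baselines3.common.callbacks import EvalCallback

  # Assumed gym-donkeycar env id; substitute whichever track the script trains on.
  train_env = gym.make("donkey-mountain-track-v0")
  eval_env = gym.make("donkey-mountain-track-v0")

  eval_cb = EvalCallback(
      eval_env,
      best_model_save_path="models/mountain_track_best",  # best-so-far model saved on every improvement
      log_path="models/mountain_track_eval",
      eval_freq=5_000,        # periodic evals mean a late regression cannot erase the best policy
      n_eval_episodes=5,
      deterministic=True,
  )

  model = PPO("CnnPolicy", train_env, verbose=1)
  model.learn(total_timesteps=90_000, callback=eval_cb)
  model.save("models/mountain_track_final")  # final weights kept too, but never trusted to be the best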

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-17 16:03:59 -04:00
Paul Huliganga ff8bdd8b8a docs: ADR-013 through ADR-016 — decisions that were lost to context compaction
ADR-013: Wave 4 train-from-scratch rationale (why no warm-start, why
         generated_track+mountain_track, proven by 1943 overnight result)
ADR-014: Measure throughput before long runs (10+ hours lost to timeouts)
ADR-015: Per-segment checkpointing is non-negotiable
ADR-016: Verify fixes are running before walking away
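
A minimal sketch of the per-segment checkpointing pattern behind ADR-015, assuming
an SB3-style model object; the segment length and checkpoint path are illustrative:

  from stable_baselines3.common.base_class import BaseAlgorithm

  def train_in_segments(model: BaseAlgorithm, total_steps: int = 90_000, segment_steps: int = 10_000) -> None:
      """Save a checkpoint after every segment so a crash costs at most one segment."""
      for segment in range(total_steps // segment_steps):
          # reset_num_timesteps=False keeps the global step counter continuous across segments
          model.learn(total_timesteps=segment_steps, reset_num_timesteps=False)
          model.save(f"checkpoints/segment_{segment:02d}")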

These decisions existed in conversation but were never written down,
so they were forgotten after context compaction and had to be
re-learned the hard way multiple times.

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-15 22:34:48 -04:00
Paul Huliganga 4ca5304a71 wave3: add multi-track autoresearch system (83 tests passing)
New files:
- agent/multitrack_runner.py: trains PPO round-robin across generated_road,
  generated_track, mountain_track; zero-shot evaluates on mini_monaco + warren
  (see the sketch after this list)
- agent/wave3_controller.py: GP+UCB outer loop optimising combined test score
- tests/test_wave3.py: 30 new tests (83 total)
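
A minimal sketch of the round-robin / zero-shot split described above, assuming
gym-donkeycar env ids and SB3 PPO; the function names, env ids, and segment sizes
are illustrative, not the actual multitrack_runner.py interface:

  import gym
  from stable_baselines3 import PPO
  from stable_baselines3.common.evaluation import evaluate_policy

  TRAIN_TRACKS = ["donkey-generated-roads-v0", "donkey-generated-track-v0", "donkey-mountain-track-v0"]
  TEST_TRACKS = ["donkey-minimonaco-track-v0", "donkey-warren-track-v0"]

  def train_round_robin(steps_per_segment: int = 5_000, rounds: int = 3) -> PPO:
      """Train a single PPO policy by cycling through the training tracks."""
      model = None
      for _ in range(rounds):
          for track in TRAIN_TRACKS:  # round-robin: one segment per track per round
              env = gym.make(track)
              if model is None:
                  model = PPO("CnnPolicy", env)
              else:
                  model.set_env(env)
              model.learn(total_timesteps=steps_per_segment, reset_num_timesteps=False)
              env.close()
      return model

  def zero_shot_scores(model: PPO) -> dict:
      """Evaluate on held-out tracks the policy has never trained on (ADR-011)."""
      scores = {}
      for track in TEST_TRACKS:
          env = gym.make(track)
          mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=3)
          scores[track] = mean_reward
          env.close()
      return scores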

Track classification (from visual analysis of all 10 screenshots):
  Training  : generated_road, generated_track, mountain_track
  Test (ZSL): mini_monaco, warren (pseudo-outdoor — proper road markings)
  Skip      : warehouse, robo_racing_league, waveshare, circuit_launch (indoor floor)
              avc_sparkfun (orange markings — different visual domain)

Key design decisions:
  ADR-010: Warren = pseudo-outdoor track (proper road lines, not floor marks)
  ADR-011: Test tracks NEVER used in training; GP optimises test score only
  ADR-012: All trials warm-start from Phase 2 champion model
  Switching: env.close() + send_exit_scene_raw() + 4s wait + gym.make()
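
A minimal sketch of that switching sequence; the helper's import location and the
env id are assumptions, while the 4-second wait follows the bullet above:

  import time
  import gym
  from agent.multitrack_runner import send_exit_scene_raw  # assumed home of the project helper

  def switch_track(env, next_track: str = "donkey-generated-track-v0"):
      """Tear down the current track and bring up the next one."""
      env.close()               # close the gym side first
      send_exit_scene_raw()     # tell the simulator to leave the current scene
      time.sleep(4)             # give the simulator time to unload before creating the next env
      return gym.make(next_track)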

Pre-Wave-3 baseline: 1/10 tracks drivable (0/2 held-out test tracks)
Wave 3 goal: 2/2 test tracks drivable (mini_monaco + warren)

Agent: pi
Tests: 83 passed
Tests-Added: 30
TypeScript: N/A
2026-04-14 12:47:12 -04:00
Paul Huliganga c804189dd0 feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing
- Rebuilt donkeycar_sb3_runner.py: real PPO/DQN model.learn() + evaluate_policy() + model.save()
- Added SpeedRewardWrapper: reward = speed * (1 - |cte|/max_cte) (see the sketch after this list)
- Added ChampionTracker: tracks best model across all trials, writes manifest.json
- Rebuilt autoresearch_controller.py: Phase 1 results separated from random-policy data
- Added timesteps to GP search space
- Added --push-every N for automatic git push
- Added 37 passing tests: discretize_action, reward_wrapper, autoresearch_controller, runner_integration
- Scaffolded project with agent harness (large mode): PROJECT-SPEC, DECISIONS, IMPLEMENTATION_PLAN, EXECUTION_MASTER
- Fixed: model.save() is no longer called before the model is defined (this was the root cause of all prior NameError crashes)
- Fixed: random-policy evaluation replaced with evaluation of the real trained policy
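
A minimal sketch of the SpeedRewardWrapper formula above, assuming the env reports
speed and cross-track error in the step() info dict as gym-donkeycar does; the key
names and the max_cte default are assumptions:

  import gym

  class SpeedRewardWrapper(gym.Wrapper):
      def __init__(self, env, max_cte: float = 8.0):
          super().__init__(env)
          self.max_cte = max_cte

      def step(self, action):
          obs, reward, done, info = self.env.step(action)
          speed = info.get("speed", 0.0)
          cte = abs(info.get("cte", 0.0))
          # reward grows with speed but shrinks the further the car drifts from the centre line
          reward = speed * (1.0 - min(cte / self.max_cte, 1.0))
          return obs, reward, done, info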

Agent: pi/claude-sonnet
Tests: 37/37 passing
Tests-Added: +37
TypeScript: N/A
2026-04-13 10:03:15 -04:00