Commit Graph

8 Commits

Author SHA1 Message Date
Paul Huliganga 4ca5304a71 wave3: add multi-track autoresearch system (83 tests passing)
New files:
- agent/multitrack_runner.py: trains PPO round-robin across generated_road,
  generated_track, mountain_track; zero-shot evaluates on mini_monaco + warren
- agent/wave3_controller.py: GP+UCB outer loop optimising combined test score
- tests/test_wave3.py: 30 new tests (83 total)

Track classification (from visual analysis of all 10 screenshots):
  Training  : generated_road, generated_track, mountain_track
  Test (ZSL): mini_monaco, warren (pseudo-outdoor — proper road markings)
  Skip      : warehouse, robo_racing_league, waveshare, circuit_launch (indoor floor)
              avc_sparkfun (orange markings — different visual domain)

Key design decisions:
  ADR-010: Warren = pseudo-outdoor track (proper road lines, not floor marks)
  ADR-011: Test tracks NEVER used in training; GP optimises test score only
  ADR-012: All trials warm-start from Phase 2 champion model
  Switching: env.close() + send_exit_scene_raw() + 4s wait + gym.make()

Pre-Wave-3 baseline: 1/10 tracks drivable (0/2 held-out test tracks)
Wave 3 goal: 2/2 test tracks drivable (mini_monaco + warren)

Agent: pi
Tests: 83 passed
Tests-Added: 30
TypeScript: N/A
2026-04-14 12:47:12 -04:00
Paul Huliganga e68d618d29 feat: Phase 3 — behavioral control, enhanced evaluator, 53 tests
PHASE 2 MILESTONE DOCUMENTED:
  All 3 top models complete the full track with distinct driving styles:
  - Trial 20 (n_steer=3): Right lane, stable steering — CHAMPION 
  - Trial 8  (n_steer=4): Left/center lane, oscillating (still completes!)
  - Trial 18 (n_steer=3): Right shoulder, very accurate line following
  Key finding: fewer steering bins (n_steer=3) = better driving (counterintuitive)
  CTE symmetry explains left/right preference: random NN init determines which side

BEHAVIORAL REWARD WRAPPERS (agent/behavioral_wrappers.py):
  - LanePositionWrapper: target a specific CTE offset (control left/right preference)
  - AntiOscillationWrapper: penalise rapid steering changes (fix Model 2 oscillation)
  - AsymmetricCTEWrapper: enforce right-lane rule (penalise left-of-centre more)
  - CombinedBehavioralWrapper: all three combined in one wrapper

ENHANCED EVALUATOR (agent/evaluate_champion.py):
  - Full metrics: reward, lap time, oscillation score, CTE distribution, lane position
  - --compare flag: runs all top Phase 2 models side by side with comparison table
  - Saves eval summary to outerloop-results/eval_summary.jsonl
  - Detects lap completion events from sim info dict

IMPLEMENTATION PLAN updated: Wave 3 streams defined
RESEARCH LOG updated: Phase 2 milestone, behavioral analysis, next steps
Champion updated to Trial 20 (Phase 2)

Agent: pi/claude-sonnet
Tests: 53/53 passing (+13 behavioral wrapper tests)
Tests-Added: +13
TypeScript: N/A
2026-04-14 09:28:43 -04:00
Paul Huliganga cfd1f843a4 autoresearch: phase1 trial 20 results
Agent: pi
Tests: N/A
Tests-Added: 0
TypeScript: N/A
2026-04-14 04:35:49 -04:00
Paul Huliganga 5114a95a74 autoresearch: phase1 trial 20 results
Agent: pi
Tests: N/A
Tests-Added: 0
TypeScript: N/A
2026-04-14 04:35:45 -04:00
Paul Huliganga 52b8a4a10e autoresearch: phase1 trial 15 results
Agent: pi
Tests: N/A
Tests-Added: 0
TypeScript: N/A
2026-04-14 02:56:38 -04:00
Paul Huliganga 6c8c5b25a9 autoresearch: phase1 trial 10 results
Agent: pi
Tests: N/A
Tests-Added: 0
TypeScript: N/A
2026-04-14 00:56:14 -04:00
Paul Huliganga 2d6fe2c962 autoresearch: phase1 trial 5 results
Agent: pi
Tests: N/A
Tests-Added: 0
TypeScript: N/A
2026-04-13 22:46:54 -04:00
Paul Huliganga c8a495dd22 fix: reward v4 — full sim bypass kills circular driving at root
ROOT CAUSE:
  donkey_sim.py calc_reward() uses forward_vel = dot(heading, velocity).
  A spinning car ALWAYS has forward_vel > 0 (always moving 'forward' relative
  to its own heading), so it earned positive reward indefinitely while circling.

v3 WAS INSUFFICIENT:
  v3 applied efficiency only to the speed BONUS: original × (1 + speed×eff×scale)
  But 'original' from sim was still exploitable: CTE≈0 while spinning → original=1.0/step
  Efficiency killed the speed bonus but not the base reward.
  47k-step run: spinning = 1.0/step × 47k = 47k reward (never crashes in circle)

v4 FIX — base × efficiency × speed:
  reward = (1 - abs(cte)/max_cte) × efficiency × (1 + speed_scale × speed)
  Completely ignores sim's bogus forward_vel reward.
  Spinning (eff≈0): reward ≈ 0 regardless of CTE or speed.
  ALL three terms must be high to earn reward — cannot be gamed.

Key new test: test_circling_at_zero_cte_gives_near_zero_reward
  Worst-case exploit (CTE=0 spinning) → avg reward < 0.15 (was 1.0 in v3)
  forward_beats_circling_by_3x confirmed.

Also: update Phase 2 autoresearch timesteps test, research log updated.

Agent: pi/claude-sonnet
Tests: 40/40 passing
Tests-Added: +1 (core v4 circling guarantee)
TypeScript: N/A
2026-04-13 20:56:32 -04:00