donkeycar-rl-autoresearch/agent/outerloop-results
Paul Huliganga 26251c7d0c results: complete multi-track generalization baseline — 1/10 tracks drivable pre-Wave3
RESULTS:
  T20 (champion):  Generated Road only (1/10 tracks)
  T08:             Generated Road only (1/10 tracks)
  T18:             All tracks crash (0/10), even on a new Generated Road layout!

  Robo Racing League: best unseen result (116 steps) — visual similarity to generated_road?
  Thunderhill: not available in this simulator version

KEY FINDING: The models' CNN features are overfit to the visuals of generated_road.
On every unseen track the car crashes within 40-116 steps (vs 2200+ on the training track).
This is the expected Phase 2→3 transition point.
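
A minimal sketch of how a "tracks drivable" count like the one above could be derived from per-track episode lengths. The function name, the min_steps threshold, and the example numbers are hypothetical illustrations, not the logic or data in multitrack_eval.py:

```python
def count_drivable(steps_by_track, min_steps=1000):
    """Return (sorted drivable track names, total tracks evaluated).

    min_steps is an assumed cutoff separating "drives" from "crashes";
    the commit message does not state the criterion actually used.
    """
    drivable = [name for name, steps in steps_by_track.items() if steps >= min_steps]
    return sorted(drivable), len(steps_by_track)


# Illustrative values only, loosely shaped like the reported ranges.
example = {"generated_road": 2200, "robo_racing_league": 116, "mountain_track": 40}
drivable, total = count_drivable(example)
print(f"{len(drivable)}/{total} tracks drivable: {drivable}")
```

Any cutoff between the two reported ranges (116 and 2200 steps) would give the same split here, so the exact value is not critical for this baseline.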

WAVE 3 STRATEGY (documented in RESEARCH_LOG.md):
  Stage 1: generated_road ↔ generated_track (same geometry, different visuals)
  Stage 2: + mountain_track (different geometry)
  Stage 3: all tracks rotation (true generalization)
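
The three stages above can be sketched as a simple round-robin track schedule. This is an assumption about how the curriculum might be driven, not the code in this repository: track_schedule and its parameters are hypothetical names, and the Stage 3 track list must be supplied by the caller because it is not enumerated here.

```python
from itertools import cycle

# Stage 1 and 2 track sets as described in the strategy above.
STAGE_TRACKS = {
    1: ["generated_road", "generated_track"],
    2: ["generated_road", "generated_track", "mountain_track"],
}


def track_schedule(stage, n_episodes, all_tracks=None):
    """Return the track to use for each of n_episodes, rotating in order."""
    if stage == 3:
        if not all_tracks:
            raise ValueError("stage 3 needs the full list of valid scene names")
        tracks = list(all_tracks)
    elif stage in STAGE_TRACKS:
        tracks = STAGE_TRACKS[stage]
    else:
        raise ValueError(f"unknown stage: {stage}")
    rotation = cycle(tracks)
    return [next(rotation) for _ in range(n_episodes)]


print(track_schedule(1, 4))
```

Strict rotation keeps exposure to each track even within a stage. Random sampling would be an equally plausible choice.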

Also fixed: multitrack_eval.py now lists only valid scene names
(thunderhill removed; not available in this simulator version)

Agent: pi/claude-sonnet
Tests: 53/53 passing
TypeScript: N/A
2026-04-14 11:31:08 -04:00
model-000 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
model-001 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
model-002 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
model-003 Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
autoresearch_log.txt AUTORESEARCH: 300 total trials complete - best mean_reward=141.85 at n_steer=8, n_throttle=5, lr=0.00202 2026-04-13 01:56:06 -04:00
autoresearch_phase1_log.txt milestone: Phase 1 complete — genuine driving confirmed; launch Phase 2 corner learning 2026-04-13 19:33:06 -04:00
autoresearch_phase1_log_CORRUPTED_circular_driving.txt fix: path-efficiency reward (v3) defeats circular driving exploit 2026-04-13 13:36:17 -04:00
autoresearch_phase1_log_CORRUPTED_reward_hacking.txt fix: hack-proof reward shaping + reward hacking detection + research log 2026-04-13 12:27:48 -04:00
autoresearch_phase2_log.txt feat: Phase 3 — behavioral control, enhanced evaluator, 53 tests 2026-04-14 09:28:43 -04:00
autoresearch_results.jsonl AUTORESEARCH: 300 total trials complete - best mean_reward=141.85 at n_steer=8, n_throttle=5, lr=0.00202 2026-04-13 01:56:06 -04:00
autoresearch_results_phase1.jsonl autoresearch: phase1 trial 50 results 2026-04-13 19:17:56 -04:00
autoresearch_results_phase1_CORRUPTED_circular_driving.jsonl fix: path-efficiency reward (v3) defeats circular driving exploit 2026-04-13 13:36:17 -04:00
autoresearch_results_phase1_CORRUPTED_reward_hacking.jsonl fix: hack-proof reward shaping + reward hacking detection + research log 2026-04-13 12:27:48 -04:00
autoresearch_results_phase2.jsonl autoresearch: phase1 trial 20 results 2026-04-14 04:35:45 -04:00
clean_sweep_results.jsonl AUTORESEARCH: Full Karpathy-style GP+UCB meta-controller, clean base data, fixed all paths, ready to run 2026-04-13 00:52:00 -04:00
eval_summary.jsonl fix: track switching via unwrapped viewer.exit_scene() — automatic scene changes work 2026-04-14 10:04:15 -04:00
multitrack_results.jsonl results: complete multi-track generalization baseline — 1/10 tracks drivable pre-Wave3 2026-04-14 11:31:08 -04:00
nohup_outerloop.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
outer_monitor.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
sweep_results.jsonl Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00