9 Commits

Author SHA1 Message Date
Paul Huliganga 6e2427571a docs: record failed cross-track warm-start transfer experiments exp15 and exp16 2026-04-20 20:18:08 -04:00
Paul Huliganga 0da04327ef docs: capture robust mountain finetune winner at 36k and preserve eval comparison 2026-04-20 00:43:27 -04:00
Paul Huliganga 0993d4f1e7 docs: Exp 11 + 11b results — parallel envs work, v6 prevents circles, but plateaus at ~194 steps
Exp 11 (v5 reward): aborted at 66k — circular driving returned without efficiency term
Exp 11b (v6 reward): completed 90k — no circles but plateaus at 170-195 steps
All 4 tracks eval: remarkably consistent ~194 steps (including zero-shot)
Parallel DummyVecEnv infrastructure proven stable.
Next: increase training budget (90k may be insufficient for 2 parallel envs).
2026-04-19 13:26:29 -04:00
Paul Huliganga db1274174f docs: Exp10 vs Exp9 vs Wave4 Trial 9 root cause analysis — random seed lottery 2026-04-19 10:29:16 -04:00
Paul Huliganga 3d04b53a86 docs: Exp10 eval results — total failure, crashes on all tracks (massive regression from Exp9/W4T9) 2026-04-19 10:19:16 -04:00
Paul Huliganga fecba1dd35 docs: TEST_HISTORY Exp10 plan added
Exp10: generated_track + mountain_track, v5 reward, throttle_min=0.2
Same as Exp9 but with visual diversity from second track.

Agent: pi
2026-04-18 17:59:07 -04:00
Paul Huliganga b19dcc8b80 feat: run_eval.py — standard eval runner with persistent logging
Every test run now saves to agent/test-results/YYYY-MM-DD_HH-MM_<model>.log
so results are never lost. Also added 3-set Exp9 eval results to TEST_HISTORY.

Usage:
  python3 agent/run_eval.py --model models/exp9-.../best_model.zip --sets 3

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-18 15:32:36 -04:00
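
The log naming scheme described in the run_eval.py commit above (agent/test-results/YYYY-MM-DD_HH-MM_<model>.log) could be built roughly as follows. This is a minimal sketch, not the actual run_eval.py code: the function name `eval_log_path`, its parameters, and the choice to take `<model>` from the directory that holds the .zip file are all assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path


def eval_log_path(model_path: str, now: datetime,
                  results_dir: str = "agent/test-results") -> Path:
    """Build a timestamped log path for one eval run (hypothetical helper)."""
    model = Path(model_path)
    # Assumption: <model> is the name of the directory containing the .zip;
    # fall back to the file stem when the path has no parent directory.
    model_name = model.parent.name or model.stem
    # YYYY-MM-DD_HH-MM, matching the pattern in the commit message.
    stamp = now.strftime("%Y-%m-%d_%H-%M")
    return Path(results_dir) / f"{stamp}_{model_name}.log"


if __name__ == "__main__":
    # "exp9-demo" is a made-up directory name used only for this example.
    path = eval_log_path("models/exp9-demo/best_model.zip", datetime.now())
    path.parent.mkdir(parents=True, exist_ok=True)  # ensure the results dir exists
    print(path.as_posix())
```

Passing the timestamp in as an argument, rather than calling `datetime.now()` inside the helper, keeps the naming logic deterministic and easy to check.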
Paul Huliganga eb4fd39056 docs: TEST_HISTORY updated with Exp8 results and Exp9 plan
Exp8 results: reward peaked at 567 at step 60k; the policy diverged afterward.
best_model was saved correctly. mini_monaco crashed at 91 steps (mean)
at the same corner every time: throttle min=0.5 is baked into the action space.

Exp9 plan: throttle_min=0.2, v5 reward unchanged. Tests hypothesis
that v5 gradient is sufficient for hill without forced 0.5 minimum.

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-18 13:40:45 -04:00
Paul Huliganga 041481916d docs: TEST_HISTORY.md — comprehensive record of all experiments
Every mountain track experiment (Exp1-8) and Wave 4 trials documented:
- What was changed from previous test
- Key observation from simulator
- Root cause of failure
- What was learned

Also documents: what we keep, open problems, next steps.
Exp 8 currently running (PID 2941877).

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-18 11:18:53 -04:00