Commit Graph

4 Commits

Author SHA1 Message Date
Paul Huliganga fecba1dd35 docs: TEST_HISTORY Exp10 plan added
Exp10: generated_track + mountain_track, v5 reward, throttle_min=0.2
Same as Exp9 but with visual diversity from second track.

Agent: pi
2026-04-18 17:59:07 -04:00
Paul Huliganga b19dcc8b80 feat: run_eval.py — standard eval runner with persistent logging
Every test run now saves to agent/test-results/YYYY-MM-DD_HH-MM_<model>.log
so results are never lost. Also added 3-set Exp9 eval results to TEST_HISTORY.

Usage:
  python3 agent/run_eval.py --model models/exp9-.../best_model.zip --sets 3
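
The log naming scheme above could be sketched as follows. This is a minimal illustration, not the actual code in run_eval.py: the helper name `build_log_path`, the choice of the model's parent directory as the `<model>` label, and the example path `models/example-run/best_model.zip` are all assumptions made for this sketch.

```python
from datetime import datetime
from pathlib import Path


def build_log_path(model_path, results_dir="agent/test-results", now=None):
    """Return a log path shaped like YYYY-MM-DD_HH-MM_<model>.log.

    Hypothetical helper: the real run_eval.py may derive <model> differently.
    """
    now = now or datetime.now()
    # Assumption: label the run by the directory that holds best_model.zip.
    model_name = Path(model_path).parent.name
    stamp = now.strftime("%Y-%m-%d_%H-%M")
    return Path(results_dir) / f"{stamp}_{model_name}.log"


if __name__ == "__main__":
    # Hypothetical model path, used only to show the resulting file name.
    print(build_log_path("models/example-run/best_model.zip").as_posix())
```

Putting the timestamp first keeps the files in chronological order in a plain directory listing, which suits a log directory that is only ever appended to.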

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-18 15:32:36 -04:00
Paul Huliganga eb4fd39056 docs: TEST_HISTORY updated with Exp8 results and Exp9 plan
Exp8 results: 567 reward peak at step 60k, policy diverged after.
Best_model correctly saved. mini_monaco crashed at 91 steps (mean)
at the same corner every time; throttle_min=0.5 is baked into the action space.

Exp9 plan: throttle_min=0.2, v5 reward unchanged. Tests hypothesis
that v5 gradient is sufficient for hill without forced 0.5 minimum.
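
One way a minimum throttle can end up baked into an action space is a linear rescale of the policy output. The sketch below assumes a raw action in [-1, 1] and a maximum throttle of 1.0; the function name `scale_throttle` and both of those ranges are assumptions for illustration, not taken from the project's environment code.

```python
def scale_throttle(raw, throttle_min=0.2, throttle_max=1.0):
    """Map a raw policy output in [-1, 1] onto [throttle_min, throttle_max].

    Hypothetical helper: with this mapping the car can never command less
    than throttle_min, whatever the policy outputs.
    """
    # Clamp first so out-of-range policy outputs cannot escape the bounds.
    raw = max(-1.0, min(1.0, raw))
    # Shift [-1, 1] to [0, 1], then stretch onto the allowed throttle range.
    return throttle_min + (raw + 1.0) / 2.0 * (throttle_max - throttle_min)


if __name__ == "__main__":
    # Lowest possible command under the Exp8 and Exp9 settings respectively.
    print(scale_throttle(-1.0, throttle_min=0.5))
    print(scale_throttle(-1.0, throttle_min=0.2))
```

Under this assumed mapping, lowering throttle_min from 0.5 to 0.2 widens the range the policy can use on approach to a corner, which is the degree of freedom the Exp9 plan sets out to test.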

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-18 13:40:45 -04:00
Paul Huliganga 041481916d docs: TEST_HISTORY.md — comprehensive record of all experiments
Every mountain track experiment (Exp1-8) and Wave 4 trials documented:
- What was changed from previous test
- Key observation from simulator
- Root cause of failure
- What was learned

Also documents: what we keep, open problems, next steps.
Exp8 currently running (PID 2941877).

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-18 11:18:53 -04:00