Every test run now saves its log to agent/test-results/YYYY-MM-DD_HH-MM_<model>.log
so results are never lost. Also adds the 3-set Exp9 eval results to TEST_HISTORY.
Usage:
python3 agent/run_eval.py --model models/exp9-.../best_model.zip --sets 3
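A minimal sketch of the log-path naming scheme above, assuming the <model> part is a plain label string passed in by the caller. The function name `log_path` and the `model_label` parameter are hypothetical; how run_eval.py actually derives <model> from the --model path is not shown here.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def log_path(model_label: str,
             now: Optional[datetime] = None,
             results_dir: str = "agent/test-results") -> Path:
    """Hypothetical helper: build results_dir/YYYY-MM-DD_HH-MM_<model>.log."""
    now = now or datetime.now()               # default to the current time
    stamp = now.strftime("%Y-%m-%d_%H-%M")    # e.g. 2024-05-01_14-07
    return Path(results_dir) / f"{stamp}_{model_label}.log"
```

Minute-level timestamps keep filenames sortable by run time; two runs of the same model started in the same minute would collide, so a real implementation might add seconds.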
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
Exp8 results: reward peaked at 567 at step 60k; the policy diverged afterwards.
best_model was saved correctly. On mini_monaco the car crashed after 91 steps
on average, at the same corner every time. Suspected cause: throttle min=0.5
is baked into the action space.
Exp9 plan: throttle_min=0.2 with the v5 reward unchanged. This tests the
hypothesis that the v5 reward gradient alone is sufficient to climb the hill
without the forced 0.5 throttle minimum.
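One way a throttle minimum can be "baked into" the action space is a linear map from the policy's output onto [throttle_min, 1.0]. The sketch below assumes a policy output in [-1, 1] and a hypothetical `action_to_throttle` function; the real environment may instead set the bound directly on its action space. It shows why the Exp8 policy could never command less than 0.5 throttle, while Exp9 allows down to 0.2.

```python
def action_to_throttle(a: float, throttle_min: float) -> float:
    """Hypothetical mapping: policy output a in [-1, 1] -> throttle in [throttle_min, 1.0]."""
    a = max(-1.0, min(1.0, a))  # clip to the assumed action bounds
    # a = -1 gives throttle_min, a = +1 gives full throttle
    return throttle_min + (a + 1.0) / 2.0 * (1.0 - throttle_min)


# Lowest throttle the policy can request in each experiment
exp8_floor = action_to_throttle(-1.0, 0.5)  # Exp8: throttle_min=0.5
exp9_floor = action_to_throttle(-1.0, 0.2)  # Exp9: throttle_min=0.2
```

Under this mapping, lowering throttle_min does not change the full-throttle end, so any hill-climbing behavior has to come from the reward rather than from a forced floor.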
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
Documents every mountain-track experiment (Exp1-8) and the Wave 4 trials, each with:
- What changed from the previous test
- Key observation from the simulator
- Root cause of the failure
- What was learned
Also documents: what we keep, open problems, next steps.
Exp8 is currently running (PID 2941877).
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A