Commit Graph

1 commit

Author: Paul Huliganga
SHA1: b19dcc8b80

feat: run_eval.py, a standard eval runner with persistent logging
Every eval run now writes its results to agent/test-results/YYYY-MM-DD_HH-MM_<model>.log,
so results from past runs are kept instead of lost. Also added the 3-set Exp9 eval results to TEST_HISTORY.
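A minimal sketch of how a log path following the YYYY-MM-DD_HH-MM_<model>.log pattern could be built and written. The helper names log_path_for and save_log are hypothetical, and deriving <model> from the directory that holds the model file is an assumption; the actual run_eval.py may name files differently.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def log_path_for(model_path: str, results_dir: str = "agent/test-results",
                 now: Optional[datetime] = None) -> Path:
    """Return <results_dir>/YYYY-MM-DD_HH-MM_<model>.log.

    Assumption (hypothetical): <model> is the name of the directory holding
    the model file, falling back to the file stem if there is no parent dir.
    """
    now = now or datetime.now()
    model = Path(model_path)
    model_name = model.parent.name or model.stem
    stamp = now.strftime("%Y-%m-%d_%H-%M")
    return Path(results_dir) / f"{stamp}_{model_name}.log"


def save_log(text: str, model_path: str, results_dir: str = "agent/test-results",
             now: Optional[datetime] = None) -> Path:
    """Write one run's output to its own timestamped file and return the path."""
    path = log_path_for(model_path, results_dir, now)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the results dir if missing
    path.write_text(text, encoding="utf-8")
    return path
```

Because the timestamp is in minutes, two runs of the same model started within the same minute would map to the same file; a real runner might add seconds or a counter to avoid overwriting.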

Usage:
  python3 agent/run_eval.py --model models/exp9-.../best_model.zip --sets 3
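A hypothetical argparse sketch of the two flags shown in the usage line. Only --model and --sets come from the commit; the help text and the default of 1 for --sets are assumptions, and the real script may define other options.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    # Hypothetical reconstruction of the CLI from the usage line above.
    parser = argparse.ArgumentParser(
        description="Run a standard evaluation and save the results to a log file.")
    parser.add_argument("--model", required=True,
                        help="path to the trained model zip")
    # Assumed default; the commit only shows --sets 3 being passed explicitly.
    parser.add_argument("--sets", type=int, default=1,
                        help="number of evaluation sets to run")
    return parser.parse_args(argv)
```

Passing argv explicitly makes the parser easy to test; with argv=None, argparse reads sys.argv as usual.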

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
Date: 2026-04-18 15:32:36 -04:00