Every test run now saves to agent/test-results/YYYY-MM-DD_HH-MM_<model>.log so results are never lost. Also added 3-set Exp9 eval results to TEST_HISTORY. Usage: python3 agent/run_eval.py --model models/exp9-.../best_model.zip --sets 3 Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A |
||
|---|---|---|
| .. | ||
| track-screenshots | ||
| ARCHITECTURE.md | ||
| RESEARCH_LOG.md | ||
| STATE.md | ||
| TEST_HISTORY.md | ||