Exp8 results: reward peaked at 567 at step 60k; the policy diverged after that. best_model was saved correctly. On mini_monaco the agent crashed after 91 steps (mean), at the same corner every episode, traced to throttle min=0.5 being baked into the action space.

Exp9 plan: throttle_min=0.2, v5 reward unchanged. This tests the hypothesis that the v5 reward gradient alone is sufficient to handle the hill without a forced 0.5 throttle minimum.

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
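A minimal sketch of what "throttle min baked into the action space" can mean, assuming (hypothetically) that the policy emits a value in [-1, 1] that is linearly rescaled to a throttle in [throttle_min, 1.0]. The function name `scale_throttle` and the [-1, 1] convention are illustrative assumptions, not this repo's code.

```python
def scale_throttle(action: float, throttle_min: float) -> float:
    """Map a policy output to a throttle command (hypothetical mapping).

    The raw action is clipped to [-1, 1] and then linearly rescaled to
    [throttle_min, 1.0], so the agent can never command a throttle below
    throttle_min, no matter what it outputs.
    """
    a = max(-1.0, min(1.0, action))  # clip to the assumed policy range
    return throttle_min + (a + 1.0) / 2.0 * (1.0 - throttle_min)


# With throttle_min=0.5 the slowest possible command is 0.5 (Exp8);
# with throttle_min=0.2 the agent can back off to 0.2 (Exp9).
print(scale_throttle(-1.0, 0.5))
print(scale_throttle(-1.0, 0.2))
```

Under this mapping, lowering throttle_min widens the range the policy can reach without changing the reward, which is the variable Exp9 isolates.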
| Name |
|---|
| track-screenshots |
| ARCHITECTURE.md |
| RESEARCH_LOG.md |
| STATE.md |
| TEST_HISTORY.md |