The circle exploit persisted because the penalty alone (-100 per short lap) was insufficient: the model stayed alive between laps accumulating small positive rewards, so circling remained a viable strategy despite the penalty.

Fix: `_compute_reward_and_done()` now returns `(reward, force_terminate)`. When a short lap is detected, `force_terminate=True` is returned and `step()` sets `terminated=True` immediately. The episode ends on the spot, so no further rewards are possible, which makes the circle exploit strictly worse than any forward-driving behaviour.

Tests updated: `_compute_reward` renamed to `_compute_reward_and_done`; the short-lap test now asserts `force_terminate=True`.

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
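The shape of the fix can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `_compute_reward_and_done` and the `(reward, force_terminate)` return contract come from the commit message, but the surrounding class, the lap-time threshold, and the progress-based reward are assumptions.

```python
# Hypothetical sketch of the force-terminate fix described above.
# MIN_LAP_TIME and the progress reward are illustrative assumptions;
# only the (reward, force_terminate) contract comes from the commit.
SHORT_LAP_PENALTY = -100.0
MIN_LAP_TIME = 10.0  # seconds; assumed threshold for a "short lap"


class RewardLogic:
    def _compute_reward_and_done(self, lap_time, progress):
        """Return (reward, force_terminate).

        lap_time is None unless a lap was just completed; progress is the
        small per-step positive reward for forward driving.
        """
        if lap_time is not None and lap_time < MIN_LAP_TIME:
            # Short lap (circle exploit): penalise AND force termination,
            # so no further positive reward can be accumulated afterwards.
            return SHORT_LAP_PENALTY, True
        return progress, False

    def step(self, lap_time, progress):
        reward, force_terminate = self._compute_reward_and_done(lap_time, progress)
        terminated = force_terminate  # episode ends immediately on a short lap
        return reward, terminated
```

Because the episode terminates at the moment of the short lap, the exploit's total return is capped at -100 plus whatever was earned before the lap closed, which is strictly less than staying on a legitimate forward trajectory.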
Repository contents:

- .harness
- agent
- docs
- tests
- .gitignore
- AGENT.md
- DECISIONS.md
- IMPLEMENTATION_PLAN.md
- PROJECT-KICKOFF.md
- PROJECT-SPEC.md
- README.md
- create_gitea_repo.py
- ralph-loop.sh
README.md
donkeycar-rl-autoresearch
Purpose
Status
- Scaffolded with the agent harness
- Spec not filled yet
Runbook
- Fill PROJECT-SPEC.md
- Create IMPLEMENTATION_PLAN.md from the spec
- Start the implementation loop