Two reward hacking behaviours observed during Wave 4 training:

1. Short-lap circle exploit (reported by user, echoes Toni's guardrail hack): the model circles at the start/finish line, completing laps in 1-2 sim-seconds and accumulating lap_count indefinitely with no genuine track progress. Fix: SpeedRewardWrapper detects the lap_count increment; if last_lap_time < min_lap_time (5.0 s), it returns penalty = -10 × (min_lap_time / lap_time). A 1-second lap gives a -50 penalty; legitimate 12-second laps are unaffected. The window size was also increased from 30 to 60 to catch slower circles.

2. Non-terminating segment eval episodes: evaluate_policy on wide tracks (no barriers) could run indefinitely, inflating segment_reward to 200k+. Replaced with a manual eval loop capped at MAX_EVAL_STEPS = 3000 steps.

Phase 4 results cleared (trials 4-6 ran with the exploitable reward).

Tests: 4 new reward wrapper tests, 100 total passing.

Agent: pi
Tests: 100 passed
Tests-Added: 4
TypeScript: N/A
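The two fixes above can be sketched as follows. The penalty formula and the constants (min_lap_time = 5.0 s, MAX_EVAL_STEPS = 3000) come from the commit message; everything else — the function names and the minimal `reset`/`step` environment interface — is a hypothetical stand-in, not the real SpeedRewardWrapper or DonkeyCar API.

```python
MIN_LAP_TIME = 5.0       # laps faster than this are treated as exploits (from the commit)
MAX_EVAL_STEPS = 3000    # hard cap for segment evaluation episodes (from the commit)

def short_lap_penalty(lap_time: float) -> float:
    """Fix 1: penalise a lap_count increment that arrives suspiciously fast.

    A 1-second lap yields -10 * (5.0 / 1.0) = -50; a legitimate 12-second
    lap never enters the branch and gets no penalty.
    """
    if lap_time < MIN_LAP_TIME:
        return -10.0 * (MIN_LAP_TIME / lap_time)
    return 0.0

def capped_eval_episode(env, policy, max_steps: int = MAX_EVAL_STEPS) -> float:
    """Fix 2: manual eval loop that cannot run forever on barrier-free tracks.

    `env` is assumed to expose reset() -> obs and step(action) ->
    (obs, reward, done); this is a simplified interface for illustration.
    """
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

The penalty scales inversely with lap time, so the faster the circling exploit spins, the harsher the penalty, while the 5-second threshold keeps genuine laps untouched.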
Repository contents:

- .harness
- agent
- docs
- tests
- .gitignore
- AGENT.md
- DECISIONS.md
- IMPLEMENTATION_PLAN.md
- PROJECT-KICKOFF.md
- PROJECT-SPEC.md
- README.md
- create_gitea_repo.py
- ralph-loop.sh
README.md
# donkeycar-rl-autoresearch

## Purpose

## Status

- Scaffolded with the agent harness
- Spec not filled yet

## Runbook

- Fill PROJECT-SPEC.md
- Create IMPLEMENTATION_PLAN.md from the spec
- Start the implementation loop