eval_best_models.py: evaluates the best models from exp24, exp25, and exp26 across 10 fixed random roads (regen_road with fixed seeds) for a fair head-to-head comparison.

eval_gentrack_on_minimonaco.py: zero-shot evaluation of the gentrack specialists (exp13, wave5-gentrack-only, wave4-trial-0009) on mini-monaco.

Results: exp26 > exp25 > exp24 on random roads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

| Name |
|---|
| .harness |
| agent |
| docs |
| tests |
| .gitignore |
| AGENT.md |
| CLAUDE.md |
| DECISIONS.md |
| IMPLEMENTATION_PLAN.md |
| PROJECT-KICKOFF.md |
| PROJECT-SPEC.md |
| README.md |
| create_gitea_repo.py |
| monitor_training.sh |
| ralph-loop.sh |
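
The fixed-seed comparison described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the contents of eval_best_models.py: the seed list, `evaluate_models`, `rank`, and `toy_episode` are hypothetical names, and the simulator rollout (including any call to regen_road, whose signature is not shown here) is replaced by a toy stand-in. The point is only that every model is scored on the same seeded roads, so differences in the mean score reflect the models rather than the roads.

```python
import random
from statistics import mean

# Assumption: ten fixed seeds, one per random road. The real seed values are not given.
ROAD_SEEDS = list(range(10))


def evaluate_models(models, run_episode, seeds=ROAD_SEEDS):
    """Score every model on the same seeded roads and return the mean score per model."""
    results = {}
    for name, model in models.items():
        # Each model sees exactly the same sequence of roads.
        scores = [run_episode(model, seed) for seed in seeds]
        results[name] = mean(scores)
    return results


def rank(results):
    """Return model names ordered from best to worst mean score."""
    return sorted(results, key=results.get, reverse=True)


def toy_episode(model, seed):
    """Hypothetical stand-in for a simulator rollout on the road generated from `seed`."""
    rng = random.Random(seed)      # road difficulty depends only on the seed
    difficulty = rng.random()
    return model["skill"] - difficulty


if __name__ == "__main__":
    # Placeholder models with made-up skill values, purely for illustration.
    toy_models = {"a": {"skill": 1.0}, "b": {"skill": 2.0}, "c": {"skill": 3.0}}
    print(rank(evaluate_models(toy_models, toy_episode)))
```

Because the road depends only on the seed, rerunning the evaluation gives identical scores, which is what makes the ranking reproducible.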
README.md
donkeycar-rl-autoresearch
Purpose
Status
- Scaffolded with the agent harness
- Spec not yet filled in
Runbook
- Fill in PROJECT-SPEC.md
- Create IMPLEMENTATION_PLAN.md from the spec
- Start the implementation loop