Research-grade DonkeyCar RL autoresearch and sweep system.
Go to file
Paul Huliganga 4f77b8a468 fix: always save and return the BEST model, not the last one
This was the root cause of losing good models during training.
The model could learn to lap at step 30k then drift to a worse
policy by step 90k, and we only ever saved the final weights.

Changes to train_multitrack():
- Tracks best_segment_reward across all segments
- Saves best_model.zip whenever a new high score is achieved
- At end of training, RELOADS best_model.zip before returning
  so the caller always gets the best policy found, not the drift

Both files saved per trial:
  model.zip      <- latest checkpoint (crash recovery)
  best_model.zip <- best policy seen during training (used for eval)

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-17 14:45:37 -04:00
.harness feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing 2026-04-13 10:03:15 -04:00
agent fix: always save and return the BEST model, not the last one 2026-04-17 14:45:37 -04:00
docs docs: ARCHITECTURE.md — complete system architecture guide 2026-04-17 14:06:38 -04:00
tests feat: v5 reward — speed × CTE-quality, drop efficiency term 2026-04-17 13:25:38 -04:00
.gitignore feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing 2026-04-13 10:03:15 -04:00
AGENT.md feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing 2026-04-13 10:03:15 -04:00
DECISIONS.md docs: ADR-013 through ADR-016 — decisions that were lost to context compaction 2026-04-15 22:34:48 -04:00
IMPLEMENTATION_PLAN.md feat: Phase 3 — behavioral control, enhanced evaluator, 53 tests 2026-04-14 09:28:43 -04:00
PROJECT-KICKOFF.md feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing 2026-04-13 10:03:15 -04:00
PROJECT-SPEC.md feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing 2026-04-13 10:03:15 -04:00
README.md feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing 2026-04-13 10:03:15 -04:00
create_gitea_repo.py Initial commit 2026-04-12 23:44:36 -04:00
ralph-loop.sh feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing 2026-04-13 10:03:15 -04:00

README.md

donkeycar-rl-autoresearch

Purpose

Status

  • Scaffolded with the agent harness
  • Spec not filled yet

Runbook

  • Fill PROJECT-SPEC.md
  • Create IMPLEMENTATION_PLAN.md from the spec
  • Start the implementation loop