donkeycar-rl-autoresearch/agent

Latest commit 4f77b8a468 by Paul Huliganga:
fix: always save and return the BEST model, not the last one
This was the root cause of losing good models during training.
The model could learn to lap at step 30k then drift to a worse
policy by step 90k, and we only ever saved the final weights.

Changes to train_multitrack():
- Tracks best_segment_reward across all segments
- Saves best_model.zip whenever a new high score is achieved
- At end of training, RELOADS best_model.zip before returning
  so the caller always gets the best policy found, not the drift

Both files saved per trial:
  model.zip      <- latest checkpoint (crash recovery)
  best_model.zip <- best policy seen during training (used for eval)

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-17 14:45:37 -04:00
Directory contents (grouped by last commit, oldest first):

2026-04-12 22:57:50 -04:00  "Initial commit: stable RL sweep runner, legacy and new scripts, full docs included"
    __pycache__/, bin/, sb3-models/, sessions/, README.md, SETUP_QUICKSTART.md,
    TROUBLESHOOT.md, auth.json, autoresearch, check_envs.py, choose_and_run_track.py,
    discretize_action.py, donkeycar_autoresearch.py, donkeycar_sb3_runner.py.README.txt,
    list_tracks.py, manual-multiepisode-2.log, manual-multiepisode-3.log,
    manual-multiepisode-batch.log, manual-multiepisode.log, manual_multiepisode-batch-run.log,
    manual_multiepisode_batch.sh, run_all_known_tracks.py, run_circuit_launch.py,
    run_donkeycar_test.sh, settings.json, test_donkeycar.py, test_donkeycar_gymnasium.py,
    test_sdsandbox.py

2026-04-13 00:52:00 -04:00  "AUTORESEARCH: Full Karpathy-style GP+UCB meta-controller, clean base data, fixed all paths, ready to run"
    AUTORESEARCH_README.txt, donkeycar_outer_loop.py

2026-04-13 10:03:15 -04:00  "feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing"
    champion_tracker.py

2026-04-13 11:17:08 -04:00  "fix: reduce timesteps to 1k-5k for Phase 1 CPU training; add sim health/stuck detection; fix PPO throttle clamp"
    donkeycar_sb3_runner.py

2026-04-13 13:36:17 -04:00  "fix: path-efficiency reward (v3) defeats circular driving exploit"
    analysis_circular_driving.py

2026-04-13 19:33:06 -04:00  "milestone: Phase 1 complete — genuine driving confirmed; launch Phase 2 corner learning"
    autoresearch_controller.py

2026-04-14 09:28:43 -04:00  "feat: Phase 3 — behavioral control, enhanced evaluator, 53 tests"
    behavioral_wrappers.py

2026-04-14 10:04:15 -04:00  "fix: track switching via unwrapped viewer.exit_scene() — automatic scene changes work"
    evaluate_champion.py, track_switcher.py

2026-04-14 10:11:47 -04:00  "feat: comprehensive multi-track evaluation script + research log updates"
    multitrack_eval.py

2026-04-14 12:47:12 -04:00  "wave3: add multi-track autoresearch system (83 tests passing)"
    park_at_start.py

2026-04-14 21:27:43 -04:00  "fix: complete LR override — must patch lr_schedule, not just param_groups"
    wave3_controller.py

2026-04-15 07:15:57 -04:00  "wave3: autoresearch trial 5 results"
    models/

2026-04-15 21:54:50 -04:00  "fix: prevent trial timeouts losing all data"
    wave4_controller.py

2026-04-17 13:25:38 -04:00  "feat: v5 reward — speed × CTE-quality, drop efficiency term"
    eval_on_track.py, reward_wrapper.py

2026-04-17 14:45:37 -04:00  "fix: always save and return the BEST model, not the last one"
    multitrack_runner.py, outerloop-results/

README.md

DonkeyCar RL Batch Automation and Setup Guide

System Requirements

  • Windows 10/11 machine (with DonkeyCar Unity Simulator installed)
  • WSL2 with Ubuntu (Python 3.x installed)
  • DonkeyCar Unity Simulator running in Windows, Remote Control enabled

1. WSL (Linux) Setup

A. Install Python, pip, and core packages

sudo apt update && sudo apt install -y python3-pip git
pip3 install --upgrade pip
pip3 install stable-baselines3 gymnasium gym-donkeycar numpy matplotlib
# If pip install gym-donkeycar fails, use:
pip3 install git+https://github.com/tawnkramer/gym-donkeycar.git
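Before launching anything, it can help to confirm the installs actually import; note that the import names differ from the pip names (gym_donkeycar, not gym-donkeycar). A small sanity-check sketch:

```python
# Sanity-check that the core packages installed; names here are the
# IMPORT names (gym_donkeycar), not the pip names (gym-donkeycar).
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    required = ["stable_baselines3", "gymnasium", "gym_donkeycar", "numpy", "matplotlib"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core packages found.")
```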

B. Place scripts

Copy these into /home/paulh/.pi/agent/:

  • donkeycar_sb3_runner.py
  • manual_multiepisode_batch.sh (and make executable)
  • Any grid/outer-loop script (e.g. donkeycar_outer_loop.py), as needed

2. Unity DonkeyCar Simulator Setup (Windows)

  • Download and install the Unity DonkeyCar sim (from the DonkeyCar/tawnkramer GitHub repo or its releases page)
  • Open simulator, select "Donkey Generated Track" and enable Remote (SocketAPI) mode
  • Ensure port 9091 is listening (default); leave sim running and visible
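To confirm from WSL that the simulator port is actually reachable before starting a batch, a quick TCP probe helps. Note that under WSL2 the Windows host is usually not 127.0.0.1; the host IP below is a placeholder you must replace (under default WSL2 NAT networking it is often the nameserver address in /etc/resolv.conf):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    WINDOWS_HOST_IP = "127.0.0.1"  # placeholder: replace with your Windows host IP
    print("sim reachable:", port_open(WINDOWS_HOST_IP, 9091))
```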

3. Running Robust Batches (Best Practice)

Clean multi-episode approach:

Do NOT rapidly kill/restart agents. Use scripts that:

  • Open one connection, run multiple episodes using env.reset() in a loop
  • Cleanly call env.close()
  • The batch script launches the process, waits a couple of seconds, then repeats
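The pattern in the bullets above can be sketched as follows. DummyEnv is a hypothetical stand-in for gym_donkeycar's environment so the sketch runs without the sim; the point is the shape of the loop: one env, many reset() calls, and exactly one close() in a finally block.

```python
class DummyEnv:
    """Hypothetical stand-in for a gym_donkeycar env so this runs offline."""
    def __init__(self):
        self.steps = 0
        self.closed = False

    def reset(self):
        return [0.0], {}  # observation, info

    def step(self, action):
        self.steps += 1
        done = self.steps % 5 == 0  # fake an episode end every 5 steps
        return [0.0], 0.0, done, False, {}  # obs, reward, terminated, truncated, info

    def close(self):
        self.closed = True

def run_episodes(env, n_episodes):
    """Run n_episodes over ONE connection, resetting between episodes."""
    finished = 0
    try:
        for _ in range(n_episodes):
            obs, info = env.reset()
            done = False
            while not done:
                obs, reward, done, truncated, info = env.step(None)
            finished += 1
    finally:
        env.close()  # always release the sim connection, even on a crash
    return finished
```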

Sample batch script (manual_multiepisode_batch.sh):

#!/bin/bash
for i in {1..20}; do
  echo "===== RUN $i ===== $(date)" | tee -a manual-multiepisode-batch.log
  python3 donkeycar_sb3_runner.py >> manual-multiepisode-batch.log 2>&1
  echo "===== END RUN $i ===== $(date)" | tee -a manual-multiepisode-batch.log
  sleep 2
done

Make executable:

chmod +x manual_multiepisode_batch.sh

Run in background:

nohup ./manual_multiepisode_batch.sh &

4. Debugging/Recovery

  • If the sim goes blank or hangs, first check for stuck agent processes: ps aux | grep donkeycar_sb3_runner
  • Always allow scripts to call env.close() rather than killing them mid-episode
  • If the hang persists, restart the Unity DonkeyCar sim in Windows
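The stuck-process check in the first bullet can also be done from Python by scanning /proc (Linux/WSL only). This is a sketch under the assumption that the runner was started as `python3 donkeycar_sb3_runner.py`, so the script name appears in the command line:

```python
# Find PIDs whose command line mentions a name fragment, by scanning
# /proc (works on Linux/WSL; no external `ps` needed).
import os

def find_processes(name_fragment):
    """Return PIDs whose /proc/<pid>/cmdline contains name_fragment."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\x00", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited while we were scanning
        if name_fragment in cmdline:
            pids.append(int(entry))
    return pids
```

`find_processes("donkeycar_sb3_runner")` returns the PIDs worth inspecting (or killing) before restarting the sim.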

5. Automation Flow Summary

Step       Where      Details
---------  ---------  ------------------------------------------
Install    WSL/Linux  pip install ...
Prepare    WSL/Linux  Place scripts, ensure Python dependencies
Sim start  Windows    Start DonkeyCar Unity Sim
Run batch  WSL/Linux  nohup ./manual_multiepisode_batch.sh ...
Monitor    Both       Car resets, batch log grows, check hangs

See included Python scripts for reusable RL runner and grid search outer loop logic.