donkeycar-rl-autoresearch/agent
Paul Huliganga c8a495dd22 fix: reward v4 — full sim bypass kills circular driving at root
ROOT CAUSE:
  donkey_sim.py calc_reward() uses forward_vel = dot(heading, velocity).
  A spinning car ALWAYS has forward_vel > 0 (always moving 'forward' relative
  to its own heading), so it earned positive reward indefinitely while circling.

v3 WAS INSUFFICIENT:
  v3 applied efficiency only to the speed BONUS: original × (1 + speed×eff×scale)
  But 'original' from sim was still exploitable: CTE≈0 while spinning → original=1.0/step
  Efficiency killed the speed bonus but not the base reward.
  47k-step run: spinning = 1.0/step × 47k = 47k reward (never crashes in circle)

v4 FIX — base × efficiency × speed:
  reward = (1 - abs(cte)/max_cte) × efficiency × (1 + speed_scale × speed)
  Completely ignores sim's bogus forward_vel reward.
  Spinning (eff≈0): reward ≈ 0 regardless of CTE or speed.
  ALL three terms must be high to earn reward — cannot be gamed.

Key new test: test_circling_at_zero_cte_gives_near_zero_reward
  Worst-case exploit (CTE=0 spinning) → avg reward < 0.15 (was 1.0 in v3)
  forward_beats_circling_by_3x confirmed.

Also: update Phase 2 autoresearch timesteps test, research log updated.

Agent: pi/claude-sonnet
Tests: 40/40 passing
Tests-Added: +1 (core v4 circling guarantee)
TypeScript: N/A
2026-04-13 20:56:32 -04:00
..
__pycache__ Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
bin Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
models autoresearch: phase1 trial 10 results 2026-04-13 13:11:06 -04:00
outerloop-results fix: reward v4 — full sim bypass kills circular driving at root 2026-04-13 20:56:32 -04:00
sb3-models Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
sessions Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
AUTORESEARCH_README.txt AUTORESEARCH: Full Karpathy-style GP+UCB meta-controller, clean base data, fixed all paths, ready to run 2026-04-13 00:52:00 -04:00
README.md Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
SETUP_QUICKSTART.md Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
TROUBLESHOOT.md Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
analysis_circular_driving.py fix: path-efficiency reward (v3) defeats circular driving exploit 2026-04-13 13:36:17 -04:00
auth.json Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
autoresearch Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
autoresearch_controller.py milestone: Phase 1 complete — genuine driving confirmed; launch Phase 2 corner learning 2026-04-13 19:33:06 -04:00
champion_tracker.py feat: Wave 1 complete — real PPO training, model save, GP+UCB autoresearch, 37 tests passing 2026-04-13 10:03:15 -04:00
check_envs.py Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
choose_and_run_track.py Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
discretize_action.py Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
donkeycar_autoresearch.py Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
donkeycar_outer_loop.py AUTORESEARCH: Full Karpathy-style GP+UCB meta-controller, clean base data, fixed all paths, ready to run 2026-04-13 00:52:00 -04:00
donkeycar_sb3_runner.py fix: reduce timesteps to 1k-5k for Phase 1 CPU training; add sim health/stuck detection; fix PPO throttle clamp 2026-04-13 11:17:08 -04:00
donkeycar_sb3_runner.py.README.txt Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
evaluate_champion.py milestone: Phase 1 complete — genuine driving confirmed; launch Phase 2 corner learning 2026-04-13 19:33:06 -04:00
list_tracks.py Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
manual-multiepisode-2.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
manual-multiepisode-3.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
manual-multiepisode-batch.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
manual-multiepisode.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
manual_multiepisode-batch-run.log Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
manual_multiepisode_batch.sh Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
reward_wrapper.py fix: reward v4 — full sim bypass kills circular driving at root 2026-04-13 20:56:32 -04:00
run_all_known_tracks.py Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
run_circuit_launch.py Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
run_donkeycar_test.sh Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
settings.json Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
test_donkeycar.py Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
test_donkeycar_gymnasium.py Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00
test_sdsandbox.py Initial commit: stable RL sweep runner, legacy and new scripts, full docs included 2026-04-12 22:57:50 -04:00

README.md

DonkeyCar RL Batch Automation and Setup Guide

System Requirements

  • Windows 10/11 machine (with DonkeyCar Unity Simulator installed)
  • WSL2 with Ubuntu (Python 3.x installed)
  • DonkeyCar Unity Simulator running in Windows, Remote Control enabled

1. WSL (Linux) Setup

A. Install Python, pip, and core packages

sudo apt update && sudo apt install -y python3-pip git
pip3 install --upgrade pip
pip3 install stable-baselines3 gymnasium gym-donkeycar numpy matplotlib
# If pip install gym-donkeycar fails, use:
pip3 install git+https://github.com/tawnkramer/gym-donkeycar.git

B. Place scripts

Copy these into /home/paulh/.pi/agent/:

  • donkeycar_sb3_runner.py
  • manual_multiepisode_batch.sh (and make executable)
  • Any grid/outer loop script (donkeycar_outer_loop.py, as needed)

2. Unity DonkeyCar Simulator Setup (Windows)

  • Download/install Unity DonkeyCar sim (from DonkeyCar/tawnkramer Github or releases page)
  • Open simulator, select "Donkey Generated Track" and enable Remote (SocketAPI) mode
  • Ensure port 9091 is listening (default); leave sim running and visible

3. Running Robust Batches (Best Practice)

Clean multi-episode approach:

Do NOT rapidly kill/restart agents. Use scripts that:

  • Open one connection, run multiple episodes using env.reset() in a loop
  • Cleanly call env.close()
  • Batch script launches process, waits a couple seconds, repeats

Sample batch script (manual_multiepisode_batch.sh):

#!/bin/bash
for i in {1..20}; do
  echo "===== RUN $i ===== $(date)" | tee -a manual-multiepisode-batch.log
  python3 donkeycar_sb3_runner.py >> manual-multiepisode-batch.log 2>&1
  echo "===== END RUN $i ===== $(date)" | tee -a manual-multiepisode-batch.log
  sleep 2
done

Make executable:

chmod +x manual_multiepisode_batch.sh

Run in background:

nohup ./manual_multiepisode_batch.sh &

4. Debugging/Recovery

  • If sim blanks/hangs, first check for stuck agent processes: ps aux | grep donkeycar_sb3_runner
  • Always allow scripts to call env.close()
  • If hang persists: restart Unity DonkeyCar Sim in Windows

5. Automation Flow Summary

Step Where Details
Install WSL/Linux pip install ...
Prepare WSL/Linux Place scripts, ensure Python dependencies
Sim start Windows Start DonkeyCar Unity Sim
Run batch WSL/Linux nohup ./manual_multiepisode_batch.sh ...
Monitor Both Car resets, batch log grows, check hangs

See included Python scripts for reusable RL runner and grid search outer loop logic.