# DonkeyCar RL Autoresearch - README
# ===================================
#
# QUICK START (after simulator is running):
#
#   cd /home/paulh/projects/donkeycar-rl-autoresearch/agent
#   python3 autoresearch_controller.py --trials 100
#
# The autoresearch will:
#   1. Load all base sweep data (clean_sweep_results.jsonl)
#   2. Fit a Gaussian Process surrogate model on reward-vs-params
#   3. Use UCB (Upper Confidence Bound) to propose the next best params
#   4. Launch RL jobs automatically via the robust runner
#   5. Record all results to outerloop-results/autoresearch_results.jsonl
#   6. Repeat for --trials iterations, learning as it goes
#
# You can stop at any time with Ctrl+C.
# Restart and it automatically picks up all prior results.
#
# LOGS:
#   outerloop-results/autoresearch_log.txt       - human-readable log
#   outerloop-results/autoresearch_results.jsonl - all trial results (JSON)
#   outerloop-results/clean_sweep_results.jsonl  - base sweep data
#
# TUNING:
#   --trials N  : number of autoresearch trials (default 100)
#   --explore K : UCB kappa; higher = more exploration (default 2.0)
#
# HOW IT WORKS (Karpathy-style autoresearch):
#   - A Gaussian Process (GP) is fit on all existing (params, reward) pairs
#   - The GP models the unknown reward function over the parameter space
#   - UCB acquisition = GP mean + kappa * GP uncertainty
#   - The next trial uses the params that maximize UCB
#   - This intelligently balances exploiting known good regions vs
#     exploring uncertain regions - far smarter than any fixed grid
#
# PARAMETER SPACE EXPLORED (continuously, not just grid values):
#   n_steer:       3 to 9 (integer)
#   n_throttle:    2 to 5 (integer)
#   learning_rate: 0.00005 to 0.005 (float)
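The resume behavior ("restart and it automatically picks up all prior results") follows naturally from the JSONL format: one JSON object per line, appended per trial. A minimal sketch of that pattern is below; `load_results` and `append_result` are hypothetical helper names, not functions from the actual controller.

```python
import json
import os

def load_results(path):
    """Load all prior trial records from a JSONL file (one JSON object per line)."""
    results = []
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines, e.g. from an interrupted run
                    results.append(json.loads(line))
    return results

def append_result(path, record):
    """Append one trial record; append-mode writes survive Ctrl+C between trials."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Because each record is a self-contained line, a killed run at worst loses its in-flight trial; everything already written is picked up on the next start.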
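Since the parameter space is explored continuously rather than on a grid, the proposer must map continuous proposals back into the documented ranges. One common way, sketched below, is to decode a point in the unit cube into the three parameters; the `decode` helper, its unit-cube convention, and the log-scale treatment of learning_rate are assumptions for illustration, not the controller's actual scheme.

```python
import math

# Ranges from "PARAMETER SPACE EXPLORED" above.
BOUNDS = {
    "n_steer": (3, 9),            # integer
    "n_throttle": (2, 5),         # integer
    "learning_rate": (5e-5, 5e-3) # float; decoded on a log scale (assumption)
}

def decode(u):
    """Map a point u in [0, 1]^3 to a parameter dict within the documented bounds."""
    lo, hi = BOUNDS["n_steer"]
    n_steer = int(round(lo + u[0] * (hi - lo)))       # round to nearest integer
    lo, hi = BOUNDS["n_throttle"]
    n_throttle = int(round(lo + u[1] * (hi - lo)))
    lo, hi = BOUNDS["learning_rate"]
    # Interpolate in log space so 5e-5 and 5e-4 are as far apart as 5e-4 and 5e-3.
    lr = math.exp(math.log(lo) + u[2] * (math.log(hi) - math.log(lo)))
    return {"n_steer": n_steer, "n_throttle": n_throttle, "learning_rate": lr}
```

Rounding the integer dimensions means the GP still reasons over a continuous space while every launched job receives valid discrete settings.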