# DonkeyCar RL Autoresearch - README
# ===================================
#
# QUICK START (after simulator is running):
#
# cd /home/paulh/projects/donkeycar-rl-autoresearch/agent
# python3 autoresearch_controller.py --trials 100
#
# The autoresearch will:
# 1. Load all base sweep data (clean_sweep_results.jsonl)
# 2. Fit a Gaussian Process surrogate model on reward-vs-params
# 3. Use UCB (Upper Confidence Bound) to propose next best params
# 4. Launch RL jobs automatically via the robust runner
# 5. Record all results to outerloop-results/autoresearch_results.jsonl
# 6. Repeat for --trials iterations, learning as it goes
#
# You can stop at any time with Ctrl+C.
# Restart and it automatically picks up all prior results.
#
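# The resume behavior above can be sketched as loading every prior trial from
# the JSONL results file, one JSON object per line. This is an illustrative
# sketch, not the controller's actual code; the "params" and "reward" field
# names are assumptions:

```python
import json
from pathlib import Path

def load_prior_results(path):
    """Return a list of (params, reward) pairs from a JSONL results file.

    An absent file means a fresh start, so an empty list is returned.
    Field names "params" and "reward" are assumed here.
    """
    results = []
    p = Path(path)
    if not p.exists():
        return results
    for line in p.read_text().splitlines():
        if line.strip():  # skip blank lines
            rec = json.loads(line)
            results.append((rec["params"], rec["reward"]))
    return results
```

# Because results are appended one line at a time, a run killed mid-trial
# still leaves every completed trial on disk for the next restart to reuse.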
# LOGS:
#   outerloop-results/autoresearch_log.txt       - human-readable log
#   outerloop-results/autoresearch_results.jsonl - all trial results (JSONL)
#   outerloop-results/clean_sweep_results.jsonl  - base sweep data
#
# TUNING:
#   --trials N  : number of autoresearch trials (default 100)
#   --explore K : UCB kappa, higher = more exploration (default 2.0)
#
# HOW IT WORKS (Karpathy-style autoresearch):
# - A Gaussian Process (GP) is fit on all existing (params, reward) pairs
# - The GP models the unknown reward function over the parameter space
# - UCB acquisition = GP mean + kappa * GP uncertainty
# - The next trial uses the params that maximize UCB
# - This balances exploiting known good regions vs exploring uncertain
#   regions - far more sample-efficient than a fixed grid sweep
#
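# The UCB rule above can be sketched with a toy 1-D Gaussian Process. This is
# an illustrative hand-rolled version (RBF kernel, zero prior mean), not the
# controller's actual implementation; kernel hyperparameters are assumptions:

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """GP posterior mean and std at query points, with a zero prior mean."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_query, x_train)
    Kss = rbf(x_query, x_query)
    K_inv = np.linalg.inv(K)
    mean = Ks @ K_inv @ y_train
    cov = Kss - Ks @ K_inv @ Ks.T
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

def ucb_pick(x_train, y_train, candidates, kappa=2.0):
    """Return the candidate maximizing UCB = mean + kappa * std."""
    mean, std = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argmax(mean + kappa * std)]

# Three observed (param, reward) pairs; candidates far from them keep high
# posterior std, so a large kappa (the --explore knob) favors exploring there.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 1.5])
cands = np.linspace(0.0, 5.0, 51)
best = ucb_pick(x, y, cands, kappa=2.0)
```

# Note how kappa plays the role of --explore: at kappa = 0 the rule greedily
# re-samples the best known mean; larger kappa weights posterior uncertainty.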
# PARAMETER SPACE EXPLORED (continuously, not just grid values):
#   n_steer: 3 to 9 (integer)
#   n_throttle: 2 to 5 (integer)
#   learning_rate: 0.00005 to 0.005 (float)