donkeycar-rl-autoresearch/agent/AUTORESEARCH_README.txt

# DonkeyCar RL Autoresearch - README
# ===================================
#
# QUICK START (after simulator is running):
#
# cd /home/paulh/projects/donkeycar-rl-autoresearch/agent
# python3 autoresearch_controller.py --trials 100
#
# The autoresearch will:
# 1. Load all base sweep data (clean_sweep_results.jsonl)
# 2. Fit a Gaussian Process surrogate model on reward-vs-params
# 3. Use UCB (Upper Confidence Bound) to propose next best params
# 4. Launch RL jobs automatically via the robust runner
# 5. Record all results to outerloop-results/autoresearch_results.jsonl
# 6. Repeat for --trials iterations, learning as it goes
#
# You can stop at any time with Ctrl+C.
# Restart and it automatically picks up all prior results.
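#
# A minimal sketch of that resume behavior, assuming each finished trial
# is appended as one JSON object per line; the "params"/"reward" field
# names are illustrative, not necessarily the controller's real schema:

```python
# Hypothetical resume-from-JSONL helpers. Restarting just means re-reading
# the results file; appending one line per trial makes every finished
# trial a crash-safe checkpoint.
import json
import os

RESULTS = "outerloop-results/autoresearch_results.jsonl"

def load_prior_trials(path=RESULTS):
    """Return all previously recorded trials (empty list on first run)."""
    trials = []
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line:
                    trials.append(json.loads(line))
    return trials

def record_trial(params, reward, path=RESULTS):
    """Append one trial as a single JSON line."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a") as f:
        f.write(json.dumps({"params": params, "reward": reward}) + "\n")
```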
#
# LOGS:
# outerloop-results/autoresearch_log.txt - human-readable log
# outerloop-results/autoresearch_results.jsonl - all trial results (JSON)
# outerloop-results/clean_sweep_results.jsonl - base sweep data
#
# TUNING:
# --trials N : number of autoresearch trials (default 100)
# --explore K : UCB kappa, higher = more exploration (default 2.0)
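#
# A sketch of how those two flags could be parsed (defaults match the
# README; the parser itself is illustrative, not the controller's
# actual code):

```python
# Minimal argparse setup for the two tuning knobs described above.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="DonkeyCar RL autoresearch")
    parser.add_argument("--trials", type=int, default=100,
                        help="number of autoresearch trials")
    parser.add_argument("--explore", type=float, default=2.0,
                        help="UCB kappa; higher means more exploration")
    return parser
```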
#
# HOW IT WORKS (Karpathy-style autoresearch):
# - A Gaussian Process (GP) is fit on all existing (params, reward) pairs
# - The GP models the unknown reward function over the parameter space
# - UCB acquisition = GP mean + kappa * GP uncertainty
# - The next trial uses the params that maximize UCB
# - This balances exploiting known good regions against exploring
#   uncertain regions, unlike a fixed grid, which spends trials the
#   same way regardless of past results
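#
# The fit/acquire step can be sketched with scikit-learn's GP regressor;
# the Matern kernel and the candidate-pool approach are assumptions here,
# since the README only pins down UCB = mean + kappa * uncertainty:

```python
# Sketch: fit a GP on observed (params, reward) pairs, then score a pool
# of candidate parameter vectors with the UCB acquisition function.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def propose_next(X_seen, y_seen, candidates, kappa=2.0):
    """Return the candidate maximizing UCB = GP mean + kappa * GP std."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.asarray(X_seen, dtype=float), np.asarray(y_seen, dtype=float))
    mean, std = gp.predict(np.asarray(candidates, dtype=float),
                           return_std=True)
    return candidates[int(np.argmax(mean + kappa * std))]
```

# With kappa = 0 this degenerates to pure exploitation (pick the highest
# predicted mean); larger kappa favors points the GP is unsure about.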
#
# PARAMETER SPACE EXPLORED (continuously, not just grid values):
# n_steer: 3 to 9 (integer)
# n_throttle: 2 to 5 (integer)
# learning_rate: 0.00005 to 0.005 (float)
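#
# A sketch of drawing candidates from that space; sampling learning_rate
# log-uniformly is an assumption (it spans two orders of magnitude),
# while the bounds come straight from the list above:

```python
# Random candidate generator over the mixed integer/float search space.
import math
import random

BOUNDS = {
    "n_steer": (3, 9),              # integer
    "n_throttle": (2, 5),           # integer
    "learning_rate": (5e-5, 5e-3),  # float, sampled log-uniformly below
}

def sample_params(rng=random):
    lo, hi = BOUNDS["learning_rate"]
    return {
        "n_steer": rng.randint(*BOUNDS["n_steer"]),
        "n_throttle": rng.randint(*BOUNDS["n_throttle"]),
        # log-uniform: uniform in log space, then exponentiate
        "learning_rate": math.exp(rng.uniform(math.log(lo), math.log(hi))),
    }
```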