# DonkeyCar RL Autoresearch - README
# ===================================
#
# QUICK START (after simulator is running):
#
# cd /home/paulh/projects/donkeycar-rl-autoresearch/agent
# python3 autoresearch_controller.py --trials 100
#
# The autoresearch will:
# 1. Load all base sweep data (clean_sweep_results.jsonl)
# 2. Fit a Gaussian Process surrogate model on reward-vs-params
# 3. Use UCB (Upper Confidence Bound) to propose next best params
# 4. Launch RL jobs automatically via the robust runner
# 5. Record all results to outerloop-results/autoresearch_results.jsonl
# 6. Repeat for --trials iterations, learning as it goes
#
# You can stop at any time with Ctrl+C.
# Restart and it automatically picks up all prior results.
#
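# The resume behavior above can be sketched as loading every prior trial from
# the JSONL results file, one JSON object per line. This is an illustrative
# sketch, not the controller's actual code; the "params" and "reward" field
# names are assumptions:

```python
import json
from pathlib import Path

def load_prior_results(path):
    """Return a list of (params, reward) pairs from a JSONL results file.

    An absent file means a fresh start, so an empty list is returned.
    Field names "params" and "reward" are assumed here.
    """
    results = []
    p = Path(path)
    if not p.exists():
        return results
    for line in p.read_text().splitlines():
        if line.strip():  # skip blank lines
            rec = json.loads(line)
            results.append((rec["params"], rec["reward"]))
    return results
```

# Because results are appended one line at a time, a run killed mid-trial
# still leaves every completed trial on disk for the next restart to reuse.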
# LOGS:
#   outerloop-results/autoresearch_log.txt       - human-readable log
#   outerloop-results/autoresearch_results.jsonl - all trial results (JSONL)
#   outerloop-results/clean_sweep_results.jsonl  - base sweep data
#
# TUNING:
#   --trials N  : number of autoresearch trials (default 100)
#   --explore K : UCB kappa, higher = more exploration (default 2.0)
#
# HOW IT WORKS (Karpathy-style autoresearch):
# - A Gaussian Process (GP) is fit on all existing (params, reward) pairs
# - The GP models the unknown reward function over the parameter space
# - UCB acquisition = GP mean + kappa * GP uncertainty
# - The next trial uses the params that maximize UCB
# - This balances exploiting known good regions vs exploring uncertain
#   regions - far more sample-efficient than a fixed grid sweep
#
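# The UCB rule above can be sketched with a toy 1-D Gaussian Process. This is
# an illustrative hand-rolled version (RBF kernel, zero prior mean), not the
# controller's actual implementation; kernel hyperparameters are assumptions:

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """GP posterior mean and std at query points, with a zero prior mean."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_query, x_train)
    Kss = rbf(x_query, x_query)
    K_inv = np.linalg.inv(K)
    mean = Ks @ K_inv @ y_train
    cov = Kss - Ks @ K_inv @ Ks.T
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

def ucb_pick(x_train, y_train, candidates, kappa=2.0):
    """Return the candidate maximizing UCB = mean + kappa * std."""
    mean, std = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argmax(mean + kappa * std)]

# Three observed (param, reward) pairs; candidates far from them keep high
# posterior std, so a large kappa (the --explore knob) favors exploring there.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 1.5])
cands = np.linspace(0.0, 5.0, 51)
best = ucb_pick(x, y, cands, kappa=2.0)
```

# Note how kappa plays the role of --explore: at kappa = 0 the rule greedily
# re-samples the best known mean; larger kappa weights posterior uncertainty.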
# PARAMETER SPACE EXPLORED (continuously, not just grid values):
#   n_steer: 3 to 9 (integer)
#   n_throttle: 2 to 5 (integer)
#   learning_rate: 0.00005 to 0.005 (float)