Architecture Decision Records — DonkeyCar RL Autoresearch

One ADR per major non-obvious technical choice. Agents read this to avoid re-opening settled decisions.


ADR-001: PPO over DQN as Primary Agent

Date: 2026-04-13
Status: Accepted

Context: DonkeyCar driving is a continuous control problem (steer ∈ [-1,1], throttle ∈ [0,1]). DQN requires discrete action spaces; we worked around this with DiscretizedActionWrapper. PPO supports continuous action spaces natively.

Decision: Use PPO as the primary agent. Keep DQN support for discrete action experiments.

Consequences:

  • PPO trains faster on continuous driving tasks (no discretization artifacts)
  • No need for DiscretizedActionWrapper with PPO (but keep it for DQN experiments)
  • PPO with CnnPolicy handles raw image observations natively

Rejected alternatives:

  • DQN only — requires discretization; loses steering resolution
  • SAC — valid alternative but PPO is simpler and well-tested on DonkeyCar
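
For concreteness, the two action-space shapes (a sketch; the Box bounds follow from the context above, and the bin counts are illustrative, borrowed from ADR-004's original grid):

import numpy as np
import gymnasium as gym

# Continuous action space that PPO consumes natively: [steer, throttle]
continuous = gym.spaces.Box(low=np.array([-1.0, 0.0]),
                            high=np.array([1.0, 1.0]),
                            dtype=np.float32)

# What DQN needs instead: one discrete index per (steer, throttle) pair,
# e.g. 7 steering bins x 3 throttle bins; DiscretizedActionWrapper maps index -> pair
discrete = gym.spaces.Discrete(7 * 3)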

ADR-002: Pure Numpy GP (TinyGP) over sklearn

Date: 2026-04-13
Status: Accepted

Context: We need a Gaussian Process surrogate model for the autoresearch controller. sklearn.gaussian_process exists but has had compatibility issues with our numpy version.

Decision: Use TinyGP — a pure numpy RBF kernel GP implemented in autoresearch_controller.py.

Consequences:

  • No sklearn dependency
  • Full control over kernel and noise parameters
  • Slightly less optimized than sklearn but sufficient for < 1000 data points

Rejected alternatives:

  • sklearn GaussianProcessRegressor — dependency issues
  • GPyTorch — overkill, adds PyTorch dependency
  • BoTorch — same objection
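
The core of such a GP fits in a few lines. A minimal sketch of the approach (not the exact TinyGP code in autoresearch_controller.py; hyperparameter names are illustrative):

import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel: k(a, b) = exp(-||a - b||^2 / (2 * l^2))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / length_scale ** 2)

def gp_posterior(X_train, y_train, X_test, length_scale=1.0, noise=1e-3):
    # Standard GP regression equations; the O(n^3) solve is fine for < 1000 points
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale)
    K_ss = rbf_kernel(X_test, X_test, length_scale)
    alpha = np.linalg.solve(K, y_train)
    mu = K_s.T @ alpha                      # posterior mean
    v = np.linalg.solve(K, K_s)
    var = np.diag(K_ss - K_s.T @ v)         # posterior variance
    return mu, np.sqrt(np.maximum(var, 0))  # clamp tiny negative variances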

ADR-003: JSONL Append-Only Results

Date: 2026-04-13
Status: Accepted

Context: Results from 300+ trials must be persistent, recoverable, and never lost.

Decision: All results are appended to JSONL files. Results files are never truncated or overwritten.

Consequences:

  • System can be interrupted and resumed at any point
  • Historical data is preserved even if a later trial fails
  • Easy to parse with json.loads(line) per line

Rejected alternatives:

  • SQLite — adds dependency, overkill for this volume
  • CSV — loses type information, harder to extend
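
The access pattern is deliberately trivial (sketch; field names are illustrative):

import json

def append_result(path, record):
    # Append-only: open in 'a' mode, one JSON object per line
    with open(path, 'a') as f:
        f.write(json.dumps(record) + '\n')

def load_results(path):
    # Recover every completed trial, even after an interrupted run
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]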

ADR-004: Gaussian Process + UCB Acquisition for Hyperparameter Search

Date: 2026-04-13
Status: Accepted

Context: We need an intelligent hyperparameter search strategy. Grid search was the starting point, but it misses optimal regions that do not lie on the grid (proven in practice: the best n_steer found was 8, which was not in the original grid of [3, 5, 7]).

Decision: Gaussian Process + Upper Confidence Bound (UCB) acquisition. GP models the reward landscape; UCB balances exploration vs exploitation.

The default kappa=2.0 is a reasonable balance; increase it for more exploration.

Consequences:

  • Finds optimal regions with fewer trials than grid search
  • Naturally handles continuous parameter spaces (learning_rate ∈ [0.00005, 0.005])
  • Requires at least 2 data points before the GP can be fit (the first 2 trials use random sampling)

Rejected alternatives:

  • Random search — better than grid but no learning
  • Tree Parzen Estimator (TPE/Optuna) — valid alternative, adds dependency
  • CMA-ES — better for high-dimensional spaces; our space is 3D, GP is sufficient
  • Population-Based Training (PBT) — requires parallel sim instances (we only have 1)
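
UCB itself is one line on top of the GP posterior (sketch, reusing the gp_posterior helper from ADR-002; candidate generation is illustrative):

import numpy as np

def ucb_suggest(X_train, y_train, candidates, kappa=2.0):
    # Score = posterior mean + kappa * posterior std:
    # high mean rewards exploitation, high std rewards exploration.
    mu, sigma = gp_posterior(X_train, y_train, candidates)
    return candidates[np.argmax(mu + kappa * sigma)]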

ADR-005: No Model Saving Before Model is Defined

Date: 2026-04-13
Status: Accepted (bug fix — never repeat)

Context: The original donkeycar_sb3_runner.py still called model.save(save_path) after the model-training code had been removed. This caused NameError: name 'model' is not defined on every single run across 300 trials.

Decision: Never call model.save() without first verifying model is defined. Training and saving must be atomic — if training fails, no save attempt.

Pattern:

import sys
from stable_baselines3 import PPO

try:
    model = PPO('CnnPolicy', env, ...)
    model.learn(total_timesteps=timesteps)
    model.save(save_path)  # only reached when the model exists and training succeeded
except Exception as e:
    log(f'Training failed: {e}')
    sys.exit(102)  # non-zero exit code signals failure to the caller

Rejected alternatives:

  • Checking if 'model' in locals() before save — fragile, hides bugs

ADR-006: env.close() + 2-Second Cooldown is Non-Negotiable

Date: 2026-04-13
Status: Accepted

Context: Early in the project, failing to call env.close() between runs left simulator zombie processes that locked up the entire system. With this pattern in place, 20+ consecutive runs have worked reliably.

Decision: Every runner process MUST:

  1. Call env.close() in a try/except before exit
  2. Sleep 2 seconds after close
  3. Then exit

This applies even if training or evaluation fails.
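
Pattern (a sketch; run_training stands in for whatever the runner executes, and the finally block guarantees cleanup on both success and failure):

import sys
import time

exit_code = 0
try:
    run_training(env)       # training or evaluation; may raise
except Exception as e:
    log(f'Run failed: {e}')
    exit_code = 1
finally:
    try:
        env.close()         # 1. always close, wrapped so cleanup cannot mask the real error
    except Exception:
        pass
    time.sleep(2)           # 2. cooldown so the simulator tears down the session
sys.exit(exit_code)         # 3. then exit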

Rejected alternatives:

  • Relying on Python garbage collection for env cleanup — proven to cause hangs

ADR-007: PPO with CnnPolicy for Image Observations

Date: 2026-04-13
Status: Accepted

Context: DonkeyCar provides 120x160x3 RGB camera images as observations. The policy must process images.

Decision: Use PPO('CnnPolicy', env, ...) from SB3. CnnPolicy automatically handles image preprocessing with a CNN feature extractor.

Consequences:

  • Larger model than MlpPolicy (image processing overhead)
  • Requires VecTransposeImage wrapper (SB3 handles this internally)
  • Training is slower per step but produces better driving behavior

Rejected alternatives:

  • MlpPolicy — cannot handle raw image inputs
  • Custom CNN — unnecessary complexity given SB3's built-in CnnPolicy
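
A quick sanity check of the observation pipeline (sketch; assumes env is a registered DonkeyCar environment with image observations):

from stable_baselines3 import PPO

model = PPO('CnnPolicy', env, verbose=0)
# SB3 applies VecTransposeImage automatically: the sim's channels-last
# (120, 160, 3) observation becomes channels-first (3, 120, 160) for the CNN.
print(model.observation_space.shape)  # expected: (3, 120, 160)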

ADR-008: All Phases Planned, Phase 1 Executed First

Date: 2026-04-13
Status: Accepted

Context: User asked whether to implement Phase 1 only or all phases. Three phases identified:

  1. Real Training Foundation
  2. Multi-Track Generalization
  3. Racing / Speed Optimization

Decision: Plan all phases in full documentation, execute Phase 1 first. Do not start Phase 2 until Phase 1 produces a genuine champion model (mean_reward > 100 on training track). This creates a wave gate between Phase 1 and Phase 2.

Rationale: Phase 2 and 3 depend on having a real trained model. Without Phase 1 complete, there is nothing to generalize or optimize for speed.


ADR-009: Tests Must Not Require Live Simulator

Date: 2026-04-13
Status: Accepted

Context: The DonkeyCar simulator must be running on port 9091 for live training. Tests cannot depend on this.

Decision: All pytest tests mock the gym environment. Integration tests use a MagicMock gym env that returns fake observations, rewards, and done signals. Only manual/acceptance tests require the live simulator.

Pattern:

import gymnasium as gym
import numpy as np
from unittest.mock import MagicMock, patch

@patch('gymnasium.make')
def test_runner_exits_cleanly(mock_make):
    mock_env = MagicMock()
    # Gymnasium API: reset() -> (obs, info); step() -> (obs, reward, terminated, truncated, info)
    mock_env.reset.return_value = (np.zeros((120, 160, 3)), {})
    mock_env.step.return_value = (np.zeros((120, 160, 3)), 1.0, True, False, {})
    mock_env.action_space = gym.spaces.Box(...)
    mock_make.return_value = mock_env
    # ... test runner