# Architecture Decision Records — DonkeyCar RL Autoresearch

> One ADR per major non-obvious technical choice.
> Agents read this to avoid re-opening settled decisions.

---

## ADR-001: PPO over DQN as Primary Agent

**Date:** 2026-04-13
**Status:** Accepted

**Context:** DonkeyCar driving is a continuous control problem (steer ∈ [-1, 1], throttle ∈ [0, 1]). DQN requires a discrete action space; we worked around this with DiscretizedActionWrapper. PPO supports continuous action spaces natively.

**Decision:** Use PPO as the primary agent. Keep DQN support for discrete-action experiments.

**Consequences:**
- PPO trains faster on continuous driving tasks (no discretization artifacts)
- No need for DiscretizedActionWrapper with PPO (but keep it for DQN experiments)
- PPO with CnnPolicy handles raw image observations natively

**Rejected alternatives:**
- DQN only — requires discretization; loses steering resolution
- SAC — a valid alternative, but PPO is simpler and well-tested on DonkeyCar

---

## ADR-002: Pure Numpy GP (TinyGP) over sklearn

**Date:** 2026-04-13
**Status:** Accepted

**Context:** The autoresearch controller needs a Gaussian Process surrogate model. sklearn.gaussian_process exists but has had compatibility issues with our numpy version.

**Decision:** Use TinyGP — a pure numpy RBF-kernel GP implemented in autoresearch_controller.py.

**Consequences:**
- No sklearn dependency
- Full control over kernel and noise parameters
- Slightly less optimized than sklearn, but sufficient for < 1000 data points

**Rejected alternatives:**
- sklearn GaussianProcessRegressor — dependency issues
- GPyTorch — overkill; adds a PyTorch dependency
- BoTorch — same objection

---

## ADR-003: JSONL Append-Only Results

**Date:** 2026-04-13
**Status:** Accepted

**Context:** Results from 300+ trials must be persistent, recoverable, and never lost.

**Decision:** All results are appended to JSONL files. Results files are never truncated or overwritten.

**Consequences:**
- The system can be interrupted and resumed at any point
- Historical data is preserved even if a later trial fails
- Easy to parse with `json.loads(line)` per line

**Rejected alternatives:**
- SQLite — adds a dependency; overkill for this volume
- CSV — loses type information, harder to extend

---

## ADR-004: GP+UCB Bayesian Optimization for Hyperparameter Search

**Date:** 2026-04-13
**Status:** Accepted

**Context:** We need an intelligent hyperparameter search strategy. Grid search was the starting point, but it misses optimal regions that are not grid-aligned (proven: n_steer=8 was NOT in the original grid of [3, 5, 7]).

**Decision:** Gaussian Process + Upper Confidence Bound (UCB) acquisition. The GP models the reward landscape; UCB balances exploration against exploitation. The default **kappa=2.0** gives a reasonable balance and can be increased for more exploration. A minimal numpy sketch of the GP+UCB step follows this ADR.

**Consequences:**
- Finds optimal regions with fewer trials than grid search
- Naturally handles continuous parameter spaces (learning_rate ∈ [0.00005, 0.005])
- Requires at least 2 data points before the GP can be fit (random sampling for the first 2 trials)

**Rejected alternatives:**
- Random search — better than grid search but does not learn
- Tree-structured Parzen Estimator (TPE/Optuna) — a valid alternative; adds a dependency
- CMA-ES — better for high-dimensional spaces; our space is 3D, so a GP is sufficient
- Population-Based Training (PBT) — requires parallel sim instances (we only have 1)
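For orientation, here is a minimal numpy sketch of the ADR-002 + ADR-004 approach (RBF-kernel GP plus UCB acquisition). This is illustrative only; the real implementation lives in autoresearch_controller.py, and the names `TinyGPSketch`, `rbf_kernel`, and `ucb_next_point` are hypothetical:

```python
import numpy as np

# Hypothetical names for illustration; the real TinyGP lives in
# autoresearch_controller.py and may differ in detail.

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """RBF (squared-exponential) kernel matrix between the rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dists / length_scale ** 2)

class TinyGPSketch:
    def __init__(self, length_scale=1.0, variance=1.0, noise=1e-4):
        self.length_scale, self.variance, self.noise = length_scale, variance, noise

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        K = rbf_kernel(self.X, self.X, self.length_scale, self.variance)
        K[np.diag_indices_from(K)] += self.noise        # observation noise / jitter
        self.L = np.linalg.cholesky(K)                  # K = L @ L.T
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, self.y))
        return self

    def predict(self, Xs):
        """Posterior mean and standard deviation at test points Xs."""
        Ks = rbf_kernel(np.asarray(Xs, dtype=float), self.X,
                        self.length_scale, self.variance)
        mu = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = self.variance - (v ** 2).sum(axis=0)      # prior variance minus explained part
        return mu, np.sqrt(np.clip(var, 1e-12, None))

def ucb_next_point(gp, candidates, kappa=2.0):
    """UCB acquisition: pick the candidate maximizing mean + kappa * std (ADR-004 default kappa=2.0)."""
    mu, sigma = gp.predict(candidates)
    return candidates[np.argmax(mu + kappa * sigma)]

# Usage sketch, per ADR-004: after >= 2 completed trials,
#   gp = TinyGPSketch().fit(X_trials, rewards)           # (n, 3) hyperparams, (n,) rewards
#   next_hp = ucb_next_point(gp, candidates, kappa=2.0)  # candidates drawn from search bounds
```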
---

## ADR-005: No Model Saving Before Model is Defined

**Date:** 2026-04-13
**Status:** Accepted (bug fix — never repeat)

**Context:** The original donkeycar_sb3_runner.py called `model.save(save_path)` after the model-training code had been removed. This caused `NameError: name 'model' is not defined` on every single run for 300 trials.

**Decision:** Never call `model.save()` without first verifying `model` is defined. Training and saving must be atomic — if training fails, no save is attempted.

**Pattern:**
```python
import sys

from stable_baselines3 import PPO

try:
    # env, timesteps, save_path, and log are defined elsewhere in the runner
    model = PPO('CnnPolicy', env, ...)
    model.learn(total_timesteps=timesteps)
    model.save(save_path)  # only reached if training succeeded
except Exception as e:
    log(f'Training failed: {e}')
    sys.exit(102)
```

**Rejected alternatives:**
- Checking `if 'model' in locals()` before save — fragile; hides bugs

---

## ADR-006: env.close() + 2-Second Cooldown is Non-Negotiable

**Date:** 2026-04-13
**Status:** Accepted

**Context:** Early in the project, not calling env.close() between runs left simulator zombie processes that locked up the entire system. 20+ consecutive runs work reliably with this pattern.

**Decision:** Every runner process MUST:
1. Call `env.close()` in a try/except before exit
2. Sleep 2 seconds after close
3. Then exit

This applies even if training or evaluation fails; a minimal sketch of the full pattern follows this ADR.

**Rejected alternatives:**
- Relying on Python garbage collection for env cleanup — proven to cause hangs
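A minimal sketch of the cleanup contract, assuming a runner entry point shaped roughly like this (the names `run_trial`, `make_env`, and `train_fn` are hypothetical; the real runner is donkeycar_sb3_runner.py):

```python
import sys
import time

def run_trial(make_env, train_fn):
    """Hypothetical runner skeleton enforcing the ADR-006 cleanup contract."""
    env = make_env()
    exit_code = 0
    try:
        train_fn(env)
    except Exception as e:
        print(f'Training failed: {e}', file=sys.stderr)
        exit_code = 102                  # matches the ADR-005 failure exit code
    finally:
        try:
            env.close()                  # 1. always attempt to release the simulator
        except Exception:
            pass                         # cleanup errors must not mask the exit code
        time.sleep(2)                    # 2. two-second cooldown
    sys.exit(exit_code)                  # 3. then exit
```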
---

## ADR-007: PPO with CnnPolicy for Image Observations

**Date:** 2026-04-13
**Status:** Accepted

**Context:** DonkeyCar provides 120x160x3 RGB camera images as observations. The policy must process images.

**Decision:** Use `PPO('CnnPolicy', env, ...)` from SB3. CnnPolicy handles image preprocessing automatically with a built-in CNN feature extractor.

**Consequences:**
- Larger model than MlpPolicy (image-processing overhead)
- Requires the VecTransposeImage wrapper (SB3 applies this internally)
- Training is slower per step but produces better driving behavior

**Rejected alternatives:**
- MlpPolicy — cannot handle raw image inputs
- Custom CNN — unnecessary complexity given SB3's built-in CnnPolicy

---

## ADR-008: All Phases Planned, Phase 1 Executed First

**Date:** 2026-04-13
**Status:** Accepted

**Context:** The user asked whether to implement Phase 1 only or all phases. Three phases were identified:
1. Real Training Foundation
2. Multi-Track Generalization
3. Racing / Speed Optimization

**Decision:** Plan all phases in full documentation; execute Phase 1 first. Do not start Phase 2 until Phase 1 produces a genuine champion model (mean_reward > 100 on the training track). This creates a wave gate between Phase 1 and Phase 2.

**Rationale:** Phases 2 and 3 depend on having a real trained model. Without Phase 1 complete, there is nothing to generalize or optimize for speed.

---

## ADR-009: Tests Must Not Require Live Simulator

**Date:** 2026-04-13
**Status:** Accepted

**Context:** The DonkeyCar simulator must be running on port 9091 for live training. Tests cannot depend on this.

**Decision:** All pytest tests mock the gym environment. Integration tests use a MagicMock gym env that returns fake observations, rewards, and done signals. Only manual/acceptance tests require the live simulator.

**Pattern:**
```python
@patch('gymnasium.make')
def test_runner_exits_cleanly(mock_make):
    mock_env = MagicMock()
    # gymnasium API: reset() -> (obs, info); step() -> (obs, reward, terminated, truncated, info)
    mock_env.reset.return_value = (np.zeros((120, 160, 3)), {})
    mock_env.step.return_value = (np.zeros((120, 160, 3)), 1.0, True, False, {})
    mock_env.action_space = gym.spaces.Box(...)
    mock_make.return_value = mock_env
    # ... test runner
```
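As a follow-up, a sketch of how such a test could assert the ADR-006 cleanup contract without a live simulator. This is not an existing test: it assumes the hypothetical `run_trial` skeleton sketched under ADR-006, and the import path is likewise an assumption:

```python
import time
from unittest.mock import MagicMock, patch

import pytest

from donkeycar_sb3_runner import run_trial  # hypothetical: assumes the ADR-006 sketch lives here

def test_env_closed_even_when_training_fails():
    """ADR-006 contract: env.close() runs even when training raises."""
    mock_env = MagicMock()
    failing_train = MagicMock(side_effect=RuntimeError('boom'))
    with patch.object(time, 'sleep'):            # skip the real 2-second cooldown in tests
        with pytest.raises(SystemExit) as excinfo:
            run_trial(lambda: mock_env, failing_train)
    assert excinfo.value.code == 102             # ADR-005 failure exit code
    mock_env.close.assert_called_once()          # cleanup happened despite the failure
```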