Architecture Decision Records — DonkeyCar RL Autoresearch
One ADR per major non-obvious technical choice. Agents read this to avoid re-opening settled decisions.
ADR-001: PPO over DQN as Primary Agent
Date: 2026-04-13
Status: Accepted
Context: DonkeyCar driving is a continuous control problem (steer ∈ [-1,1], throttle ∈ [0,1]). DQN requires discrete action spaces; we worked around this with DiscretizedActionWrapper. PPO supports continuous action spaces natively.
Decision: Use PPO as the primary agent. Keep DQN support for discrete action experiments.
Consequences:
- PPO trains faster on continuous driving tasks (no discretization artifacts)
- No need for DiscretizedActionWrapper with PPO (but keep it for DQN experiments)
- PPO with CnnPolicy handles raw image observations natively
Rejected alternatives:
- DQN only — requires discretization; loses steering resolution
- SAC — valid alternative but PPO is simpler and well-tested on DonkeyCar
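For illustration, a minimal sketch of the difference (the env ID is an assumption for illustration, not the project's runner config):

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Hypothetical env ID; the real runner's setup may differ.
env = gym.make("donkey-generated-track-v0")

# PPO consumes the continuous Box action space (steer, throttle) directly;
# DQN would first need the DiscretizedActionWrapper to map a finite set of
# (steer, throttle) pairs onto this space.
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```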
ADR-002: Pure Numpy GP (TinyGP) over sklearn
Date: 2026-04-13
Status: Accepted
Context: We need a Gaussian Process surrogate model for the autoresearch controller. sklearn.gaussian_process exists but has had compatibility issues with our numpy version.
Decision: Use TinyGP — a pure numpy RBF kernel GP implemented in autoresearch_controller.py.
Consequences:
- No sklearn dependency
- Full control over kernel and noise parameters
- Slightly less optimized than sklearn but sufficient for < 1000 data points
Rejected alternatives:
- sklearn GaussianProcessRegressor — dependency issues
- GPyTorch — overkill, adds PyTorch dependency
- BoTorch — same objection: overkill, adds a PyTorch dependency
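To make the decision concrete, here is a minimal sketch of a pure-numpy RBF-kernel GP of the kind TinyGP implements; the actual code in autoresearch_controller.py may differ in naming and parameterization:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel: variance * exp(-||a - b||^2 / (2 * l^2)).
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq / length_scale**2)

def gp_posterior(X, y, X_star, noise=1e-4):
    # Standard GP regression posterior (Rasmussen & Williams, Alg. 2.1).
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, X_star)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_s.T @ alpha                # posterior mean at each query point
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_star, X_star)) - np.sum(v**2, axis=0)
    return mu, var                    # posterior variance per query point
```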
ADR-003: JSONL Append-Only Results
Date: 2026-04-13
Status: Accepted
Context: Results from 300+ trials must be persistent, recoverable, and never lost.
Decision: All results are appended to JSONL files. Results files are never truncated or overwritten.
Consequences:
- System can be interrupted and resumed at any point
- Historical data is preserved even if a later trial fails
- Easy to parse: one json.loads(line) call per line
Rejected alternatives:
- SQLite — adds dependency, overkill for this volume
- CSV — loses type information, harder to extend
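A minimal sketch of the append/resume pattern this decision implies (the file path and record fields are illustrative, not project code):

```python
import json

def append_result(path, record):
    # Append-only: open in 'a' mode, one JSON object per line.
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def load_results(path):
    # Resume: re-read every completed trial. A partial final line
    # (crash mid-write) is skipped rather than corrupting the load.
    results = []
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    results.append(json.loads(line))
                except json.JSONDecodeError:
                    continue
    except FileNotFoundError:
        pass
    return results
```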
ADR-004: GP+UCB Bayesian Optimization for Hyperparameter Search
Date: 2026-04-13
Status: Accepted
Context: We need an intelligent hyperparameter search strategy. Grid search was the starting point, but it misses optimal regions that do not lie on the grid (demonstrated in practice: n_steer=8 was NOT in the original grid of [3,5,7]).
Decision: Gaussian Process + Upper Confidence Bound (UCB) acquisition. The GP models the reward landscape; UCB scores each candidate as mean + kappa * std, balancing exploration against exploitation. The default kappa=2.0 is a reasonable balance and can be increased for more exploration.
Consequences:
- Finds optimal regions with fewer trials than grid search
- Naturally handles continuous parameter spaces (learning_rate ∈ [0.00005, 0.005])
- Requires at least 2 data points before GP can be fit (random sampling for first 2 trials)
Rejected alternatives:
- Random search — better than grid but no learning
- Tree Parzen Estimator (TPE/Optuna) — valid alternative, adds dependency
- CMA-ES — better for high-dimensional spaces; our space is 3D, GP is sufficient
- Population-Based Training (PBT) — requires parallel sim instances (we only have 1)
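A minimal sketch of the UCB acquisition step, assuming a GP posterior like the TinyGP sketch in ADR-002 (candidate generation and names are illustrative):

```python
import numpy as np

def ucb_select(candidates, mu, var, kappa=2.0):
    # UCB score = posterior mean + kappa * posterior std. kappa=2.0 is the
    # default from the Decision above; raise it to explore more.
    score = mu + kappa * np.sqrt(np.maximum(var, 0.0))
    return candidates[np.argmax(score)]

# Usage with the ADR-002 sketch:
#   mu, var = gp_posterior(X_observed, y_observed, candidates)
#   next_trial = ucb_select(candidates, mu, var, kappa=2.0)
```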
ADR-005: No Model Saving Before Model is Defined
Date: 2026-04-13
Status: Accepted (bug fix — never repeat)
Context: The original donkeycar_sb3_runner.py still called model.save(save_path) after the model-training code had been removed. This caused NameError: name 'model' is not defined on every single run across 300 trials.
Decision: Never call model.save() without first verifying model is defined. Training and saving must be atomic — if training fails, no save attempt.
Pattern:
try:
model = PPO('CnnPolicy', env, ...)
model.learn(total_timesteps=timesteps)
model.save(save_path)
except Exception as e:
log(f'Training failed: {e}')
sys.exit(102)
Rejected alternatives:
- Checking if 'model' in locals() before save — fragile, hides bugs
ADR-006: env.close() + 2-Second Cooldown is Non-Negotiable
Date: 2026-04-13
Status: Accepted
Context: Early in the project, not calling env.close() between runs caused simulator zombie processes that locked up the entire system. 20+ consecutive runs work reliably with this pattern.
Decision: Every runner process MUST:
- Call env.close() in a try/except before exit
- Sleep 2 seconds after close
- Then exit
This applies even if training or evaluation fails.
Rejected alternatives:
- Relying on Python garbage collection for env cleanup — proven to cause hangs
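A minimal sketch of the required shutdown sequence (the function name is illustrative, not project code):

```python
import sys
import time

def safe_shutdown(env, exit_code=0):
    # Always attempt cleanup, even after a failed run.
    try:
        env.close()
    except Exception:
        pass  # a close failure must not prevent the cooldown and exit
    time.sleep(2)  # give the simulator time to release the connection
    sys.exit(exit_code)
```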
ADR-007: PPO with CnnPolicy for Image Observations
Date: 2026-04-13
Status: Accepted
Context: DonkeyCar provides 120x160x3 RGB camera images as observations. The policy must process images.
Decision: Use PPO('CnnPolicy', env, ...) from SB3. CnnPolicy automatically handles image preprocessing with a CNN feature extractor.
Consequences:
- Larger model than MlpPolicy (image processing overhead)
- Requires VecTransposeImage wrapper (SB3 handles this internally)
- Training is slower per step but produces better driving behavior
Rejected alternatives:
- MlpPolicy — cannot handle raw image inputs
- Custom CNN — unnecessary complexity given SB3's built-in CnnPolicy
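For reference, a minimal sketch of the observation space involved (the shape comes from the Context above; the claims about SB3 behavior reflect its documented defaults):

```python
import gymnasium as gym
import numpy as np

# DonkeyCar camera frames: 120x160 RGB, uint8 pixel values.
camera_space = gym.spaces.Box(low=0, high=255, shape=(120, 160, 3), dtype=np.uint8)

# PPO('CnnPolicy', env) attaches a CNN feature extractor suited to image
# Boxes like this, and SB3 inserts VecTransposeImage to feed channels-first
# (3, 120, 160) tensors. MlpPolicy would merely flatten the 57,600 values.
```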
ADR-008: All Phases Planned, Phase 1 Executed First
Date: 2026-04-13
Status: Accepted
Context: User asked whether to implement Phase 1 only or all phases. Three phases identified:
- Real Training Foundation
- Multi-Track Generalization
- Racing / Speed Optimization
Decision: Plan all phases in full documentation, execute Phase 1 first. Do not start Phase 2 until Phase 1 produces a genuine champion model (mean_reward > 100 on training track). This creates a wave gate between Phase 1 and Phase 2.
Rationale: Phase 2 and 3 depend on having a real trained model. Without Phase 1 complete, there is nothing to generalize or optimize for speed.
ADR-009: Tests Must Not Require Live Simulator
Date: 2026-04-13
Status: Accepted
Context: The DonkeyCar simulator must be running on port 9091 for live training. Tests cannot depend on this.
Decision: All pytest tests mock the gym environment. Integration tests use a MagicMock gym env that returns fake observations, rewards, and done signals. Only manual/acceptance tests require the live simulator.
Pattern:

```python
from unittest.mock import MagicMock, patch

import gymnasium as gym
import numpy as np

@patch('gymnasium.make')
def test_runner_exits_cleanly(mock_make):
    mock_env = MagicMock()
    mock_env.reset.return_value = (np.zeros((120, 160, 3)), {})
    mock_env.step.return_value = (np.zeros((120, 160, 3)), 1.0, True, False, {})
    mock_env.action_space = gym.spaces.Box(...)
    mock_make.return_value = mock_env
    # ... test runner
```