System Architecture — DonkeyCar RL Autoresearch
Overview
Five distinct layers talk to each other. From top to bottom:
┌─────────────────────────────────────────────────────────────────┐
│ Layer 5: OUR CODE (autoresearch_controller, wave4_controller) │
│ GP+UCB proposes hyperparameters, launches training │
├─────────────────────────────────────────────────────────────────┤
│ Layer 4: OUR CODE (multitrack_runner, reward_wrapper) │
│ PPO training loop, reward shaping, track switching │
├─────────────────────────────────────────────────────────────────┤
│ Layer 3: gym_donkeycar (Python package, installed) │
│ Gymnasium environment wrapper around the sim │
├─────────────────────────────────────────────────────────────────┤
│ Layer 2: TCP socket (localhost:9091) │
│ JSON messages in both directions │
├─────────────────────────────────────────────────────────────────┤
│ Layer 1: sdsandbox (Unity app, running on Windows/WSL) │
│ 3D physics simulation, rendering, track logic │
└─────────────────────────────────────────────────────────────────┘
Layer 1: sdsandbox (Unity Simulator)
Location: /mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/
Language: C# (Unity)
What it does: Runs the 3D physics simulation — car physics, track geometry,
collision detection, camera rendering, lap timing.
Key C# scripts
| File | Role |
|---|---|
| Scripts/tcp/TcpCarHandler.cs | Main bridge: handles the TCP connection, reads steering/throttle commands, sends telemetry JSON every frame |
| Scripts/CarPath.cs | Defines the track centreline as a series of nodes; computes CTE via GetCrossTrackErr() |
| Scripts/PathManager.cs | Manages the active path, knows which node the car is near (iActiveSpan) |
| Scripts/startingLine.cs | Detects lap completions, measures lap times |
| Scripts/Car.cs | Car physics: applies steering/throttle, tracks velocity, collision |
| Scripts/SceneLoader.cs | Loads/unloads track scenes in response to load_scene / exit_scene messages |
| Scripts/GlobalState.cs | Flags like extendedTelemetry that gate which fields are sent |
What the sim sends every frame (telemetry JSON)
{
"msg_type": "telemetry",
"steering_angle": 0.0,
"throttle": 0.4,
"image": "<base64 camera image>",
"hit": "none",
"time": 12.34,
"speed": 2.5,
"accel_x/y/z": ...,
"gyro_x/y/z": ...,
"pitch/yaw/roll": ...,
"activeNode": 42, ← current path node index (ALWAYS sent)
"totalNodes": 186, ← total path nodes (ALWAYS sent)
"cte": 0.3, ← cross-track error (extendedTelemetry=true)
"pos_x/y/z": ..., ← world position (extendedTelemetry=true)
"vel_x/y/z": ... ← world velocity (extendedTelemetry=true)
}
What the sim receives (commands)
{ "msg_type": "control", "steering": 0.2, "throttle": 0.5, "brake": 0.0 }
{ "msg_type": "load_scene", "scene_name": "generated_track" }
{ "msg_type": "exit_scene" }
{ "msg_type": "car_config", ... }
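As a minimal sketch of how a client might build these commands, the helpers below serialise a control and a load_scene message with the field names shown above. The function names are hypothetical, and the clipping bounds are an assumption taken from the action space described under Layer 3 (steering in [-1, 1], throttle in [0, 1]); the real gym_donkeycar client may format values differently.

```python
import json


def make_control_message(steering: float, throttle: float, brake: float = 0.0) -> str:
    """Build a "control" command as a JSON string (hypothetical helper).

    Assumption: values are clipped to steering [-1, 1] and throttle [0, 1].
    """
    steering = max(-1.0, min(1.0, float(steering)))
    throttle = max(0.0, min(1.0, float(throttle)))
    payload = {
        "msg_type": "control",
        "steering": steering,
        "throttle": throttle,
        "brake": float(brake),
    }
    return json.dumps(payload)


def make_load_scene_message(scene_name: str) -> str:
    """Build a "load_scene" command for the given scene name (hypothetical helper)."""
    return json.dumps({"msg_type": "load_scene", "scene_name": scene_name})
```

Out-of-range inputs are clipped rather than rejected, so a noisy policy output still produces a valid command.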
Layer 2: TCP Socket (localhost:9091)
A plain TCP connection carrying newline-delimited JSON messages. The sim is the server (listens on 9091). Python is the client (connects to 9091).
Critical rule: Each gym.make() call opens ONE TCP connection, which
spawns ONE car in the sim. Opening a second connection spawns a phantom
second car. Always env.close() before opening a new connection.
Track switching must go through the EXISTING connection via exit_scene,
not by opening a new connection.
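The newline-delimited framing described above can be sketched without touching a real socket. The two helpers below are hypothetical and only illustrate the idea: one JSON object per line, with any incomplete trailing bytes held back until the rest arrives. The actual gym_donkeycar client may split its receive buffer differently.

```python
import json
from typing import List, Tuple


def encode_message(message: dict) -> bytes:
    """Serialise one message as a single line of UTF-8 JSON."""
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_stream(buffer: bytes) -> Tuple[List[dict], bytes]:
    """Split a receive buffer into complete messages plus the unfinished tail.

    Everything after the last newline is returned unparsed so the caller can
    prepend it to the next chunk read from the socket.
    """
    *lines, remainder = buffer.split(b"\n")
    messages = [json.loads(line) for line in lines if line.strip()]
    return messages, remainder
```

A caller would keep the returned remainder and pass `remainder + next_chunk` into `decode_stream` on the next read.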
Layer 3: gym_donkeycar (Python Package)
Location: /home/paulh/.local/lib/python3.10/site-packages/gym_donkeycar/
Installed via: pip
What it does: Wraps the TCP connection as a standard Gymnasium environment
so Stable-Baselines3 and other RL libraries can use it.
File structure
gym_donkeycar/
├── __init__.py Registers all environments with gymnasium
├── core/
│ ├── sim_client.py SimClient (raw TCP socket send/receive)
│ ├── client.py SDClient (low-level socket, threading, message queue)
│ └── message.py IMesgHandler interface
└── envs/
├── donkey_env.py DonkeyEnv — THE gymnasium.Env subclass
├── donkey_sim.py DonkeyUnitySimContoller — parses telemetry,
│ builds info dict, manages episode state
└── donkey_proc.py Optional: launches sim as subprocess
How they connect
DonkeyEnv (donkey_env.py)
└── creates DonkeyUnitySimContoller (donkey_sim.py)
└── creates SimClient (core/sim_client.py)
└── creates SDClient (core/client.py)
└── TCP socket → Unity sim
donkey_env.py — the Gymnasium interface
This is what your code calls with gym.make('donkey-generated-track-v0').
- reset() → sends car_config, waits for "sim started!", returns first obs
- step(action) → sends control message (steering + throttle), waits for next telemetry frame, returns (obs, reward, terminated, truncated, info)
- Observation = camera image (120×160×3 uint8)
- Action space = Box([-1,0], [1,1]) — [steering, throttle]
donkey_sim.py — the telemetry parser
Receives JSON frames from the sim and maintains state:
| Attribute | Source | Meaning |
|---|---|---|
| self.image_array | image field | Current camera frame |
| self.cte | cte field | Cross-track error (metres from centreline) |
| self.speed | speed field | Car speed (m/s) |
| self.hit | hit field | What was last hit ("none" or object name) |
| self.x/y/z | pos_x/y/z | World position |
| self.lap_count | crossing start line | Completed laps |
| self.last_lap_time | crossing start line | Most recent lap time (seconds) |
| self.active_node | activeNode | Current path node index (newly added) |
| self.total_nodes | totalNodes | Total path nodes (newly added) |
The info dict returned from step() contains all of the above plus:
- track_progress = active_node / total_nodes ← newly added, 0.0→1.0
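A rough sketch of how these info fields could be derived from one telemetry frame is shown below. The function name and the default values for missing fields are assumptions; in particular, cte is only present when extendedTelemetry is enabled, and the zero-division guard is an addition not described in the source.

```python
def build_info(telemetry: dict) -> dict:
    """Extract the fields described above from one telemetry frame (sketch)."""
    active_node = int(telemetry.get("activeNode", 0))
    total_nodes = int(telemetry.get("totalNodes", 0))
    # Assumption: report 0.0 progress until the sim has sent a valid node count.
    track_progress = active_node / total_nodes if total_nodes > 0 else 0.0
    return {
        # Assumption: default to 0.0 when extendedTelemetry is off and cte is absent.
        "cte": float(telemetry.get("cte", 0.0)),
        "speed": float(telemetry.get("speed", 0.0)),
        "hit": telemetry.get("hit", "none"),
        "active_node": active_node,
        "total_nodes": total_nodes,
        "track_progress": track_progress,
    }
```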
Episode termination (done=True) fires when:
- abs(cte) > max_cte (default 8m): car too far off centreline
- hit != "none": car hit something (when detected by physics)
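The two termination conditions above amount to a single predicate. This is a sketch with a hypothetical function name, not the code in donkey_sim.py; it uses the 8 m default stated above.

```python
def is_episode_over(cte: float, hit: str, max_cte: float = 8.0) -> bool:
    """Return True when either termination condition listed above holds."""
    too_far_off_centreline = abs(cte) > max_cte
    collided = hit != "none"
    return too_far_off_centreline or collided
```

Note that the comparison is strict, so a car sitting at exactly max_cte does not terminate the episode under this reading.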
Registered environments
# All defined in gym_donkeycar/__init__.py
'donkey-generated-roads-v0' → GeneratedRoadsEnv (generated_road)
'donkey-generated-track-v0' → GeneratedTrackEnv (generated_track)
'donkey-mountain-track-v0' → MountainTrackEnv (mountain_track)
'donkey-minimonaco-track-v0' → MiniMonacoEnv (mini_monaco)
'donkey-warehouse-v0' → WarehouseEnv
'donkey-roboracingleague-track-v0' → RoboRacingLeagueTrackEnv
# ... etc
Layer 4: Our Training Code
Location: agent/
reward_wrapper.py — SpeedRewardWrapper
Wraps a DonkeyEnv and completely replaces the sim's own reward signal.
v5 reward (current):
reward = (speed / 10.0) × (1 - |cte| / max_cte)
- Fast + centred = high reward
- Slow (e.g. on a hill) = low reward → gradient pushes toward more throttle
- Off-track = near-zero reward
- Crash (done=True) = -1.0
- Short-lap exploit (<5s): large penalty
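The formula and the crash rule can be written as a small function for quick sanity checks. This is a sketch only: the function name is hypothetical, max_cte defaulting to 8.0 assumes the wrapper shares the environment's default, and the short-lap penalty is left out because its size is not specified here.

```python
def v5_reward(speed: float, cte: float, max_cte: float = 8.0, done: bool = False) -> float:
    """v5 reward as described above: speed term scaled by how centred the car is.

    Assumption: max_cte matches the environment default of 8 m.
    The short-lap penalty is intentionally omitted (magnitude not given).
    """
    if done:
        return -1.0  # crash / termination penalty
    return (speed / 10.0) * (1.0 - abs(cte) / max_cte)
```

For example, a car at 5 m/s on the centreline scores 0.5, and the same speed at 4 m off the centreline scores 0.25.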
multitrack_runner.py — Training Loop
Manages round-robin training across multiple tracks:
- Creates env on track A, trains for steps_per_switch steps
- Calls close_and_switch() → sends exit_scene via existing viewer, closes env, waits, opens env on track B
- Repeats until total_timesteps reached
- Evaluates on test tracks (mini_monaco, etc.)
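The round-robin schedule in the list above can be sketched as a generator that yields one (track, steps) segment at a time. The function name is hypothetical, and truncating the final segment so the total never exceeds total_timesteps is an assumption about how the runner handles the remainder.

```python
from itertools import cycle
from typing import Iterator, List, Tuple


def round_robin_segments(
    tracks: List[str], steps_per_switch: int, total_timesteps: int
) -> Iterator[Tuple[str, int]]:
    """Yield (track, steps) training segments in round-robin order (sketch)."""
    if not tracks or steps_per_switch <= 0:
        raise ValueError("need at least one track and a positive steps_per_switch")
    remaining = total_timesteps
    for track in cycle(tracks):
        if remaining <= 0:
            break
        # Assumption: the last segment is shortened to fit the step budget.
        steps = min(steps_per_switch, remaining)
        yield track, steps
        remaining -= steps
```

Each yielded segment corresponds to one train-then-switch cycle; the caller would train for `steps`, then perform the close-and-switch sequence before the next segment.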
Wrapper stack applied to every env:
gym.make(track_id) ← raw DonkeyEnv
→ ThrottleClampWrapper ← ensures minimum throttle (0.2 or 0.5)
→ StuckTerminationWrapper ← ends episode if <0.5m in 80 steps
→ SpeedRewardWrapper ← replaces reward with v5 formula
→ DummyVecEnv ← SB3 requires vectorised envs
→ VecTransposeImage ← SB3 CNN needs (C,H,W) not (H,W,C)
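The core logic of the first two wrappers can be sketched without the Gymnasium plumbing. The names below are hypothetical stand-ins, not the classes in agent/. Two assumptions are made: "<0.5m in 80 steps" is read as straight-line displacement between the current position and the position 80 steps earlier, and x/z are treated as the ground-plane coordinates.

```python
import math
from collections import deque
from typing import Tuple


def clamp_throttle(action: Tuple[float, float], min_throttle: float = 0.2) -> Tuple[float, float]:
    """Raise throttle to at least min_throttle; steering is passed through."""
    steering, throttle = action
    return (steering, max(min_throttle, throttle))


class StuckDetector:
    """Flags an episode as stuck when the car barely moves over a window of steps."""

    def __init__(self, min_distance: float = 0.5, window: int = 80) -> None:
        self.min_distance = min_distance
        self.window = window
        # window + 1 entries so the oldest and newest are exactly `window` steps apart.
        self.positions = deque(maxlen=window + 1)

    def reset(self) -> None:
        self.positions.clear()

    def update(self, x: float, z: float) -> bool:
        """Record a position; return True if displacement over the window is too small."""
        self.positions.append((x, z))
        if len(self.positions) <= self.window:
            return False  # not enough history yet
        old_x, old_z = self.positions[0]
        # Assumption: displacement (not path length) is the stuck criterion.
        return math.hypot(x - old_x, z - old_z) < self.min_distance
```

In a real wrapper, `reset()` would be called from the environment's reset and `update()` from each step, using the position reported in the info dict.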
Key design decisions
- PPO with CnnPolicy — raw image input, SB3 handles CNN feature extraction
- Continuous actions — steering [-1,1] and throttle [0,1]; no discretisation
- No warm-start — each trial trains from random weights to avoid bias
- Per-segment checkpointing — model saved after every training segment so timeouts don't lose all progress
Layer 5: Autoresearch (GP+UCB)
wave4_controller.py — outer loop:
- Proposes hyperparameters (learning_rate, steps_per_switch, total_timesteps) using Gaussian Process + Upper Confidence Bound (GP+UCB)
- Launches multitrack_runner.py as a subprocess
- Parses test track scores from stdout
- Updates GP with (hyperparams → score) to improve next proposal
- Saves champion model when score improves
TinyGP — pure numpy Gaussian Process (no sklearn dependency):
- Fits a smooth surface over (hyperparams → performance) space
- UCB = mean + κ×std — balances exploiting known-good regions vs exploring uncertain ones
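The UCB selection rule can be illustrated on its own, given posterior means and standard deviations for a set of candidate hyperparameters. This sketch does not reproduce TinyGP: the function names are hypothetical, the GP fit is assumed to have happened elsewhere, and the default κ of 2.0 is a placeholder rather than the value used by the controller.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def ucb_scores(means: Sequence[float], stds: Sequence[float], kappa: float = 2.0) -> List[float]:
    """UCB = mean + kappa * std for each candidate."""
    return [m + kappa * s for m, s in zip(means, stds)]


def pick_next(
    candidates: Sequence[T], means: Sequence[float], stds: Sequence[float], kappa: float = 2.0
) -> T:
    """Return the candidate with the highest UCB score."""
    scores = ucb_scores(means, stds, kappa)
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]
```

With κ = 0 the rule is purely exploitative and picks the highest predicted mean; larger κ shifts the choice toward candidates the model is unsure about.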
Data Flow: One Training Step
1. model.predict(obs) → action [steering, throttle]
2. ThrottleClampWrapper.step(action) → clamp throttle ≥ 0.2
3. StuckTerminationWrapper.step(action) → check if car moved <0.5m in 80 steps
4. SpeedRewardWrapper.step(action) → compute v5 reward, check short-lap exploit
5. DonkeyEnv.step(action) → send TCP "control" message to Unity sim
6. Unity sim → physics tick → send telemetry JSON back
7. donkey_sim.py → parse JSON → update cte, speed, active_node, track_progress
8. DonkeyEnv.step() returns (obs=camera_image, reward=sim_reward, done, info)
9. SpeedRewardWrapper replaces sim_reward with v5 reward
10. SB3 PPO stores (obs, action, v5_reward, done) in rollout buffer
11. After n_steps=2048: PPO gradient update → policy weights update
What track_progress Tells Us (New)
info['track_progress'] = activeNode / totalNodes
- 0.0 = car is at the start line
- 0.5 = car is halfway around the track
- 1.0 = car has completed the track (if activeNode is a zero-based index, the value tops out just below 1.0)
This is the first time forward-progress information is available to the reward wrapper. Previously, CTE only told us how far sideways the car was from the centreline, not how far along the track it had travelled. With track_progress we can reward the model for getting further around the track even if it is slow or slightly off-centre. This matters most on mountain_track, where the hill blocked learning.
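One way a progress-based term might be derived is from the change in track_progress between consecutive steps, with a correction for the wrap from near 1.0 back to 0.0 at the start line. This is a hypothetical sketch and not part of the v5 reward; the 0.5 threshold assumes the car never covers more than half a lap in a single step.

```python
def progress_delta(prev_progress: float, curr_progress: float) -> float:
    """Signed forward progress between two track_progress readings (sketch).

    Positive means the car moved forward along the track; the wrap at the
    start line is unwound so a lap crossing does not look like a huge jump back.
    """
    delta = curr_progress - prev_progress
    if delta < -0.5:
        delta += 1.0  # crossed the start line going forward
    elif delta > 0.5:
        delta -= 1.0  # crossed the start line going backwards
    return delta
```

For example, moving from 0.98 to 0.01 yields roughly +0.03 rather than -0.97.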
File Quick Reference
| File | What to edit when... |
|---|---|
| agent/reward_wrapper.py | Changing reward function |
| agent/multitrack_runner.py | Changing training loop, wrappers, track switching |
| agent/wave4_controller.py | Changing GP search, hyperparameter ranges |
| gym_donkeycar/envs/donkey_sim.py | Adding new fields from sim telemetry |
| gym_donkeycar/envs/donkey_env.py | Changing env reset/step behaviour |
| sdsandbox/.../TcpCarHandler.cs | Adding new telemetry fields from Unity |
| sdsandbox/.../CarPath.cs | Changing how CTE / track progress is computed |