System Architecture — DonkeyCar RL Autoresearch

Overview

Five distinct layers talk to each other. The diagram below shows them from top (our code) to bottom (the simulator); the sections that follow walk up from Layer 1:

┌─────────────────────────────────────────────────────────────────┐
│  Layer 5: OUR CODE  (autoresearch_controller, wave4_controller) │
│           GP+UCB proposes hyperparameters, launches training     │
├─────────────────────────────────────────────────────────────────┤
│  Layer 4: OUR CODE  (multitrack_runner, reward_wrapper)         │
│           PPO training loop, reward shaping, track switching    │
├─────────────────────────────────────────────────────────────────┤
│  Layer 3: gym_donkeycar  (Python package, installed)            │
│           Gymnasium environment wrapper around the sim           │
├─────────────────────────────────────────────────────────────────┤
│  Layer 2: TCP socket  (localhost:9091)                          │
│           JSON messages in both directions                       │
├─────────────────────────────────────────────────────────────────┤
│  Layer 1: sdsandbox  (Unity app, running on Windows/WSL)        │
│           3D physics simulation, rendering, track logic          │
└─────────────────────────────────────────────────────────────────┘

Layer 1: sdsandbox (Unity Simulator)

Location: /mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/
Language: C# (Unity)
What it does: Runs the 3D physics simulation — car physics, track geometry, collision detection, camera rendering, lap timing.

Key C# scripts

| File | Role |
|---|---|
| Scripts/tcp/TcpCarHandler.cs | Main bridge — handles the TCP connection, reads steering/throttle commands, sends telemetry JSON every frame |
| Scripts/CarPath.cs | Defines the track centreline as a series of nodes; computes CTE via GetCrossTrackErr() |
| Scripts/PathManager.cs | Manages the active path, knows which node the car is near (iActiveSpan) |
| Scripts/startingLine.cs | Detects lap completions, measures lap times |
| Scripts/Car.cs | Car physics — applies steering/throttle, tracks velocity, collision |
| Scripts/SceneLoader.cs | Loads/unloads track scenes in response to load_scene / exit_scene messages |
| Scripts/GlobalState.cs | Flags like extendedTelemetry that gate which fields are sent |

What the sim sends every frame (telemetry JSON)

{
  "msg_type":       "telemetry",
  "steering_angle": 0.0,
  "throttle":       0.4,
  "image":          "<base64 camera image>",
  "hit":            "none",
  "time":           12.34,
  "speed":          2.5,
  "accel_x/y/z":    ...,
  "gyro_x/y/z":     ...,
  "pitch/yaw/roll": ...,
  "activeNode":     42,     // current path node index (ALWAYS sent)
  "totalNodes":     186,    // total path nodes        (ALWAYS sent)
  "cte":            0.3,    // cross-track error       (extendedTelemetry=true)
  "pos_x/y/z":      ...,    // world position          (extendedTelemetry=true)
  "vel_x/y/z":      ...     // world velocity          (extendedTelemetry=true)
}

What the sim receives (commands)

{ "msg_type": "control", "steering": 0.2, "throttle": 0.5, "brake": 0.0 }
{ "msg_type": "load_scene", "scene_name": "generated_track" }
{ "msg_type": "exit_scene" }
{ "msg_type": "car_config", ... }

Layer 2: TCP Socket (localhost:9091)

A plain TCP connection carrying newline-delimited JSON messages. The sim is the server (listens on 9091). Python is the client (connects to 9091).

Critical rule: Each gym.make() call opens ONE TCP connection, which spawns ONE car in the sim. Opening a second connection spawns a phantom second car. Always env.close() before opening a new connection. Track switching must go through the EXISTING connection via exit_scene, not by opening a new connection.
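
The framing is simple enough to exercise with a raw socket. Below is a minimal sketch (no gym_donkeycar involved), assuming the sim is already listening on 9091; the message shapes are the ones documented above:

import json
import socket

# The sim is the server: connect to it on localhost:9091.
sock = socket.create_connection(("localhost", 9091))

def send_msg(msg: dict) -> None:
    # Both directions carry newline-delimited JSON.
    sock.sendall((json.dumps(msg) + "\n").encode("utf-8"))

send_msg({"msg_type": "load_scene", "scene_name": "generated_track"})
send_msg({"msg_type": "control", "steering": 0.0, "throttle": 0.3, "brake": 0.0})

# Read one complete line, then parse it as a JSON frame.
buf = b""
while b"\n" not in buf:
    buf += sock.recv(4096)
line, _, _ = buf.partition(b"\n")
frame = json.loads(line)
print(frame["msg_type"], frame.get("speed"))

sock.close()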


Layer 3: gym_donkeycar (Python Package)

Location: /home/paulh/.local/lib/python3.10/site-packages/gym_donkeycar/
Installed via: pip
What it does: Wraps the TCP connection as a standard Gymnasium environment so Stable-Baselines3 and other RL libraries can use it.

File structure

gym_donkeycar/
├── __init__.py              Registers all environments with gymnasium
├── core/
│   ├── sim_client.py        SimClient — wires the TCP client to the env's message handler
│   ├── client.py            SDClient — raw TCP socket send/receive, threading, message queue
│   └── message.py           IMesgHandler interface
└── envs/
    ├── donkey_env.py        DonkeyEnv — THE gymnasium.Env subclass
    ├── donkey_sim.py        DonkeyUnitySimContoller — parses telemetry,
    │                         builds info dict, manages episode state
    └── donkey_proc.py       Optional: launches sim as subprocess

How they connect

DonkeyEnv (donkey_env.py)
    └── creates DonkeyUnitySimContoller (donkey_sim.py)
            └── creates SimClient (core/sim_client.py)
                    └── creates SDClient (core/client.py)
                            └── TCP socket → Unity sim

donkey_env.py — the Gymnasium interface

This is what your code calls with gym.make('donkey-generated-track-v0').

  • reset() → sends car_config, waits for sim started!, returns first obs
  • step(action) → sends control message (steering + throttle), waits for next telemetry frame, returns (obs, reward, terminated, truncated, info)
  • Observation = camera image (120×160×3 uint8)
  • Action space = Box([-1,0], [1,1]) — [steering, throttle]
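
A minimal smoke-test of that interface (a sketch; it assumes the sim is already running and listening on 9091):

import gymnasium as gym
import gym_donkeycar  # importing registers the donkey-* environments

env = gym.make("donkey-generated-track-v0")
obs, info = env.reset()                  # obs = 120×160×3 uint8 camera image
for _ in range(200):
    action = env.action_space.sample()   # random [steering, throttle]
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()                              # one connection = one car: always close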

donkey_sim.py — the telemetry parser

Receives JSON frames from the sim and maintains state:

| Attribute | Source | Meaning |
|---|---|---|
| self.image_array | image field | Current camera frame |
| self.cte | cte field | Cross-track error (metres from centreline) |
| self.speed | speed field | Car speed (m/s) |
| self.hit | hit field | What was last hit ("none" or object name) |
| self.x/y/z | pos_x/y/z | World position |
| self.lap_count | crossing start line | Completed laps |
| self.last_lap_time | crossing start line | Most recent lap time (seconds) |
| self.active_node | activeNode | Current path node index ← newly added |
| self.total_nodes | totalNodes | Total path nodes ← newly added |

The info dict returned from step() contains all of the above plus:

  • track_progress = active_node / total_nodes ← newly added; ranges 0.0→1.0

Episode termination (done=True) fires when:

  • abs(cte) > max_cte (default 8m) — car too far off centreline
  • hit != "none" — car hit something (when detected by physics)
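
In paraphrase (not the exact source), the check amounts to:

# paraphrase of the episode-over logic in donkey_sim.py
def episode_over(cte: float, hit: str, max_cte: float = 8.0) -> bool:
    if abs(cte) > max_cte:   # too far from the centreline
        return True
    if hit != "none":        # sim reported a collision
        return True
    return False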

Registered environments

# All defined in gym_donkeycar/__init__.py
'donkey-generated-roads-v0'       GeneratedRoadsEnv    (generated_road)
'donkey-generated-track-v0'       GeneratedTrackEnv    (generated_track)
'donkey-mountain-track-v0'        MountainTrackEnv     (mountain_track)
'donkey-minimonaco-track-v0'      MiniMonacoEnv        (mini_monaco)
'donkey-warehouse-v0'             WarehouseEnv
'donkey-roboracingleague-track-v0'  RoboRacingLeagueTrackEnv
# ... etc

Layer 4: Our Training Code

Location: agent/

reward_wrapper.py — SpeedRewardWrapper

Wraps a DonkeyEnv and completely replaces the sim's own reward signal.

v5 reward (current):

reward = (speed / 10.0) × (1 - |cte| / max_cte)
  • Fast + centred = high reward
  • Slow (e.g. on a hill) = low reward → gradient pushes toward more throttle
  • Off-track = near-zero reward
  • Crash (done=True) = -1.0
  • Short-lap exploit (<5s): large penalty
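
A minimal sketch of the wrapper, assuming speed, cte and last_lap_time arrive via the info dict; the penalty magnitude (-10.0) and constructor arguments are illustrative, not the exact implementation:

import gymnasium as gym

class SpeedRewardWrapper(gym.Wrapper):
    """Replace the sim's reward with the v5 formula (sketch)."""

    def __init__(self, env, max_cte=8.0, min_lap_time=5.0, short_lap_penalty=-10.0):
        super().__init__(env)
        self.max_cte = max_cte
        self.min_lap_time = min_lap_time            # laps faster than this are exploits
        self.short_lap_penalty = short_lap_penalty  # illustrative magnitude

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        speed = info.get("speed", 0.0)
        cte = min(abs(info.get("cte", 0.0)), self.max_cte)
        reward = (speed / 10.0) * (1.0 - cte / self.max_cte)
        if terminated:                              # crash
            reward = -1.0
        lap_time = info.get("last_lap_time", 0.0)
        if 0.0 < lap_time < self.min_lap_time:      # short-lap exploit (<5s)
            reward = self.short_lap_penalty
        return obs, reward, terminated, truncated, info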

multitrack_runner.py — Training Loop

Manages round-robin training across multiple tracks:

  1. Creates env on track A, trains for steps_per_switch steps
  2. Calls close_and_switch() → sends exit_scene via the existing viewer, closes the env, waits, opens an env on track B (sketched after this list)
  3. Repeats until total_timesteps reached
  4. Evaluates on test tracks (mini_monaco, etc.)
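
A sketch of step 2, assuming the viewer object exposes exit_scene() as described above (attribute and method names may differ in the real runner):

import time
import gymnasium as gym

def close_and_switch(env, next_track_id: str, settle_seconds: float = 3.0):
    env.unwrapped.viewer.exit_scene()  # exit_scene over the EXISTING connection
    env.close()                        # drop the TCP connection: no phantom car
    time.sleep(settle_seconds)         # let the sim finish unloading the scene
    return gym.make(next_track_id)     # ONE new connection, ONE new car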

Wrapper stack applied to every env:

gym.make(track_id)                    ← raw DonkeyEnv
  → ThrottleClampWrapper              ← ensures minimum throttle (0.2 or 0.5)
    → StuckTerminationWrapper         ← ends episode if <0.5m in 80 steps
      → SpeedRewardWrapper            ← replaces reward with v5 formula
        → DummyVecEnv                 ← SB3 requires vectorised envs
          → VecTransposeImage         ← SB3 CNN needs (C,H,W) not (H,W,C)
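
Assembled in code (constructor arguments are illustrative; the import paths assume the wrappers live in agent/ as listed in the quick reference below):

import gymnasium as gym
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage

from agent.multitrack_runner import ThrottleClampWrapper, StuckTerminationWrapper
from agent.reward_wrapper import SpeedRewardWrapper

def make_wrapped_env(track_id: str):
    env = gym.make(track_id)                           # raw DonkeyEnv
    env = ThrottleClampWrapper(env, min_throttle=0.2)
    env = StuckTerminationWrapper(env, min_distance=0.5, window=80)
    env = SpeedRewardWrapper(env)                      # v5 reward
    return env

venv = DummyVecEnv([lambda: make_wrapped_env("donkey-generated-track-v0")])
venv = VecTransposeImage(venv)                         # (H,W,C) → (C,H,W) for the CNN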

Key design decisions

  • PPO with CnnPolicy — raw image input, SB3 handles CNN feature extraction
  • Continuous actions — steering [-1,1] and throttle [0,1]; no discretisation
  • No warm-start — each trial trains from random weights to avoid bias
  • Per-segment checkpointing — model saved after every training segment so timeouts don't lose all progress
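
Those decisions translate into roughly this SB3 setup (a sketch; segment size and paths are illustrative):

from stable_baselines3 import PPO

model = PPO("CnnPolicy", venv, n_steps=2048, verbose=1)   # fresh random weights

# Per-segment checkpointing: a timeout loses at most one segment of progress.
steps_per_segment = 20_000                                # illustrative
for segment in range(5):
    model.learn(total_timesteps=steps_per_segment, reset_num_timesteps=False)
    model.save(f"checkpoints/segment_{segment}")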

Layer 5: Autoresearch (GP+UCB)

wave4_controller.py — outer loop:

  1. Proposes hyperparameters (learning_rate, steps_per_switch, total_timesteps) using Gaussian Process + Upper Confidence Bound (GP+UCB)
  2. Launches multitrack_runner.py as a subprocess
  3. Parses test track scores from stdout
  4. Updates GP with (hyperparams → score) to improve next proposal
  5. Saves champion model when score improves

TinyGP — pure numpy Gaussian Process (no sklearn dependency):

  • Fits a smooth surface over (hyperparams → performance) space
  • UCB = mean + κ×std — balances exploiting known-good regions vs exploring uncertain ones
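
TinyGP itself lives in wave4_controller.py; its core idea fits in a few lines of numpy. A sketch with an RBF kernel (kernel hyperparameters and κ are placeholders):

import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # squared-exponential kernel between two point sets, shapes (n,d) and (m,d)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, X_new, noise=1e-4):
    # standard GP regression posterior mean/std at the candidate points X_new
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, X_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(X_new, X_new).diagonal() - (v ** 2).sum(axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def propose_next(X_seen, y_seen, candidates, kappa=2.0):
    # UCB = mean + κ·std; pick the candidate with the highest acquisition value
    mean, std = gp_posterior(X_seen, y_seen, candidates)
    return candidates[np.argmax(mean + kappa * std)]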

Data Flow: One Training Step

1. model.predict(obs) → action [steering, throttle]
2. ThrottleClampWrapper.step(action) → clamp throttle ≥ 0.2
3. StuckTerminationWrapper.step(action) → check if car moved <0.5m in 80 steps
4. SpeedRewardWrapper.step(action) → compute v5 reward, check short-lap exploit
5. DonkeyEnv.step(action) → send TCP "control" message to Unity sim
6. Unity sim → physics tick → send telemetry JSON back
7. donkey_sim.py → parse JSON → update cte, speed, active_node, track_progress
8. DonkeyEnv.step() returns (obs=camera_image, reward=sim_reward, done, info)
9. SpeedRewardWrapper replaces sim_reward with v5 reward
10. SB3 PPO stores (obs, action, v5_reward, done) in rollout buffer
11. After n_steps=2048: PPO gradient update → policy weights update

What track_progress Tells Us (New)

info['track_progress'] = activeNode / totalNodes

  • 0.0 = car is at the start line
  • 0.5 = car is halfway around the track
  • 1.0 = car has completed the track

This is the first time forward-progress information is available to the reward. Previously, CTE only told us "how far sideways from the centreline" — not "how far along the track." With track_progress we can reward the model for getting further around the track even if it is slow or slightly off-centre (sketched below). This matters especially for mountain_track, where the hill blocked learning.
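
For example, a hypothetical progress-delta bonus (not current code; assumes track_progress wraps from ~1.0 back to 0.0 at the start line):

def progress_bonus(track_progress: float, prev_progress: float, scale: float = 10.0):
    # reward per-step forward movement along the track
    delta = track_progress - prev_progress
    if delta < -0.5:          # crossed the start line (e.g. 0.98 → 0.02)
        delta += 1.0
    return scale * max(delta, 0.0)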


File Quick Reference

| File | What to edit when... |
|---|---|
| agent/reward_wrapper.py | Changing the reward function |
| agent/multitrack_runner.py | Changing the training loop, wrappers, track switching |
| agent/wave4_controller.py | Changing the GP search, hyperparameter ranges |
| gym_donkeycar/envs/donkey_sim.py | Adding new fields from sim telemetry |
| gym_donkeycar/envs/donkey_env.py | Changing env reset/step behaviour |
| sdsandbox/.../TcpCarHandler.cs | Adding new telemetry fields from Unity |
| sdsandbox/.../CarPath.cs | Changing how CTE / track progress is computed |