System Architecture — DonkeyCar RL Autoresearch

Overview

Five distinct layers talk to each other. The diagram below shows them from top (our code) to bottom (the simulator); the sections that follow walk up from Layer 1:

┌─────────────────────────────────────────────────────────────────┐
│  Layer 5: OUR CODE  (autoresearch_controller, wave4_controller) │
│           GP+UCB proposes hyperparameters, launches training     │
├─────────────────────────────────────────────────────────────────┤
│  Layer 4: OUR CODE  (multitrack_runner, reward_wrapper)         │
│           PPO training loop, reward shaping, track switching    │
├─────────────────────────────────────────────────────────────────┤
│  Layer 3: gym_donkeycar  (Python package, installed)            │
│           Gymnasium environment wrapper around the sim           │
├─────────────────────────────────────────────────────────────────┤
│  Layer 2: TCP socket  (localhost:9091)                          │
│           JSON messages in both directions                       │
├─────────────────────────────────────────────────────────────────┤
│  Layer 1: sdsandbox  (Unity app, running on Windows/WSL)        │
│           3D physics simulation, rendering, track logic          │
└─────────────────────────────────────────────────────────────────┘

Layer 1: sdsandbox (Unity Simulator)

Location: /mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/
Language: C# (Unity)
What it does: Runs the 3D physics simulation — car physics, track geometry, collision detection, camera rendering, lap timing.

Key C# scripts

| File | Role |
|---|---|
| Scripts/tcp/TcpCarHandler.cs | Main bridge — handles the TCP connection, reads steering/throttle commands, sends telemetry JSON every frame |
| Scripts/CarPath.cs | Defines the track centreline as a series of nodes; computes CTE via GetCrossTrackErr() |
| Scripts/PathManager.cs | Manages the active path, knows which node the car is near (iActiveSpan) |
| Scripts/startingLine.cs | Detects lap completions, measures lap times |
| Scripts/Car.cs | Car physics — applies steering/throttle, tracks velocity, collision |
| Scripts/SceneLoader.cs | Loads/unloads track scenes in response to load_scene / exit_scene messages |
| Scripts/GlobalState.cs | Flags like extendedTelemetry that gate which fields are sent |

What the sim sends every frame (telemetry JSON)

{
  "msg_type":       "telemetry",
  "steering_angle": 0.0,
  "throttle":       0.4,
  "image":          "<base64 camera image>",
  "hit":            "none",
  "time":           12.34,
  "speed":          2.5,
  "accel_x/y/z":    ...,
  "gyro_x/y/z":     ...,
  "pitch/yaw/roll": ...,
  "activeNode":     42,     // current path node index (ALWAYS sent)
  "totalNodes":     186,    // total path nodes        (ALWAYS sent)
  "cte":            0.3,    // cross-track error       (extendedTelemetry=true)
  "pos_x/y/z":      ...,    // world position          (extendedTelemetry=true)
  "vel_x/y/z":      ...     // world velocity          (extendedTelemetry=true)
}

What the sim receives (commands)

{ "msg_type": "control", "steering": 0.2, "throttle": 0.5, "brake": 0.0 }
{ "msg_type": "load_scene", "scene_name": "generated_track" }
{ "msg_type": "exit_scene" }
{ "msg_type": "car_config", ... }

Layer 2: TCP Socket (localhost:9091)

A plain TCP connection carrying newline-delimited JSON messages. The sim is the server (listens on 9091). Python is the client (connects to 9091).

Critical rule: Each gym.make() call opens ONE TCP connection, which spawns ONE car in the sim. Opening a second connection spawns a phantom second car. Always env.close() before opening a new connection. Track switching must go through the EXISTING connection via exit_scene, not by opening a new connection.
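
The framing is simple enough to exercise with a raw socket. Below is a minimal sketch (no gym_donkeycar involved), assuming the sim is already listening on 9091; the message shapes are the ones documented above:

import json
import socket

# The sim is the server: connect to it on localhost:9091.
sock = socket.create_connection(("localhost", 9091))

def send_msg(msg: dict) -> None:
    # Both directions carry newline-delimited JSON.
    sock.sendall((json.dumps(msg) + "\n").encode("utf-8"))

send_msg({"msg_type": "load_scene", "scene_name": "generated_track"})
send_msg({"msg_type": "control", "steering": 0.0, "throttle": 0.3, "brake": 0.0})

# Read one complete line, then parse it as a JSON frame.
buf = b""
while b"\n" not in buf:
    buf += sock.recv(4096)
line, _, _ = buf.partition(b"\n")
frame = json.loads(line)
print(frame["msg_type"], frame.get("speed"))

sock.close()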


Layer 3: gym_donkeycar (Python Package)

Location: /home/paulh/.local/lib/python3.10/site-packages/gym_donkeycar/
Installed via: pip
What it does: Wraps the TCP connection as a standard Gymnasium environment so Stable-Baselines3 and other RL libraries can use it.

File structure

gym_donkeycar/
├── __init__.py              Registers all environments with gymnasium
├── core/
│   ├── sim_client.py        SimClient — wires the TCP client to the env's message handler
│   ├── client.py            SDClient — raw TCP socket send/receive, threading, message queue
│   └── message.py           IMesgHandler interface
└── envs/
    ├── donkey_env.py        DonkeyEnv — THE gymnasium.Env subclass
    ├── donkey_sim.py        DonkeyUnitySimContoller — parses telemetry,
    │                         builds info dict, manages episode state
    └── donkey_proc.py       Optional: launches sim as subprocess

How they connect

DonkeyEnv (donkey_env.py)
    └── creates DonkeyUnitySimContoller (donkey_sim.py)
            └── creates SimClient (core/sim_client.py)
                    └── creates SDClient (core/client.py)
                            └── TCP socket → Unity sim

donkey_env.py — the Gymnasium interface

This is what your code calls with gym.make('donkey-generated-track-v0').

  • reset() → sends car_config, waits for sim started!, returns first obs
  • step(action) → sends control message (steering + throttle), waits for next telemetry frame, returns (obs, reward, terminated, truncated, info)
  • Observation = camera image (120×160×3 uint8)
  • Action space = Box([-1,0], [1,1]) — [steering, throttle]
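
A minimal smoke-test of that interface (a sketch; it assumes the sim is already running and listening on 9091):

import gymnasium as gym
import gym_donkeycar  # importing registers the donkey-* environments

env = gym.make("donkey-generated-track-v0")
obs, info = env.reset()                  # obs = 120×160×3 uint8 camera image
for _ in range(200):
    action = env.action_space.sample()   # random [steering, throttle]
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()                              # one connection = one car: always close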

donkey_sim.py — the telemetry parser

Receives JSON frames from the sim and maintains state:

| Attribute | Source | Meaning |
|---|---|---|
| self.image_array | image field | Current camera frame |
| self.cte | cte field | Cross-track error (metres from centreline) |
| self.speed | speed field | Car speed (m/s) |
| self.hit | hit field | What was last hit ("none" or object name) |
| self.x/y/z | pos_x/y/z | World position |
| self.lap_count | crossing start line | Completed laps |
| self.last_lap_time | crossing start line | Most recent lap time (seconds) |
| self.active_node | activeNode | Current path node index ← newly added |
| self.total_nodes | totalNodes | Total path nodes ← newly added |

The info dict returned from step() contains all of the above plus:

  • track_progress = active_node / total_nodes ← newly added; ranges 0.0→1.0

Episode termination (done=True) fires when:

  • abs(cte) > max_cte (default 8m) — car too far off centreline
  • hit != "none" — car hit something (when detected by physics)
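
In paraphrase (not the exact source), the check amounts to:

# paraphrase of the episode-over logic in donkey_sim.py
def episode_over(cte: float, hit: str, max_cte: float = 8.0) -> bool:
    if abs(cte) > max_cte:   # too far from the centreline
        return True
    if hit != "none":        # sim reported a collision
        return True
    return False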

Registered environments

# All defined in gym_donkeycar/__init__.py
'donkey-generated-roads-v0'       GeneratedRoadsEnv    (generated_road)
'donkey-generated-track-v0'       GeneratedTrackEnv    (generated_track)
'donkey-mountain-track-v0'        MountainTrackEnv     (mountain_track)
'donkey-minimonaco-track-v0'      MiniMonacoEnv        (mini_monaco)
'donkey-warehouse-v0'             WarehouseEnv
'donkey-roboracingleague-track-v0'  RoboRacingLeagueTrackEnv
# ... etc

Layer 4: Our Training Code

Location: agent/

reward_wrapper.py — SpeedRewardWrapper

Wraps a DonkeyEnv and completely replaces the sim's own reward signal.

v5 reward (current):

reward = (speed / 10.0) × (1 - |cte| / max_cte)
  • Fast + centred = high reward
  • Slow (e.g. on a hill) = low reward → gradient pushes toward more throttle
  • Off-track = near-zero reward
  • Crash (done=True) = -1.0
  • Short-lap exploit (<5s): large penalty
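
A minimal sketch of the wrapper, assuming speed, cte and last_lap_time arrive via the info dict; the penalty magnitude (-10.0) and constructor arguments are illustrative, not the exact implementation:

import gymnasium as gym

class SpeedRewardWrapper(gym.Wrapper):
    """Replace the sim's reward with the v5 formula (sketch)."""

    def __init__(self, env, max_cte=8.0, min_lap_time=5.0, short_lap_penalty=-10.0):
        super().__init__(env)
        self.max_cte = max_cte
        self.min_lap_time = min_lap_time            # laps faster than this are exploits
        self.short_lap_penalty = short_lap_penalty  # illustrative magnitude

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        speed = info.get("speed", 0.0)
        cte = min(abs(info.get("cte", 0.0)), self.max_cte)
        reward = (speed / 10.0) * (1.0 - cte / self.max_cte)
        if terminated:                              # crash
            reward = -1.0
        lap_time = info.get("last_lap_time", 0.0)
        if 0.0 < lap_time < self.min_lap_time:      # short-lap exploit (<5s)
            reward = self.short_lap_penalty
        return obs, reward, terminated, truncated, info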

multitrack_runner.py — Training Loop

Manages round-robin training across multiple tracks:

  1. Creates env on track A, trains for steps_per_switch steps
  2. Calls close_and_switch() → sends exit_scene via the existing viewer, closes the env, waits, opens an env on track B (sketched after this list)
  3. Repeats until total_timesteps reached
  4. Evaluates on test tracks (mini_monaco, etc.)
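
A sketch of step 2, assuming the viewer object exposes exit_scene() as described above (attribute and method names may differ in the real runner):

import time
import gymnasium as gym

def close_and_switch(env, next_track_id: str, settle_seconds: float = 3.0):
    env.unwrapped.viewer.exit_scene()  # exit_scene over the EXISTING connection
    env.close()                        # drop the TCP connection: no phantom car
    time.sleep(settle_seconds)         # let the sim finish unloading the scene
    return gym.make(next_track_id)     # ONE new connection, ONE new car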

Wrapper stack applied to every env:

gym.make(track_id)                    ← raw DonkeyEnv
  → ThrottleClampWrapper              ← ensures minimum throttle (0.2 or 0.5)
    → StuckTerminationWrapper         ← ends episode if <0.5m in 80 steps
      → SpeedRewardWrapper            ← replaces reward with v5 formula
        → DummyVecEnv                 ← SB3 requires vectorised envs
          → VecTransposeImage         ← SB3 CNN needs (C,H,W) not (H,W,C)
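
Assembled in code (constructor arguments are illustrative; the import paths assume the wrappers live in agent/ as listed in the quick reference below):

import gymnasium as gym
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage

from agent.multitrack_runner import ThrottleClampWrapper, StuckTerminationWrapper
from agent.reward_wrapper import SpeedRewardWrapper

def make_wrapped_env(track_id: str):
    env = gym.make(track_id)                           # raw DonkeyEnv
    env = ThrottleClampWrapper(env, min_throttle=0.2)
    env = StuckTerminationWrapper(env, min_distance=0.5, window=80)
    env = SpeedRewardWrapper(env)                      # v5 reward
    return env

venv = DummyVecEnv([lambda: make_wrapped_env("donkey-generated-track-v0")])
venv = VecTransposeImage(venv)                         # (H,W,C) → (C,H,W) for the CNN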

Key design decisions

  • PPO with CnnPolicy — raw image input, SB3 handles CNN feature extraction
  • Continuous actions — steering [-1,1] and throttle [0,1]; no discretisation
  • No warm-start — each trial trains from random weights to avoid bias
  • Per-segment checkpointing — model saved after every training segment so timeouts don't lose all progress
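
Those decisions translate into roughly this SB3 setup (a sketch; segment size and paths are illustrative):

from stable_baselines3 import PPO

model = PPO("CnnPolicy", venv, n_steps=2048, verbose=1)   # fresh random weights

# Per-segment checkpointing: a timeout loses at most one segment of progress.
steps_per_segment = 20_000                                # illustrative
for segment in range(5):
    model.learn(total_timesteps=steps_per_segment, reset_num_timesteps=False)
    model.save(f"checkpoints/segment_{segment}")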

Layer 5: Autoresearch (GP+UCB)

wave4_controller.py — outer loop:

  1. Proposes hyperparameters (learning_rate, steps_per_switch, total_timesteps) using Gaussian Process + Upper Confidence Bound (GP+UCB)
  2. Launches multitrack_runner.py as a subprocess
  3. Parses test track scores from stdout
  4. Updates GP with (hyperparams → score) to improve next proposal
  5. Saves champion model when score improves

TinyGP — pure numpy Gaussian Process (no sklearn dependency):

  • Fits a smooth surface over (hyperparams → performance) space
  • UCB = mean + κ×std — balances exploiting known-good regions vs exploring uncertain ones
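
TinyGP itself lives in wave4_controller.py; its core idea fits in a few lines of numpy. A sketch with an RBF kernel (kernel hyperparameters and κ are placeholders):

import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # squared-exponential kernel between two point sets, shapes (n,d) and (m,d)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, X_new, noise=1e-4):
    # standard GP regression posterior mean/std at the candidate points X_new
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, X_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(X_new, X_new).diagonal() - (v ** 2).sum(axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def propose_next(X_seen, y_seen, candidates, kappa=2.0):
    # UCB = mean + κ·std; pick the candidate with the highest acquisition value
    mean, std = gp_posterior(X_seen, y_seen, candidates)
    return candidates[np.argmax(mean + kappa * std)]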

Data Flow: One Training Step

1. model.predict(obs) → action [steering, throttle]
2. ThrottleClampWrapper.step(action) → clamp throttle ≥ 0.2
3. StuckTerminationWrapper.step(action) → check if car moved <0.5m in 80 steps
4. SpeedRewardWrapper.step(action) → compute v5 reward, check short-lap exploit
5. DonkeyEnv.step(action) → send TCP "control" message to Unity sim
6. Unity sim → physics tick → send telemetry JSON back
7. donkey_sim.py → parse JSON → update cte, speed, active_node, track_progress
8. DonkeyEnv.step() returns (obs=camera_image, reward=sim_reward, done, info)
9. SpeedRewardWrapper replaces sim_reward with v5 reward
10. SB3 PPO stores (obs, action, v5_reward, done) in rollout buffer
11. After n_steps=2048: PPO gradient update → policy weights update

What track_progress Tells Us (New)

info['track_progress'] = activeNode / totalNodes

  • 0.0 = car is at the start line
  • 0.5 = car is halfway around the track
  • 1.0 = car has completed the track

This is the first time forward-progress information is available to the reward. Previously, CTE only told us "how far sideways from the centreline" — not "how far along the track." With track_progress we can reward the model for getting further around the track even if it is slow or slightly off-centre (sketched below). This matters especially for mountain_track, where the hill blocked learning.
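
For example, a hypothetical progress-delta bonus (not current code; assumes track_progress wraps from ~1.0 back to 0.0 at the start line):

def progress_bonus(track_progress: float, prev_progress: float, scale: float = 10.0):
    # reward per-step forward movement along the track
    delta = track_progress - prev_progress
    if delta < -0.5:          # crossed the start line (e.g. 0.98 → 0.02)
        delta += 1.0
    return scale * max(delta, 0.0)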


File Quick Reference

| File | What to edit when... |
|---|---|
| agent/reward_wrapper.py | Changing the reward function |
| agent/multitrack_runner.py | Changing the training loop, wrappers, track switching |
| agent/wave4_controller.py | Changing the GP search, hyperparameter ranges |
| gym_donkeycar/envs/donkey_sim.py | Adding new fields from sim telemetry |
| gym_donkeycar/envs/donkey_env.py | Changing env reset/step behaviour |
| sdsandbox/.../TcpCarHandler.cs | Adding new telemetry fields from Unity |
| sdsandbox/.../CarPath.cs | Changing how CTE / track progress is computed |