# System Architecture — DonkeyCar RL Autoresearch

## Overview

The system is built from five layers that talk to each other. They are numbered from the bottom (Layer 1, the Unity simulator) to the top (Layer 5, the hyperparameter search):

```
┌─────────────────────────────────────────────────────────────────┐
│ Layer 5: OUR CODE (autoresearch_controller, wave4_controller) │
│ GP+UCB proposes hyperparameters, launches training │
├─────────────────────────────────────────────────────────────────┤
│ Layer 4: OUR CODE (multitrack_runner, reward_wrapper) │
│ PPO training loop, reward shaping, track switching │
├─────────────────────────────────────────────────────────────────┤
│ Layer 3: gym_donkeycar (Python package, installed) │
│ Gymnasium environment wrapper around the sim │
├─────────────────────────────────────────────────────────────────┤
│ Layer 2: TCP socket (localhost:9091) │
│ JSON messages in both directions │
├─────────────────────────────────────────────────────────────────┤
│ Layer 1: sdsandbox (Unity app, running on Windows/WSL) │
│ 3D physics simulation, rendering, track logic │
└─────────────────────────────────────────────────────────────────┘
```

---

## Layer 1: sdsandbox (Unity Simulator)

**Location:** `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/`

**Language:** C# (Unity)

**What it does:** Runs the 3D physics simulation — car physics, track geometry, collision detection, camera rendering, lap timing.

### Key C# scripts

| File | Role |
|------|------|
| `Scripts/tcp/TcpCarHandler.cs` | **Main bridge** — handles the TCP connection, reads steering/throttle commands, sends telemetry JSON every frame |
| `Scripts/CarPath.cs` | Defines the track centreline as a series of nodes; computes CTE via `GetCrossTrackErr()` |
| `Scripts/PathManager.cs` | Manages the active path, knows which node the car is near (`iActiveSpan`) |
| `Scripts/startingLine.cs` | Detects lap completions, measures lap times |
| `Scripts/Car.cs` | Car physics — applies steering/throttle, tracks velocity, collision |
| `Scripts/SceneLoader.cs` | Loads/unloads track scenes in response to `load_scene` / `exit_scene` messages |
| `Scripts/GlobalState.cs` | Flags like `extendedTelemetry` that gate which fields are sent |

### What the sim sends every frame (telemetry JSON)

The listing below is abbreviated. A name like `accel_x/y/z` stands for three separate fields (`accel_x`, `accel_y`, `accel_z`), `pitch/yaw/roll` stands for `pitch`, `yaw` and `roll`, and `...` marks omitted values.

```jsonc
{
  "msg_type": "telemetry",
  "steering_angle": 0.0,
  "throttle": 0.4,
  "image": "<base64 camera image>",
  "hit": "none",
  "time": 12.34,
  "speed": 2.5,
  "accel_x/y/z": ...,
  "gyro_x/y/z": ...,
  "pitch/yaw/roll": ...,
  "activeNode": 42,   // current path node index (ALWAYS sent)
  "totalNodes": 186,  // total path nodes (ALWAYS sent)
  "cte": 0.3,         // cross-track error (extendedTelemetry=true)
  "pos_x/y/z": ...,   // world position (extendedTelemetry=true)
  "vel_x/y/z": ...    // world velocity (extendedTelemetry=true)
}
```

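
Below is a minimal sketch of reading one telemetry message on the Python side using only the standard library. It makes three assumptions: each message arrives as one complete JSON object, the abbreviated fields above are sent under their full names, and `cte` is missing when `extendedTelemetry` is off. The function `parse_telemetry` and the layout of the dict it returns are hypothetical and are not gym_donkeycar's API. The `image` field is only base64-decoded here. Turning those bytes into a 120×160×3 pixel array needs an image library, so that step is left out.

```python
import base64
import json
from typing import Any, Dict, Optional


def parse_telemetry(line: str) -> Dict[str, Any]:
    """Parse one telemetry JSON line into a small state dict (hypothetical helper)."""
    msg = json.loads(line)
    if msg.get("msg_type") != "telemetry":
        raise ValueError(f"not a telemetry message: {msg.get('msg_type')!r}")

    active = int(msg["activeNode"])           # always sent
    total = int(msg["totalNodes"])            # always sent
    cte: Optional[float] = msg.get("cte")     # only sent with extendedTelemetry=true

    return {
        "speed": float(msg["speed"]),
        "hit": msg["hit"],
        "cte": None if cte is None else float(cte),
        "active_node": active,
        "total_nodes": total,
        # Fraction of the lap covered, guarded against a zero node count.
        "track_progress": active / total if total > 0 else 0.0,
        # Still-encoded image bytes; decoding to pixels needs an image library.
        "image_bytes": base64.b64decode(msg["image"]),
    }


example = json.dumps({
    "msg_type": "telemetry", "steering_angle": 0.0, "throttle": 0.4,
    "image": base64.b64encode(b"fake-jpeg").decode("ascii"),
    "hit": "none", "time": 12.34, "speed": 2.5,
    "activeNode": 42, "totalNodes": 186, "cte": 0.3,
})
state = parse_telemetry(example)
print(round(state["track_progress"], 3))  # prints 0.226
```
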
### What the sim receives (commands)

```json
{ "msg_type": "control", "steering": 0.2, "throttle": 0.5, "brake": 0.0 }
{ "msg_type": "load_scene", "scene_name": "generated_track" }
{ "msg_type": "exit_scene" }
{ "msg_type": "car_config", ... }
```

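
Here is a sketch of building these command messages in Python, encoding each one as a single newline-terminated JSON object. The helper names are hypothetical. The clamping ranges mirror DonkeyEnv's action space: steering in [-1, 1] and throttle in [0, 1]. The `car_config` fields are elided above, so the caller has to pass them explicitly.

```python
import json
from typing import Any


def _encode(msg: dict) -> bytes:
    # One JSON object per line, matching newline-delimited framing on the socket.
    return (json.dumps(msg) + "\n").encode("utf-8")


def control_msg(steering: float, throttle: float, brake: float = 0.0) -> bytes:
    steering = max(-1.0, min(1.0, steering))   # keep steering in [-1, 1]
    throttle = max(0.0, min(1.0, throttle))    # keep throttle in [0, 1]
    return _encode({"msg_type": "control", "steering": steering,
                    "throttle": throttle, "brake": brake})


def load_scene_msg(scene_name: str) -> bytes:
    return _encode({"msg_type": "load_scene", "scene_name": scene_name})


def exit_scene_msg() -> bytes:
    return _encode({"msg_type": "exit_scene"})


def car_config_msg(**fields: Any) -> bytes:
    # car_config field names are not listed in this document; pass them explicitly.
    return _encode({"msg_type": "car_config", **fields})


print(control_msg(1.7, 0.5).decode("utf-8"), end="")
```
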
---

## Layer 2: TCP Socket (localhost:9091)

A plain TCP connection carrying newline-delimited JSON messages. The sim is the **server** (listens on 9091); Python is the **client** (connects to 9091).

**Critical rule:** Each `gym.make()` call opens ONE TCP connection, and each connection spawns ONE car in the sim. Opening a second connection while the first is still open spawns a phantom second car, so always call `env.close()` before opening a new connection.

Track switching must start with `exit_scene` sent over the EXISTING connection, not by opening a new connection to send it. Only after that should the env be closed and the next track's connection opened.
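
The following stdlib sketch shows the client side of this protocol. It uses a single socket, sends newline-delimited JSON in both directions, and keeps a receive buffer so it only ever parses complete lines. gym_donkeycar's own `SDClient` handles this on a background thread. The synchronous `SimConnection` class here is a hypothetical stand-in. Its test runs over a local `socketpair` rather than against the real sim on port 9091.

```python
import json
import socket
from typing import List


class SimConnection:
    """One TCP connection = one car in the sim (hypothetical helper, not gym_donkeycar's API)."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock
        self._buf = b""  # bytes received but not yet terminated by a newline

    @classmethod
    def connect(cls, host: str = "127.0.0.1", port: int = 9091) -> "SimConnection":
        return cls(socket.create_connection((host, port)))

    def send(self, msg: dict) -> None:
        self.sock.sendall((json.dumps(msg) + "\n").encode("utf-8"))

    def recv_messages(self, bufsize: int = 65536) -> List[dict]:
        """Read one chunk and return every complete JSON line received so far."""
        chunk = self.sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("sim closed the connection")
        lines = (self._buf + chunk).split(b"\n")
        self._buf = lines.pop()  # trailing piece is incomplete (b"" if chunk ended on "\n")
        return [json.loads(line) for line in lines if line.strip()]

    def close(self) -> None:
        self.sock.close()


# Typical use against the real sim (needs Unity listening on 9091):
#   conn = SimConnection.connect()
#   conn.send({"msg_type": "exit_scene"})   # leave the track over the SAME connection
#   conn.close()                            # close before any new connection is opened
```
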

---

## Layer 3: gym_donkeycar (Python Package)

**Location:** `/home/paulh/.local/lib/python3.10/site-packages/gym_donkeycar/`

**Installed via:** pip

**What it does:** Wraps the TCP connection as a standard Gymnasium environment so Stable-Baselines3 and other RL libraries can use it.

### File structure

```
gym_donkeycar/
├── __init__.py        Registers all environments with gymnasium
├── core/
│   ├── sim_client.py  SimClient: SDClient subclass that hands each message to a handler
│   ├── client.py      SDClient: low-level socket, threading, message queue
│   └── message.py     IMesgHandler interface
└── envs/
    ├── donkey_env.py  DonkeyEnv: THE gymnasium.Env subclass
    ├── donkey_sim.py  DonkeyUnitySimContoller: parses telemetry,
    │                  builds info dict, manages episode state
    └── donkey_proc.py Optional: launches sim as subprocess
```

### How they connect

```
DonkeyEnv (donkey_env.py)
└── creates DonkeyUnitySimContoller (donkey_sim.py)
    └── creates SimClient (core/sim_client.py)
        └── which subclasses SDClient (core/client.py)
            └── TCP socket → Unity sim
```

### donkey_env.py — the Gymnasium interface

This is the environment your code gets from `gym.make('donkey-generated-track-v0')`.

- `reset()` → sends `car_config`, waits for `sim started!`, returns the first observation
- `step(action)` → sends a `control` message (steering + throttle), waits for the next telemetry frame, returns `(obs, reward, terminated, truncated, info)`
- Observation = camera image (120×160×3 uint8)
- Action space = Box([-1,0], [1,1]) — [steering, throttle]
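
Below is a minimal random-policy rollout that follows the reset/step contract described above, including Gymnasium's five-value `step()` return. With the real package, `env` would come from `gym.make('donkey-generated-track-v0')` while the sim is running, and `env.close()` would need to be called afterwards because of the one-connection rule. Here a hypothetical `StubDonkeyEnv` stands in for the real environment. It has the same 120×160×3 observation size and `[steering, throttle]` action layout, so the loop runs without Unity, Gymnasium or NumPy.

```python
import random
from typing import Any, Dict, List, Optional, Tuple

OBS_SIZE = 120 * 160 * 3  # H x W x C uint8 camera frame, flattened


class StubDonkeyEnv:
    """Hypothetical stand-in for DonkeyEnv; the episode ends after `episode_len` steps."""

    def __init__(self, episode_len: int = 5) -> None:
        self.episode_len = episode_len
        self.t = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[bytes, Dict[str, Any]]:
        self.t = 0
        return bytes(OBS_SIZE), {}

    def step(self, action: List[float]):
        steering, throttle = action
        if not (-1.0 <= steering <= 1.0 and 0.0 <= throttle <= 1.0):
            raise ValueError(f"action outside Box([-1, 0], [1, 1]): {action}")
        self.t += 1
        terminated = self.t >= self.episode_len
        reward = throttle  # placeholder; the real reward comes from the sim or a wrapper
        return bytes(OBS_SIZE), reward, terminated, False, {}


def random_rollout(env, max_steps: int = 1000, seed: int = 0) -> float:
    """Run one episode with uniformly random [steering, throttle] actions; return total reward."""
    rng = random.Random(seed)
    obs, info = env.reset(seed=seed)
    total = 0.0
    for _ in range(max_steps):
        action = [rng.uniform(-1.0, 1.0), rng.uniform(0.0, 1.0)]
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total


print(random_rollout(StubDonkeyEnv()))
```
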

### donkey_sim.py — the telemetry parser

Receives JSON frames from the sim and maintains state:

| Attribute | Source | Meaning |
|-----------|--------|---------|
| `self.image_array` | `image` field | Current camera frame |
| `self.cte` | `cte` field | Cross-track error (metres from centreline) |
| `self.speed` | `speed` field | Car speed (m/s) |
| `self.hit` | `hit` field | What was last hit (`"none"` or object name) |
| `self.x/y/z` | `pos_x/y/z` | World position |
| `self.lap_count` | crossing start line | Completed laps |
| `self.last_lap_time` | crossing start line | Most recent lap time (seconds) |
| `self.active_node` | `activeNode` | Current path node index ← **newly added** |
| `self.total_nodes` | `totalNodes` | Total path nodes ← **newly added** |

The info dict returned from `step()` carries the telemetry values above plus:

- `track_progress = active_node / total_nodes` ← **newly added, 0.0→1.0**

Episode termination (`terminated=True`) fires when either condition holds:

- `abs(cte) > max_cte` (default 8 m): the car is too far off the centreline
- `hit != "none"`: the car hit something (when detected by physics)
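
The two termination conditions can be written as one plain function. This sketch restates the rules listed above and does not copy gym_donkeycar's code. `is_episode_over` is a hypothetical name, and 8.0 m is the documented default for `max_cte`.

```python
import math


def is_episode_over(cte: float, hit: str, max_cte: float = 8.0) -> bool:
    """True when the car is more than max_cte metres off the centreline or has hit something."""
    off_track = math.fabs(cte) > max_cte   # sign of cte only says which side of the line
    crashed = hit != "none"                # any named object counts as a hit
    return off_track or crashed


print(is_episode_over(8.5, "none"), is_episode_over(-7.9, "none"))
```

The `abs` matters here. A car 8.5 m to either side of the centreline ends the episode, while one 7.9 m off does not.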

### Registered environments

```
# All defined in gym_donkeycar/__init__.py
'donkey-generated-roads-v0'        → GeneratedRoadsEnv         (generated_road)
'donkey-generated-track-v0'        → GeneratedTrackEnv         (generated_track)
'donkey-mountain-track-v0'         → MountainTrackEnv          (mountain_track)
'donkey-minimonaco-track-v0'       → MiniMonacoEnv             (mini_monaco)
'donkey-warehouse-v0'              → WarehouseEnv
'donkey-roboracingleague-track-v0' → RoboRacingLeagueTrackEnv
# ... etc
```

---

## Layer 4: Our Training Code

**Location:** `agent/`

### reward_wrapper.py — SpeedRewardWrapper

Wraps a DonkeyEnv and **completely replaces** the sim's own reward signal.

**v5 reward (current):**

```python
reward = (speed / 10.0) * (1.0 - abs(cte) / max_cte)
```

- Fast + centred = high reward
- Slow (e.g. on a hill) = low reward → gradient pushes toward more throttle
- Near the track edge (`abs(cte)` close to `max_cte`) = near-zero reward
- Crash (done=True) = -1.0
- Short-lap exploit (lap time < 5 s): large penalty
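
Writing the v5 reward as a plain function makes it possible to check the bullets numerically. The crash value (-1.0) and the 5 s short-lap threshold come from the list above. This document does not give the size of the short-lap penalty, so `short_lap_penalty=-10.0` is a hypothetical placeholder. The function name is also hypothetical, and so is the rule that the short-lap check takes priority over the crash check.

```python
from typing import Optional


def v5_reward(speed: float, cte: float, max_cte: float, done: bool,
              lap_time: Optional[float] = None,
              short_lap_penalty: float = -10.0,   # hypothetical magnitude
              min_lap_time: float = 5.0) -> float:
    """Speed times centredness, with short-lap and crash overrides."""
    if lap_time is not None and lap_time < min_lap_time:
        return short_lap_penalty          # implausibly fast "lap": treat as an exploit
    if done:
        return -1.0                       # episode ended by a crash
    centred = 1.0 - abs(cte) / max_cte    # 1.0 on the centreline, 0.0 at max_cte
    return (speed / 10.0) * centred


print(v5_reward(5.0, 0.0, 8.0, False), v5_reward(5.0, 4.0, 8.0, False))
```

With `max_cte = 8`, a car doing 5 m/s on the centreline earns 0.5 per step. The same car 4 m off-centre earns 0.25.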

### multitrack_runner.py — Training Loop

Manages round-robin training across multiple tracks:

1. Creates an env on track A and trains for `steps_per_switch` steps
2. Calls `close_and_switch()`, which sends `exit_scene` via the existing viewer, closes the env, waits, and opens an env on track B
3. Repeats until `total_timesteps` is reached
4. Evaluates on test tracks (mini_monaco, etc.)
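
Steps 1 to 3 can be sketched as a pure scheduling function. `segment_schedule` is a hypothetical name. Shortening the final segment so that training stops exactly at `total_timesteps` is an assumption. In the runner, each segment would plausibly correspond to one SB3 `model.learn(...)` call on the current track, followed by `close_and_switch()`.

```python
from itertools import cycle
from typing import Iterator, List, Tuple


def segment_schedule(tracks: List[str], steps_per_switch: int,
                     total_timesteps: int) -> Iterator[Tuple[str, int]]:
    """Yield (track, n_steps) segments, cycling through tracks until the budget is spent."""
    if not tracks or steps_per_switch <= 0:
        raise ValueError("need at least one track and a positive steps_per_switch")
    done = 0
    for track in cycle(tracks):
        if done >= total_timesteps:
            return
        n = min(steps_per_switch, total_timesteps - done)  # final segment may be shorter
        yield track, n
        done += n


print(list(segment_schedule(["generated_track", "mountain_track"], 1000, 2500)))
```
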

**Wrapper stack applied to every env** (innermost first; each line wraps the one above it):

```
gym.make(track_id)            ← raw DonkeyEnv
  → ThrottleClampWrapper      ← ensures minimum throttle (0.2 or 0.5)
  → StuckTerminationWrapper   ← ends episode if <0.5m moved in 80 steps
  → SpeedRewardWrapper        ← replaces reward with v5 formula
  → DummyVecEnv               ← SB3 requires vectorised envs
  → VecTransposeImage         ← SB3 CNN needs (C,H,W) not (H,W,C)
```
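
Below is the per-step logic of the two custom wrappers, stripped of the Gymnasium plumbing. Two interpretations here are assumptions. First, the clamp only raises throttle to the floor (0.2 by default) and otherwise caps it at 1.0. Second, "<0.5m moved in 80 steps" is read as the straight-line distance between the current position and the position 80 steps earlier. `clamp_throttle` and `StuckDetector` are hypothetical names.

```python
import math
from collections import deque
from typing import Deque, Sequence, Tuple

Pos = Tuple[float, float, float]


def clamp_throttle(action: Sequence[float], min_throttle: float = 0.2) -> Tuple[float, float]:
    """Leave steering alone; keep throttle in [min_throttle, 1.0]."""
    steering, throttle = action
    return steering, max(min_throttle, min(1.0, throttle))


class StuckDetector:
    """Flags 'stuck' when the car is < min_dist metres from where it was `window` steps ago."""

    def __init__(self, window: int = 80, min_dist: float = 0.5) -> None:
        self.window = window
        self.min_dist = min_dist
        # window + 1 entries so history[0] is exactly `window` steps behind the newest one.
        self.history: Deque[Pos] = deque(maxlen=window + 1)

    def reset(self) -> None:
        self.history.clear()

    def update(self, pos: Pos) -> bool:
        self.history.append(pos)
        if len(self.history) <= self.window:
            return False                      # not enough history yet
        return math.dist(self.history[0], pos) < self.min_dist
```
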

### Key design decisions

- **PPO with CnnPolicy** — raw image input, SB3 handles CNN feature extraction
- **Continuous actions** — steering [-1,1] and throttle [0,1]; no discretisation
- **No warm-start** — each trial trains from random weights, so no trial's score is biased by weights inherited from an earlier trial
- **Per-segment checkpointing** — model saved after every training segment, so a timeout doesn't lose all progress

---

## Layer 5: Autoresearch (GP+UCB)

**wave4_controller.py** — outer loop:

1. Proposes hyperparameters (learning_rate, steps_per_switch, total_timesteps) using a Gaussian Process with an Upper Confidence Bound acquisition function (GP+UCB)
2. Launches `multitrack_runner.py` as a subprocess
3. Parses test-track scores from the subprocess's stdout
4. Updates the GP with the new (hyperparams → score) pair to improve the next proposal
5. Saves the champion model whenever the score improves
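
The five steps can be laid out as a skeleton, with the GP proposal and the subprocess launch passed in as callables. This lets the control flow be read and tested on its own. Every name below is hypothetical. So is the stdout line format `SCORE <track> <value>`, and so is averaging the test-track scores into one number. The real `wave4_controller.py` may parse and aggregate differently. In real use, `run_trial` would wrap something like `subprocess.run([...], capture_output=True, text=True).stdout`, with the runner's actual arguments.

```python
import re
import statistics
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stdout format: one "SCORE <track> <value>" line per test track.
SCORE_LINE = re.compile(r"^SCORE[ \t]+(\S+)[ \t]+(-?\d+(?:\.\d+)?)[ \t]*$", re.MULTILINE)

History = List[Tuple[dict, float]]


def parse_scores(stdout: str) -> Dict[str, float]:
    return {track: float(value) for track, value in SCORE_LINE.findall(stdout)}


def search(propose: Callable[[History], dict],
           run_trial: Callable[[dict], str],
           n_trials: int) -> Tuple[Optional[dict], float, History]:
    history: History = []
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = propose(history)                     # 1. GP+UCB proposal
        scores = parse_scores(run_trial(params))      # 2-3. run the trial, parse its stdout
        score = statistics.mean(scores.values()) if scores else float("-inf")
        history.append((params, score))               # 4. data for the next GP fit
        if score > best_score:                        # 5. new champion
            best_params, best_score = params, score
    return best_params, best_score, history
```
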

**TinyGP** — pure numpy Gaussian Process (no sklearn dependency):

- Fits a smooth surface over the (hyperparams → performance) space
- UCB = mean + κ × std: the mean term exploits regions already known to score well, the std term explores regions where the GP is still uncertain, and κ sets the balance between them
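
The GP+UCB step can be sketched in plain Python. The sketch fits an RBF-kernel GP to past (hyperparameter, score) pairs, then scores random candidates by UCB = mean + κ·std and picks the best one. TinyGP itself uses numpy; lists are used here only so the sketch runs without dependencies. Several choices are assumptions rather than TinyGP's actual settings: the RBF kernel, length scale 0.3, noise 1e-4 and κ = 2.0, scaling each hyperparameter into [0, 1], and choosing the best of 500 random candidates. `TinyGPSketch` and `propose_ucb` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def rbf(a: Vec, b: Vec, length: float = 0.3) -> float:
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length ** 2)


def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L^T == K (K must be positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):      # solve L y = b
    y: List[float] = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):    # solve L^T x = y
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class TinyGPSketch:
    def __init__(self, noise: float = 1e-4) -> None:
        self.noise = noise

    def fit(self, X: List[Vec], y: List[float]) -> "TinyGPSketch":
        self.X = X
        self.y_mean = sum(y) / len(y)
        yc = [v - self.y_mean for v in y]                       # zero-mean targets
        K = [[rbf(a, b) + (self.noise if i == j else 0.0) for j, b in enumerate(X)]
             for i, a in enumerate(X)]
        self.L = cholesky(K)
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, yc))  # K^-1 yc
        return self

    def predict(self, x: Vec) -> Tuple[float, float]:
        k = [rbf(x, xi) for xi in self.X]
        mean = self.y_mean + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.L, k)
        var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)
        return mean, math.sqrt(var)


def propose_ucb(gp: TinyGPSketch, dim: int, kappa: float = 2.0,
                n_candidates: int = 500, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    best, best_ucb = None, -math.inf
    for _ in range(n_candidates):
        x = [rng.random() for _ in range(dim)]   # hyperparameters scaled to [0, 1]
        mean, std = gp.predict(x)
        if mean + kappa * std > best_ucb:
            best, best_ucb = x, mean + kappa * std
    return best
```

A larger κ pushes proposals toward settings the GP has not yet seen. With κ = 0, UCB reduces to picking the highest predicted mean.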

---

## Data Flow: One Training Step

Because the wrappers are nested as in the stack above, the action passes inward through them to the sim. The result then passes back outward in reverse order:

```
1.  model.predict(obs)                   → action [steering, throttle]
2.  VecTransposeImage / DummyVecEnv      → pass the action to the single wrapped env
3.  SpeedRewardWrapper.step(action)      → passes the action inward unchanged
4.  StuckTerminationWrapper.step(action) → passes the action inward unchanged
5.  ThrottleClampWrapper.step(action)    → clamps throttle ≥ 0.2, calls DonkeyEnv.step
6.  DonkeyEnv.step(action)               → sends TCP "control" message to Unity sim
7.  Unity sim                            → physics tick → sends telemetry JSON back
8.  donkey_sim.py                        → parses JSON → updates cte, speed, active_node, track_progress
9.  DonkeyEnv.step() returns             → (obs=camera_image, reward=sim_reward, terminated, truncated, info)
10. StuckTerminationWrapper (on return)  → terminates if the car moved <0.5m in the last 80 steps
11. SpeedRewardWrapper (on return)       → replaces sim_reward with v5 reward, checks short-lap exploit
12. VecTransposeImage                    → transposes obs from (H,W,C) to (C,H,W)
13. SB3 PPO                              → stores (obs, action, v5_reward, done) in rollout buffer
14. After n_steps=2048                   → PPO gradient update → policy weights updated
```
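
The order matters because each wrapper's `step()` calls the next inner `step()`. Actions therefore travel inward from the outermost wrapper, and results travel back outward. The toy trace below uses hypothetical stand-in classes, not `gymnasium.Wrapper` itself. They are nested the same way as the Layer 4 stack, and each one records when it acts.

```python
from typing import List


class Core:
    """Stands in for DonkeyEnv."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def step(self, action):
        self.log.append("DonkeyEnv.step")
        return "obs", 0.0, False, False, {}


class Layer:
    """Minimal wrapper: logs before calling the inner step (way in) and after it (way out)."""

    def __init__(self, env, name: str, log: List[str]) -> None:
        self.env, self.name, self.log = env, name, log

    def step(self, action):
        self.log.append(f"{self.name} in")
        result = self.env.step(action)
        self.log.append(f"{self.name} out")
        return result


log: List[str] = []
env = Core(log)
for name in ["ThrottleClamp", "StuckTermination", "SpeedReward"]:  # innermost first
    env = Layer(env, name, log)
env.step([0.0, 0.5])
print(log)
# prints ['SpeedReward in', 'StuckTermination in', 'ThrottleClamp in', 'DonkeyEnv.step',
#         'ThrottleClamp out', 'StuckTermination out', 'SpeedReward out']
```

As a result, the throttle clamp is the last thing to touch the action before it reaches the sim. The v5 reward is the last thing to touch the result before SB3 sees it.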

---

## What track_progress Tells Us (New)

`info['track_progress']` = `activeNode / totalNodes`

- **0.0** = car is at the start line
- **0.5** = car is halfway around the track
- **approaching 1.0** = car is near the end of the lap

This assumes `activeNode` counts from 0 to `totalNodes - 1`. In that case the value never quite reaches 1.0 (its maximum is `(totalNodes - 1) / totalNodes`), and it falls back to about 0.0 when the car crosses the start line into the next lap.

This is the **first time forward-progress information is available to the reward**. CTE only tells us how far the car is sideways from the centreline, not how far it is along the track. With track_progress, the model can be rewarded for getting further around the track even when it is slow or slightly off-centre. This matters most on mountain_track, where the slow hill climb earns little v5 reward and has blocked learning.
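
One possible way to turn `activeNode` into a per-step shaping term is to reward the change since the last step. The sketch handles the wrap from the last node back to node 0 at the start line. It is hypothetical and not part of the current v5 reward. The wrap rule is an assumption: any apparent forward jump of more than half a lap is treated as a step backwards.

```python
def progress_delta(prev_node: int, node: int, total_nodes: int) -> float:
    """Signed fraction of the lap covered since the previous step (hypothetical helper)."""
    if total_nodes <= 0:
        return 0.0
    diff = (node - prev_node) % total_nodes   # forward distance in nodes, 0..total_nodes-1
    if diff > total_nodes // 2:               # a huge forward jump is really a step back
        diff -= total_nodes
    return diff / total_nodes


print(progress_delta(185, 1, 186), progress_delta(10, 8, 186))
```

Summed over a full lap, the forward deltas add up to about 1.0, and driving back and forth nets zero. This keeps the term bounded even on a slow hill climb.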

---

## File Quick Reference

| File | What to edit when... |
|------|---------------------|
| `agent/reward_wrapper.py` | Changing reward function |
| `agent/multitrack_runner.py` | Changing training loop, wrappers, track switching |
| `agent/wave4_controller.py` | Changing GP search, hyperparameter ranges |
| `gym_donkeycar/envs/donkey_sim.py` | Adding new fields from sim telemetry |
| `gym_donkeycar/envs/donkey_env.py` | Changing env reset/step behaviour |
| `sdsandbox/.../TcpCarHandler.cs` | Adding new telemetry fields from Unity |
| `sdsandbox/.../CarPath.cs` | Changing how CTE / track progress is computed |