# System Architecture — DonkeyCar RL Autoresearch
## Overview
Five distinct layers talk to each other, numbered from bottom (Layer 1) to top (Layer 5):
```
┌─────────────────────────────────────────────────────────────────┐
│ Layer 5: OUR CODE (autoresearch_controller, wave4_controller) │
│ GP+UCB proposes hyperparameters, launches training │
├─────────────────────────────────────────────────────────────────┤
│ Layer 4: OUR CODE (multitrack_runner, reward_wrapper) │
│ PPO training loop, reward shaping, track switching │
├─────────────────────────────────────────────────────────────────┤
│ Layer 3: gym_donkeycar (Python package, installed) │
│ Gymnasium environment wrapper around the sim │
├─────────────────────────────────────────────────────────────────┤
│ Layer 2: TCP socket (localhost:9091) │
│ JSON messages in both directions │
├─────────────────────────────────────────────────────────────────┤
│ Layer 1: sdsandbox (Unity app, running on Windows/WSL) │
│ 3D physics simulation, rendering, track logic │
└─────────────────────────────────────────────────────────────────┘
```
---
## Layer 1: sdsandbox (Unity Simulator)
**Location:** `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/`
**Language:** C# (Unity)
**What it does:** Runs the 3D physics simulation — car physics, track geometry,
collision detection, camera rendering, lap timing.
### Key C# scripts
| File | Role |
|------|------|
| `Scripts/tcp/TcpCarHandler.cs` | **Main bridge** — handles the TCP connection, reads steering/throttle commands, sends telemetry JSON every frame |
| `Scripts/CarPath.cs` | Defines the track centreline as a series of nodes; computes CTE via `GetCrossTrackErr()` |
| `Scripts/PathManager.cs` | Manages the active path, knows which node the car is near (`iActiveSpan`) |
| `Scripts/startingLine.cs` | Detects lap completions, measures lap times |
| `Scripts/Car.cs` | Car physics — applies steering/throttle, tracks velocity, collision |
| `Scripts/SceneLoader.cs` | Loads/unloads track scenes in response to `load_scene` / `exit_scene` messages |
| `Scripts/GlobalState.cs` | Flags like `extendedTelemetry` that gate which fields are sent |
### What the sim sends every frame (telemetry JSON)
```json
{
"msg_type": "telemetry",
"steering_angle": 0.0,
"throttle": 0.4,
"image": "<base64 camera image>",
"hit": "none",
"time": 12.34,
"speed": 2.5,
"accel_x/y/z": ...,
"gyro_x/y/z": ...,
"pitch/yaw/roll": ...,
"activeNode": 42,    // current path node index (ALWAYS sent)
"totalNodes": 186,   // total path nodes (ALWAYS sent)
"cte": 0.3,          // cross-track error (extendedTelemetry=true)
"pos_x/y/z": ...,    // world position (extendedTelemetry=true)
"vel_x/y/z": ...     // world velocity (extendedTelemetry=true)
}
```
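As a minimal sketch of how a client might consume one such frame, the Python below parses a telemetry message and treats the `extendedTelemetry` fields as optional. The function name `parse_telemetry` and the returned dict layout are illustrative assumptions, not part of gym_donkeycar.

```python
import json

# Fields that, per the listing above, are only sent when extendedTelemetry is on.
EXTENDED_FIELDS = ("cte", "pos_x", "pos_y", "pos_z", "vel_x", "vel_y", "vel_z")


def parse_telemetry(line: str) -> dict:
    """Parse one telemetry message (hypothetical helper, not library code)."""
    msg = json.loads(line)
    if msg.get("msg_type") != "telemetry":
        raise ValueError(f"not a telemetry message: {msg.get('msg_type')!r}")
    return {
        "speed": float(msg["speed"]),
        "hit": msg["hit"],
        # activeNode / totalNodes are documented as always present.
        "active_node": int(msg["activeNode"]),
        "total_nodes": int(msg["totalNodes"]),
        # Gated fields fall back to None when the sim does not send them.
        "extended": {name: msg.get(name) for name in EXTENDED_FIELDS},
    }


sample = (
    '{"msg_type": "telemetry", "speed": 2.5, "hit": "none", '
    '"activeNode": 42, "totalNodes": 186, "cte": 0.3}'
)
frame = parse_telemetry(sample)
print(frame["active_node"], frame["extended"]["cte"], frame["extended"]["pos_x"])
```

Reading gated fields with `.get()` keeps the parser working whether or not `extendedTelemetry` is enabled in the sim.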
### What the sim receives (commands)
```json
{ "msg_type": "control", "steering": 0.2, "throttle": 0.5, "brake": 0.0 }
{ "msg_type": "load_scene", "scene_name": "generated_track" }
{ "msg_type": "exit_scene" }
{ "msg_type": "car_config", ... }
```
---
## Layer 2: TCP Socket (localhost:9091)
A plain TCP connection carrying newline-delimited JSON messages.
The sim is the **server** (listens on 9091).
Python is the **client** (connects to 9091).
**Critical rule:** Each `gym.make()` call opens ONE TCP connection, which
spawns ONE car in the sim. Opening a second connection spawns a phantom
second car. Always `env.close()` before opening a new connection.
Track switching must go through the EXISTING connection via `exit_scene`,
not by opening a new connection.
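The newline-delimited framing can be illustrated without a live socket. This is a sketch under the assumption that every message is a single JSON object terminated by `\n`; `encode_message` and `LineBuffer` are hypothetical names, and the real `SDClient` may buffer differently.

```python
import json


def encode_message(msg: dict) -> bytes:
    """Serialise one message as a single JSON line."""
    return (json.dumps(msg) + "\n").encode("utf-8")


class LineBuffer:
    """Reassembles complete JSON lines from arbitrary TCP chunks."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk: bytes) -> list:
        # TCP is a byte stream: a chunk may hold half a message or several.
        self._pending += chunk
        *complete, self._pending = self._pending.split(b"\n")
        return [json.loads(line) for line in complete if line.strip()]


buf = LineBuffer()
wire = encode_message({"msg_type": "exit_scene"}) + encode_message(
    {"msg_type": "control", "steering": 0.2, "throttle": 0.5, "brake": 0.0}
)
first = buf.feed(wire[:10])   # partial message, nothing complete yet
rest = buf.feed(wire[10:])    # remainder completes both messages
print(first, [m["msg_type"] for m in rest])
```

Because all messages travel over the one connection, a track switch is just another line written to the same socket as the `control` messages.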
---
## Layer 3: gym_donkeycar (Python Package)
**Location:** `/home/paulh/.local/lib/python3.10/site-packages/gym_donkeycar/`
**Installed via:** pip
**What it does:** Wraps the TCP connection as a standard Gymnasium environment
so Stable-Baselines3 and other RL libraries can use it.
### File structure
```
gym_donkeycar/
├── __init__.py          Registers all environments with gymnasium
├── core/
│   ├── sim_client.py    SimClient: passes received messages to the handler
│   ├── client.py        SDClient: low-level socket, threading, message queue
│   └── message.py       IMesgHandler interface
└── envs/
    ├── donkey_env.py    DonkeyEnv: THE gymnasium.Env subclass
    ├── donkey_sim.py    DonkeyUnitySimContoller: parses telemetry,
    │                    builds info dict, manages episode state
    └── donkey_proc.py   Optional: launches sim as subprocess
```
### How they connect
```
DonkeyEnv (donkey_env.py)
└── creates DonkeyUnitySimContoller (donkey_sim.py)
└── creates SimClient (core/sim_client.py)
└── creates SDClient (core/client.py)
└── TCP socket → Unity sim
```
### donkey_env.py — the Gymnasium interface
This is what your code calls with `gym.make('donkey-generated-track-v0')`.
- `reset()` → sends `car_config`, waits for `sim started!`, returns first obs
- `step(action)` → sends `control` message (steering + throttle), waits for
next telemetry frame, returns `(obs, reward, terminated, truncated, info)`
- Observation = camera image (120×160×3 uint8)
- Action space = Box([-1,0], [1,1]) — [steering, throttle]
### donkey_sim.py — the telemetry parser
Receives JSON frames from the sim and maintains state:
| Attribute | Source | Meaning |
|-----------|--------|---------|
| `self.image_array` | `image` field | Current camera frame |
| `self.cte` | `cte` field | Cross-track error (metres from centreline) |
| `self.speed` | `speed` field | Car speed (m/s) |
| `self.hit` | `hit` field | What was last hit (`"none"` or object name) |
| `self.x/y/z` | `pos_x/y/z` | World position |
| `self.lap_count` | crossing start line | Completed laps |
| `self.last_lap_time` | crossing start line | Most recent lap time (seconds) |
| `self.active_node` | `activeNode` | Current path node index ← **newly added** |
| `self.total_nodes` | `totalNodes` | Total path nodes ← **newly added** |
The info dict returned from `step()` contains all of the above plus:
- `track_progress = active_node / total_nodes` ← **newly added, 0.0→1.0**
Episode termination (`done=True`) fires when:
- `abs(cte) > max_cte` (default 8m) — car too far off centreline
- `hit != "none"` — car hit something (when detected by physics)
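The two rules above and the `track_progress` ratio are small enough to write down directly. The following is a sketch of the logic as described here, not a copy of `donkey_sim.py`; the function names and the zero-node guard are assumptions.

```python
def track_progress(active_node: int, total_nodes: int) -> float:
    """Fraction of the path covered, as active_node / total_nodes."""
    if total_nodes <= 0:
        # Assumed guard: before the first telemetry frame the count may be 0.
        return 0.0
    return active_node / total_nodes


def is_terminated(cte: float, hit: str, max_cte: float = 8.0) -> bool:
    """True when the car is too far off the centreline or has hit something."""
    return abs(cte) > max_cte or hit != "none"


print(track_progress(93, 186))        # halfway around a 186-node track
print(is_terminated(0.3, "none"))     # on track, no collision
print(is_terminated(-8.5, "none"))    # too far off the centreline
```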
### Registered environments
```python
# All defined in gym_donkeycar/__init__.py
'donkey-generated-roads-v0' GeneratedRoadsEnv (generated_road)
'donkey-generated-track-v0' GeneratedTrackEnv (generated_track)
'donkey-mountain-track-v0' MountainTrackEnv (mountain_track)
'donkey-minimonaco-track-v0' MiniMonacoEnv (mini_monaco)
'donkey-warehouse-v0' WarehouseEnv
'donkey-roboracingleague-track-v0' RoboRacingLeagueTrackEnv
# ... etc
```
---
## Layer 4: Our Training Code
**Location:** `agent/`
### reward_wrapper.py — SpeedRewardWrapper
Wraps a DonkeyEnv and **completely replaces** the sim's own reward signal.
**v5 reward (current):**
```python
reward = (speed / 10.0) * (1 - abs(cte) / max_cte)
```
- Fast + centred = high reward
- Slow (e.g. on a hill) = low reward → gradient pushes toward more throttle
- Off-track = near-zero reward
- Crash (done=True) = -1.0
- Short-lap exploit (<5s): large penalty
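Put together, the rules above might look like the function below. It is a sketch, not the code in `reward_wrapper.py`: the size of the short-lap penalty (`-10.0`), the order in which the checks run, the `max_cte` default, and the clamp at zero are all assumptions made for illustration.

```python
def v5_reward(speed, cte, max_cte=8.0, done=False, lap_time=None,
              min_lap_time=5.0, short_lap_penalty=-10.0):
    """Illustrative v5 reward: speed scaled by how centred the car is."""
    # Assumed: a lap finished faster than min_lap_time is treated as an exploit.
    if lap_time is not None and lap_time < min_lap_time:
        return short_lap_penalty
    # A crash (episode ended) gets a fixed negative reward.
    if done:
        return -1.0
    # Assumed clamp so the factor never goes negative when |cte| > max_cte.
    centred = max(0.0, 1.0 - abs(cte) / max_cte)
    return (speed / 10.0) * centred


print(v5_reward(speed=5.0, cte=0.0))   # fast and centred
print(v5_reward(speed=5.0, cte=4.0))   # same speed, halfway to max_cte
print(v5_reward(speed=0.5, cte=0.0))   # slow, e.g. stalled on a hill
```

Because the two factors are multiplied, neither speed alone nor centring alone earns reward; the car has to do both at once.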
### multitrack_runner.py — Training Loop
Manages round-robin training across multiple tracks:
1. Creates env on track A, trains for `steps_per_switch` steps
2. Calls `close_and_switch()`, which sends `exit_scene` via the existing viewer,
closes the env, waits, then opens an env on track B
3. Repeats until `total_timesteps` reached
4. Evaluates on test tracks (mini_monaco, etc.)
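The round-robin schedule in steps 1 to 3 can be sketched as a generator that yields one training segment at a time. `round_robin_schedule` is a hypothetical helper; the real runner also creates envs and calls `close_and_switch()` between segments, which is omitted here.

```python
from itertools import cycle


def round_robin_schedule(tracks, steps_per_switch, total_timesteps):
    """Yield (track_id, n_steps) segments until total_timesteps is used up."""
    if steps_per_switch <= 0:
        raise ValueError("steps_per_switch must be positive")
    remaining = total_timesteps
    for track in cycle(tracks):
        if remaining <= 0:
            return
        # The last segment may be shorter than steps_per_switch.
        n_steps = min(steps_per_switch, remaining)
        yield track, n_steps
        remaining -= n_steps


segments = list(round_robin_schedule(
    ["donkey-generated-track-v0", "donkey-mountain-track-v0"],
    steps_per_switch=1000,
    total_timesteps=2500,
))
for track_id, n_steps in segments:
    print(track_id, n_steps)
```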
**Wrapper stack applied to every env:**
```
gym.make(track_id) ← raw DonkeyEnv
→ ThrottleClampWrapper ← ensures minimum throttle (0.2 or 0.5)
→ StuckTerminationWrapper ← ends episode if <0.5m in 80 steps
→ SpeedRewardWrapper ← replaces reward with v5 formula
→ DummyVecEnv ← SB3 requires vectorised envs
→ VecTransposeImage ← SB3 CNN needs (C,H,W) not (H,W,C)
```
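To make the stuck check concrete, here is a sketch of the detection logic on its own, without the Gymnasium wrapper around it. It assumes "moved" means net displacement in the x/z plane over the last 80 steps; the actual `StuckTerminationWrapper` may measure distance differently, and `StuckDetector` is a hypothetical name.

```python
import math
from collections import deque


class StuckDetector:
    """Flags the car as stuck if it moved less than min_distance in `window` steps."""

    def __init__(self, window=80, min_distance=0.5):
        self.window = window
        self.min_distance = min_distance
        # window + 1 positions span exactly `window` steps.
        self._positions = deque(maxlen=window + 1)

    def reset(self):
        self._positions.clear()

    def update(self, x, z) -> bool:
        self._positions.append((x, z))
        if len(self._positions) <= self.window:
            return False  # not enough history yet
        x0, z0 = self._positions[0]
        return math.hypot(x - x0, z - z0) < self.min_distance


detector = StuckDetector(window=3, min_distance=0.5)
flags = [detector.update(x, 0.0) for x in (0.0, 0.1, 0.1, 0.2)]
print(flags)  # the last entry is True: only 0.2 m covered in 3 steps
```

A wrapper built on this would call `reset()` from `reset()` and `update()` from `step()` with the position taken from the info dict, ending the episode when `update()` returns `True`.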
### Key design decisions
- **PPO with CnnPolicy:** raw image input; SB3 handles CNN feature extraction
- **Continuous actions:** steering [-1,1] and throttle [0,1]; no discretisation
- **No warm-start:** each trial trains from random weights to avoid bias
- **Per-segment checkpointing:** model saved after every training segment
so timeouts don't lose all progress
---
## Layer 5: Autoresearch (GP+UCB)
**wave4_controller.py** runs the outer loop:
1. Proposes hyperparameters (learning_rate, steps_per_switch, total_timesteps)
using Gaussian Process + Upper Confidence Bound (GP+UCB)
2. Launches `multitrack_runner.py` as a subprocess
3. Parses test track scores from stdout
4. Updates GP with (hyperparams → score) to improve next proposal
5. Saves champion model when score improves
**TinyGP** is a pure-NumPy Gaussian Process (no sklearn dependency):
- Fits a smooth surface over (hyperparams → performance) space
- UCB = mean + κ×std, which balances exploiting known-good regions against exploring uncertain ones
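The UCB rule can be illustrated separately from the GP fit. The sketch below assumes the GP has already produced a posterior mean and standard deviation for each candidate; the numbers, the `kappa` default, and the function names are made up for illustration and do not come from `wave4_controller.py`.

```python
def ucb_scores(means, stds, kappa=2.0):
    """Upper Confidence Bound per candidate: mean + kappa * std."""
    return [m + kappa * s for m, s in zip(means, stds)]


def pick_next(candidates, means, stds, kappa=2.0):
    """Return the candidate with the highest UCB score."""
    scores = ucb_scores(means, stds, kappa)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


candidates = [
    {"learning_rate": 1e-4},
    {"learning_rate": 3e-4},
    {"learning_rate": 1e-3},
]
means = [0.60, 0.55, 0.40]   # illustrative posterior means
stds = [0.02, 0.05, 0.30]    # illustrative posterior uncertainties

print(pick_next(candidates, means, stds, kappa=2.0))  # uncertain region wins
print(pick_next(candidates, means, stds, kappa=0.0))  # pure exploitation
```

With `kappa=0` the rule picks the best known mean; raising `kappa` shifts the choice toward candidates the GP is unsure about.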
---
## Data Flow: One Training Step
```
1. model.predict(obs) → action [steering, throttle]
2. ThrottleClampWrapper.step(action) → clamp throttle ≥ 0.2
3. StuckTerminationWrapper.step(action) → check if car moved <0.5m in 80 steps
4. SpeedRewardWrapper.step(action) → compute v5 reward, check short-lap exploit
5. DonkeyEnv.step(action) → send TCP "control" message to Unity sim
6. Unity sim → physics tick → send telemetry JSON back
7. donkey_sim.py → parse JSON → update cte, speed, active_node, track_progress
8. DonkeyEnv.step() returns (obs=camera_image, reward=sim_reward, done, info)
9. SpeedRewardWrapper replaces sim_reward with v5 reward
10. SB3 PPO stores (obs, action, v5_reward, done) in rollout buffer
11. After n_steps=2048: PPO gradient update → policy weights update
```
---
## What track_progress Tells Us (New)
`info['track_progress']` = `activeNode / totalNodes`
- **0.0** = car is at the start line
- **0.5** = car is halfway around the track
- **1.0** = car has completed the track
This is the **first time we have forward progress information** available for the reward.
Previously, CTE only told us "how far sideways from the centreline", not
"how far along the track." With track_progress we can reward the model for
getting further around the track even if it's slow or slightly off-centre.
This is especially important for mountain_track where the hill blocked learning.
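One way such a progress term could be derived is from the change in `track_progress` between consecutive steps. This is a hypothetical sketch, not something the current v5 reward does; it assumes progress wraps from just under 1.0 back to 0.0 at the start line.

```python
def progress_delta(prev: float, curr: float) -> float:
    """Signed change in track_progress, corrected for start-line wrap-around."""
    delta = curr - prev
    if delta < -0.5:
        delta += 1.0   # crossed the start line going forwards
    elif delta > 0.5:
        delta -= 1.0   # crossed the start line going backwards
    return delta


print(progress_delta(0.25, 0.50))   # normal forward progress
print(progress_delta(0.95, 0.05))   # forward across the start line
print(progress_delta(0.05, 0.95))   # reversing across the start line
```

A positive delta could be added to the reward even when speed is low, which is the situation described for the hill on mountain_track.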
---
## File Quick Reference
| File | What to edit when... |
|------|---------------------|
| `agent/reward_wrapper.py` | Changing reward function |
| `agent/multitrack_runner.py` | Changing training loop, wrappers, track switching |
| `agent/wave4_controller.py` | Changing GP search, hyperparameter ranges |
| `gym_donkeycar/envs/donkey_sim.py` | Adding new fields from sim telemetry |
| `gym_donkeycar/envs/donkey_env.py` | Changing env reset/step behaviour |
| `sdsandbox/.../TcpCarHandler.cs` | Adding new telemetry fields from Unity |
| `sdsandbox/.../CarPath.cs` | Changing how CTE / track progress is computed |