From 0b5ce6ab7e9409790beefeed899b3330e5f74eaf Mon Sep 17 00:00:00 2001
From: Paul Huliganga
Date: Fri, 17 Apr 2026 14:06:38 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20ARCHITECTURE.md=20=E2=80=94=20complete?=
 =?UTF-8?q?=20system=20architecture=20guide?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Explains all 5 layers:
1. sdsandbox (Unity C# simulator)
2. TCP socket (JSON protocol)
3. gym_donkeycar (Python gymnasium wrapper)
4. Our training code (reward_wrapper, multitrack_runner)
5. Autoresearch (GP+UCB controller)

Includes data flow, file quick reference, key design decisions,
and explanation of the new track_progress field.

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
---
 docs/ARCHITECTURE.md | 283 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 283 insertions(+)
 create mode 100644 docs/ARCHITECTURE.md

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..e34ad73
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,283 @@
+# System Architecture — DonkeyCar RL Autoresearch
+
+## Overview
+
+Five distinct layers talk to each other. From bottom to top:
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│  Layer 5: OUR CODE (autoresearch_controller, wave4_controller)  │
+│           GP+UCB proposes hyperparameters, launches training    │
+├─────────────────────────────────────────────────────────────────┤
+│  Layer 4: OUR CODE (multitrack_runner, reward_wrapper)          │
+│           PPO training loop, reward shaping, track switching    │
+├─────────────────────────────────────────────────────────────────┤
+│  Layer 3: gym_donkeycar (Python package, installed)             │
+│           Gymnasium environment wrapper around the sim          │
+├─────────────────────────────────────────────────────────────────┤
+│  Layer 2: TCP socket (localhost:9091)                           │
+│           JSON messages in both directions                      │
+├─────────────────────────────────────────────────────────────────┤
+│  Layer 1: sdsandbox (Unity app, running on Windows/WSL)         │
+│           3D physics simulation, rendering, track logic         │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Layer 1: sdsandbox (Unity Simulator)
+
+**Location:** `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/`
+**Language:** C# (Unity)
+**What it does:** Runs the 3D physics simulation — car physics, track geometry,
+collision detection, camera rendering, lap timing.
+
+### Key C# scripts
+
+| File | Role |
+|------|------|
+| `Scripts/tcp/TcpCarHandler.cs` | **Main bridge** — handles the TCP connection, reads steering/throttle commands, sends telemetry JSON every frame |
+| `Scripts/CarPath.cs` | Defines the track centreline as a series of nodes; computes CTE via `GetCrossTrackErr()` |
+| `Scripts/PathManager.cs` | Manages the active path, knows which node the car is near (`iActiveSpan`) |
+| `Scripts/startingLine.cs` | Detects lap completions, measures lap times |
+| `Scripts/Car.cs` | Car physics — applies steering/throttle, tracks velocity, collision |
+| `Scripts/SceneLoader.cs` | Loads/unloads track scenes in response to `load_scene` / `exit_scene` messages |
+| `Scripts/GlobalState.cs` | Flags like `extendedTelemetry` that gate which fields are sent |
+
+### What the sim sends every frame (telemetry JSON)
+
+```json
+{
+  "msg_type": "telemetry",
+  "steering_angle": 0.0,
+  "throttle": 0.4,
+  "image": "",
+  "hit": "none",
+  "time": 12.34,
+  "speed": 2.5,
+  "accel_x/y/z": ...,
+  "gyro_x/y/z": ...,
+  "pitch/yaw/roll": ...,
+  "activeNode": 42,        ← current path node index (ALWAYS sent)
+  "totalNodes": 186,       ← total path nodes (ALWAYS sent)
+  "cte": 0.3,              ← cross-track error (extendedTelemetry=true)
+  "pos_x/y/z": ...,        ← world position (extendedTelemetry=true)
+  "vel_x/y/z": ...         ← world velocity (extendedTelemetry=true)
+}
+```
+
+### What the sim receives (commands)
+
+```json
+{ "msg_type": "control", "steering": 0.2, "throttle": 0.5, "brake": 0.0 }
+{ "msg_type": "load_scene", "scene_name": "generated_track" }
+{ "msg_type": "exit_scene" }
+{ "msg_type": "car_config", ... }
+```
+
+---
+
+## Layer 2: TCP Socket (localhost:9091)
+
+A plain TCP connection carrying newline-delimited JSON messages.
+The sim is the **server** (listens on 9091).
+Python is the **client** (connects to 9091).
+
+**Critical rule:** Each `gym.make()` call opens ONE TCP connection, which
+spawns ONE car in the sim. Opening a second connection spawns a phantom
+second car. Always call `env.close()` before opening a new connection.
+Track switching must go through the EXISTING connection via `exit_scene`,
+not by opening a new connection.
+
+---
+
+## Layer 3: gym_donkeycar (Python Package)
+
+**Location:** `/home/paulh/.local/lib/python3.10/site-packages/gym_donkeycar/`
+**Installed via:** pip
+**What it does:** Wraps the TCP connection as a standard Gymnasium environment
+so Stable-Baselines3 and other RL libraries can use it.
+
+### File structure
+
+```
+gym_donkeycar/
+├── __init__.py          Registers all environments with gymnasium
+├── core/
+│   ├── sim_client.py    SimClient — sim-facing client built on SDClient
+│   ├── client.py        SDClient — raw TCP socket send/receive, threading, message queue
+│   └── message.py       IMesgHandler interface
+└── envs/
+    ├── donkey_env.py    DonkeyEnv — THE gymnasium.Env subclass
+    ├── donkey_sim.py    DonkeyUnitySimContoller — parses telemetry,
+    │                    builds info dict, manages episode state
+    └── donkey_proc.py   Optional: launches sim as subprocess
+```
+
+### How they connect
+
+```
+DonkeyEnv (donkey_env.py)
+  └── creates DonkeyUnitySimContoller (donkey_sim.py)
+        └── creates SimClient (core/sim_client.py)
+              └── extends SDClient (core/client.py)
+                    └── TCP socket → Unity sim
+```
+
+### donkey_env.py — the Gymnasium interface
+
+This is what your code calls with `gym.make('donkey-generated-track-v0')`.
+
+- `reset()` → sends `car_config`, waits for `sim started!`, returns first obs
+- `step(action)` → sends `control` message (steering + throttle), waits for
+  next telemetry frame, returns `(obs, reward, terminated, truncated, info)`
+- Observation = camera image (120×160×3 uint8)
+- Action space = Box([-1,0], [1,1]) — [steering, throttle]
+
+### donkey_sim.py — the telemetry parser
+
+Receives JSON frames from the sim and maintains state:
+
+| Attribute | Source | Meaning |
+|-----------|--------|---------|
+| `self.image_array` | `image` field | Current camera frame |
+| `self.cte` | `cte` field | Cross-track error (metres from centreline) |
+| `self.speed` | `speed` field | Car speed (m/s) |
+| `self.hit` | `hit` field | What was last hit (`"none"` or object name) |
+| `self.x/y/z` | `pos_x/y/z` | World position |
+| `self.lap_count` | crossing start line | Completed laps |
+| `self.last_lap_time` | crossing start line | Most recent lap time (seconds) |
+| `self.active_node` | `activeNode` | Current path node index ← **newly added** |
+| `self.total_nodes` | `totalNodes` | Total path nodes ← **newly added** |
+
+The info dict returned from `step()` contains all of the above plus:
+- `track_progress = active_node / total_nodes` ← **newly added, 0.0→1.0**
+
+Episode termination (`done=True`) fires when:
+- `abs(cte) > max_cte` (default 8m) — car too far off centreline
+- `hit != "none"` — car hit something (when detected by physics)
+
+### Registered environments
+
+```python
+# All defined in gym_donkeycar/__init__.py
+'donkey-generated-roads-v0'        → GeneratedRoadsEnv (generated_road)
+'donkey-generated-track-v0'        → GeneratedTrackEnv (generated_track)
+'donkey-mountain-track-v0'         → MountainTrackEnv (mountain_track)
+'donkey-minimonaco-track-v0'       → MiniMonacoEnv (mini_monaco)
+'donkey-warehouse-v0'              → WarehouseEnv
+'donkey-roboracingleague-track-v0' → RoboRacingLeagueTrackEnv
+# ... etc
+```
+
+---
+
+## Layer 4: Our Training Code
+
+**Location:** `agent/`
+
+### reward_wrapper.py — SpeedRewardWrapper
+
+Wraps a DonkeyEnv and **completely replaces** the sim's own reward signal.
+
+**v5 reward (current):**
+```python
+reward = (speed / 10.0) * (1 - abs(cte) / max_cte)
+```
+- Fast + centred = high reward
+- Slow (e.g. on a hill) = low reward → gradient pushes toward more throttle
+- Off-track = near-zero reward
+- Crash (done=True) = -1.0
+- Short-lap exploit (<5s): large penalty
+
+### multitrack_runner.py — Training Loop
+
+Manages round-robin training across multiple tracks:
+1. Creates env on track A, trains for `steps_per_switch` steps
+2. Calls `close_and_switch()` → sends `exit_scene` via existing viewer,
+   closes env, waits, opens env on track B
+3. Repeats until `total_timesteps` reached
+4. Evaluates on test tracks (mini_monaco, etc.)
+
+**Wrapper stack applied to every env:**
+```
+gym.make(track_id)              ← raw DonkeyEnv
+  → ThrottleClampWrapper        ← ensures minimum throttle (0.2 or 0.5)
+  → StuckTerminationWrapper     ← ends episode if <0.5m in 80 steps
+  → SpeedRewardWrapper          ← replaces reward with v5 formula
+  → DummyVecEnv                 ← SB3 requires vectorised envs
+  → VecTransposeImage           ← SB3 CNN needs (C,H,W) not (H,W,C)
+```
+
+### Key design decisions
+
+- **PPO with CnnPolicy** — raw image input, SB3 handles CNN feature extraction
+- **Continuous actions** — steering [-1,1] and throttle [0,1]; no discretisation
+- **No warm-start** — each trial trains from random weights to avoid bias
+- **Per-segment checkpointing** — model saved after every training segment
+  so timeouts don't lose all progress
+
+---
+
+## Layer 5: Autoresearch (GP+UCB)
+
+**wave4_controller.py** — outer loop:
+1. Proposes hyperparameters (learning_rate, steps_per_switch, total_timesteps)
+   using Gaussian Process + Upper Confidence Bound (GP+UCB)
+2. Launches `multitrack_runner.py` as a subprocess
+3. Parses test track scores from stdout
+4. Updates GP with (hyperparams → score) to improve next proposal
+5. Saves champion model when score improves
+
+**TinyGP** — pure numpy Gaussian Process (no sklearn dependency):
+- Fits a smooth surface over (hyperparams → performance) space
+- UCB = mean + κ×std — balances exploiting known-good regions vs exploring uncertain ones
+
+---
+
+## Data Flow: One Training Step
+
+```
+1. model.predict(obs)                   → action [steering, throttle]
+2. ThrottleClampWrapper.step(action)    → clamp throttle ≥ 0.2
+3. StuckTerminationWrapper.step(action) → check if car moved <0.5m in 80 steps
+4. SpeedRewardWrapper.step(action)      → compute v5 reward, check short-lap exploit
+5. DonkeyEnv.step(action)               → send TCP "control" message to Unity sim
+6. Unity sim                            → physics tick → send telemetry JSON back
+7. donkey_sim.py                        → parse JSON → update cte, speed, active_node, track_progress
+8. DonkeyEnv.step() returns (obs=camera_image, reward=sim_reward, done, info)
+9. SpeedRewardWrapper replaces sim_reward with v5 reward
+10. SB3 PPO stores (obs, action, v5_reward, done) in rollout buffer
+11. After n_steps=2048: PPO gradient update → policy weights update
+```
+
+---
+
+## What track_progress Tells Us (New)
+
+`info['track_progress']` = `activeNode / totalNodes`
+
+- **0.0** = car is at the start line
+- **0.5** = car is halfway around the track
+- **1.0** = car has completed the track
+
+This is the **first time we have forward progress information** available for reward shaping.
+Previously, CTE only told us "how far sideways from the centreline" — not
+"how far along the track." With track_progress we can reward the model for
+getting further around the track even if it's slow or slightly off-centre.
+This is especially important for mountain_track where the hill blocked learning.
+
+---
+
+## File Quick Reference
+
+| File | What to edit when... |
+|------|---------------------|
+| `agent/reward_wrapper.py` | Changing reward function |
+| `agent/multitrack_runner.py` | Changing training loop, wrappers, track switching |
+| `agent/wave4_controller.py` | Changing GP search, hyperparameter ranges |
+| `gym_donkeycar/envs/donkey_sim.py` | Adding new fields from sim telemetry |
+| `gym_donkeycar/envs/donkey_env.py` | Changing env reset/step behaviour |
+| `sdsandbox/.../TcpCarHandler.cs` | Adding new telemetry fields from Unity |
+| `sdsandbox/.../CarPath.cs` | Changing how CTE / track progress is computed |
+
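
For anyone editing `agent/reward_wrapper.py` or `gym_donkeycar/envs/donkey_sim.py`, the v5 reward formula and the `track_progress` ratio described in this document can be sketched as two pure functions. This is a minimal illustration, not the code in the repository: the function names `v5_reward` and `track_progress`, the clamp of the centring term at zero, and the guard against a zero node count are all assumptions, and the short-lap penalty is left out because its size is not stated here.

```python
def v5_reward(speed: float, cte: float, max_cte: float = 8.0, done: bool = False) -> float:
    """Sketch of the v5 reward: (speed / 10) * (1 - |cte| / max_cte)."""
    if done:
        # The document assigns -1.0 to a crash (done=True).
        return -1.0
    # Assumption: clamp at 0 so the centring term never goes negative.
    centring = max(0.0, 1.0 - abs(cte) / max_cte)
    return (speed / 10.0) * centring


def track_progress(active_node: int, total_nodes: int) -> float:
    """Sketch of info['track_progress'] = activeNode / totalNodes."""
    if total_nodes <= 0:
        # Assumption: report 0.0 until the sim has sent a node count.
        return 0.0
    return active_node / total_nodes


if __name__ == "__main__":
    print(v5_reward(speed=5.0, cte=0.0))   # fast and centred
    print(v5_reward(speed=5.0, cte=4.0))   # same speed, halfway to max_cte
    print(track_progress(93, 186))         # halfway through the node list
```

With `max_cte = 8.0`, a car at 5 m/s on the centreline scores 0.5, and the same car 4 m off the centreline scores 0.25, which shows how the centring term scales the speed term.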