From 0b5ce6ab7e9409790beefeed899b3330e5f74eaf Mon Sep 17 00:00:00 2001
From: Paul Huliganga <paje0101@gmail.com>
Date: Fri, 17 Apr 2026 14:06:38 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20ARCHITECTURE.md=20=E2=80=94=20complete?=
 =?UTF-8?q?=20system=20architecture=20guide?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Explains all 5 layers:
1. sdsandbox (Unity C# simulator)
2. TCP socket (JSON protocol)
3. gym_donkeycar (Python gymnasium wrapper)
4. Our training code (reward_wrapper, multitrack_runner)
5. Autoresearch (GP+UCB controller)

Includes data flow, file quick reference, key design decisions,
and explanation of the new track_progress field.

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
---
 docs/ARCHITECTURE.md | 283 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 283 insertions(+)
 create mode 100644 docs/ARCHITECTURE.md

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..e34ad73
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,283 @@
+# System Architecture — DonkeyCar RL Autoresearch
+
+## Overview
+
+Five distinct layers talk to each other. The diagram lists them from top (Layer 5) to bottom (Layer 1):
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│  Layer 5: OUR CODE  (autoresearch_controller, wave4_controller) │
+│           GP+UCB proposes hyperparameters, launches training     │
+├─────────────────────────────────────────────────────────────────┤
+│  Layer 4: OUR CODE  (multitrack_runner, reward_wrapper)         │
+│           PPO training loop, reward shaping, track switching    │
+├─────────────────────────────────────────────────────────────────┤
+│  Layer 3: gym_donkeycar  (Python package, installed)            │
+│           Gymnasium environment wrapper around the sim           │
+├─────────────────────────────────────────────────────────────────┤
+│  Layer 2: TCP socket  (localhost:9091)                          │
+│           JSON messages in both directions                       │
+├─────────────────────────────────────────────────────────────────┤
+│  Layer 1: sdsandbox  (Unity app, running on Windows/WSL)        │
+│           3D physics simulation, rendering, track logic          │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Layer 1: sdsandbox (Unity Simulator)
+
+**Location:** `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/`  
+**Language:** C# (Unity)  
+**What it does:** Runs the 3D physics simulation — car physics, track geometry,
+collision detection, camera rendering, lap timing.
+
+### Key C# scripts
+
+| File | Role |
+|------|------|
+| `Scripts/tcp/TcpCarHandler.cs` | **Main bridge** — handles the TCP connection, reads steering/throttle commands, sends telemetry JSON every frame |
+| `Scripts/CarPath.cs` | Defines the track centreline as a series of nodes; computes CTE via `GetCrossTrackErr()` |
+| `Scripts/PathManager.cs` | Manages the active path, knows which node the car is near (`iActiveSpan`) |
+| `Scripts/startingLine.cs` | Detects lap completions, measures lap times |
+| `Scripts/Car.cs` | Car physics — applies steering/throttle, tracks velocity, collision |
+| `Scripts/SceneLoader.cs` | Loads/unloads track scenes in response to `load_scene` / `exit_scene` messages |
+| `Scripts/GlobalState.cs` | Flags like `extendedTelemetry` that gate which fields are sent |
+
+### What the sim sends every frame (telemetry JSON)
+
+```json
+{
+  "msg_type":      "telemetry",
+  "steering_angle": 0.0,
+  "throttle":       0.4,
+  "image":          "<base64 camera image>",
+  "hit":            "none",
+  "time":           12.34,
+  "speed":          2.5,
+  "accel_x/y/z":   ...,
+  "gyro_x/y/z":    ...,
+  "pitch/yaw/roll": ...,
+  "activeNode":     42,      ← current path node index (ALWAYS sent)
+  "totalNodes":     186,     ← total path nodes    (ALWAYS sent)
+  "cte":            0.3,     ← cross-track error   (extendedTelemetry=true)
+  "pos_x/y/z":     ...,     ← world position       (extendedTelemetry=true)
+  "vel_x/y/z":     ...      ← world velocity        (extendedTelemetry=true)
+}
+```
+
+### What the sim receives (commands)
+
+```json
+{ "msg_type": "control", "steering": 0.2, "throttle": 0.5, "brake": 0.0 }
+{ "msg_type": "load_scene", "scene_name": "generated_track" }
+{ "msg_type": "exit_scene" }
+{ "msg_type": "car_config", ... }
+```
+
+---
+
+## Layer 2: TCP Socket (localhost:9091)
+
+A plain TCP connection carrying newline-delimited JSON messages.
+The sim is the **server** (listens on 9091).
+Python is the **client** (connects to 9091).
+
+**Critical rule:** Each `gym.make()` call opens ONE TCP connection, which
+spawns ONE car in the sim. Opening a second connection spawns a phantom
+second car. Always call `env.close()` before opening a new connection.
+Track switching must go through the EXISTING connection via `exit_scene`,
+not by opening a new connection.
+
+---
+
+## Layer 3: gym_donkeycar (Python Package)
+
+**Location:** `/home/paulh/.local/lib/python3.10/site-packages/gym_donkeycar/`  
+**Installed via:** pip  
+**What it does:** Wraps the TCP connection as a standard Gymnasium environment
+so Stable-Baselines3 and other RL libraries can use it.
+
+### File structure
+
+```
+gym_donkeycar/
+├── __init__.py              Registers all environments with gymnasium
+├── core/
+│   ├── sim_client.py        SimClient (extends SDClient), passes sim messages to the handler
+│   ├── client.py            SDClient: low-level socket, threading, message queue
+│   └── message.py           IMesgHandler interface
+└── envs/
+    ├── donkey_env.py        DonkeyEnv — THE gymnasium.Env subclass
+    ├── donkey_sim.py        DonkeyUnitySimContoller — parses telemetry,
+    │                         builds info dict, manages episode state
+    └── donkey_proc.py       Optional: launches sim as subprocess
+```
+
+### How they connect
+
+```
+DonkeyEnv (donkey_env.py)
+    └── creates DonkeyUnitySimContoller (donkey_sim.py)
+            └── creates SimClient (core/sim_client.py)
+                    └── extends SDClient (core/client.py)
+                            └── TCP socket → Unity sim
+```
+
+### donkey_env.py — the Gymnasium interface
+
+This is what your code calls with `gym.make('donkey-generated-track-v0')`.
+
+- `reset()` → sends `car_config`, waits for `sim started!`, returns first obs
+- `step(action)` → sends `control` message (steering + throttle), waits for
+  next telemetry frame, returns `(obs, reward, terminated, truncated, info)`
+- Observation = camera image (120×160×3 uint8)
+- Action space = Box([-1,0], [1,1]) — [steering, throttle]
+
+### donkey_sim.py — the telemetry parser
+
+Receives JSON frames from the sim and maintains state:
+
+| Attribute | Source | Meaning |
+|-----------|--------|---------|
+| `self.image_array` | `image` field | Current camera frame |
+| `self.cte` | `cte` field | Cross-track error (metres from centreline) |
+| `self.speed` | `speed` field | Car speed (m/s) |
+| `self.hit` | `hit` field | What was last hit (`"none"` or object name) |
+| `self.x/y/z` | `pos_x/y/z` | World position |
+| `self.lap_count` | crossing start line | Completed laps |
+| `self.last_lap_time` | crossing start line | Most recent lap time (seconds) |
+| `self.active_node` | `activeNode` | Current path node index ← **newly added** |
+| `self.total_nodes` | `totalNodes` | Total path nodes ← **newly added** |
+
+The info dict returned from `step()` contains all of the above plus:
+- `track_progress = active_node / total_nodes` ← **newly added, 0.0→1.0**
+
+Episode termination (`done=True`) fires when:
+- `abs(cte) > max_cte` (default 8m) — car too far off centreline
+- `hit != "none"` — car hit something (when detected by physics)
+
+### Registered environments
+
+```python
+# All defined in gym_donkeycar/__init__.py
+'donkey-generated-roads-v0'      → GeneratedRoadsEnv    (generated_road)
+'donkey-generated-track-v0'      → GeneratedTrackEnv    (generated_track)
+'donkey-mountain-track-v0'       → MountainTrackEnv     (mountain_track)
+'donkey-minimonaco-track-v0'     → MiniMonacoEnv        (mini_monaco)
+'donkey-warehouse-v0'            → WarehouseEnv
+'donkey-roboracingleague-track-v0' → RoboRacingLeagueTrackEnv
+# ... etc
+```
+
+---
+
+## Layer 4: Our Training Code
+
+**Location:** `agent/`
+
+### reward_wrapper.py — SpeedRewardWrapper
+
+Wraps a DonkeyEnv and **completely replaces** the sim's own reward signal.
+
+**v5 reward (current):**
+```python
+reward = (speed / 10.0) * (1 - abs(cte) / max_cte)
+```
+- Fast + centred = high reward
+- Slow (e.g. on a hill) = low reward → gradient pushes toward more throttle
+- Off-track = near-zero reward
+- Crash (done=True) = -1.0
+- Short-lap exploit (<5s): large penalty
+
+### multitrack_runner.py — Training Loop
+
+Manages round-robin training across multiple tracks:
+1. Creates env on track A, trains for `steps_per_switch` steps
+2. Calls `close_and_switch()` → sends `exit_scene` via existing viewer,
+   closes env, waits, opens env on track B
+3. Repeats until `total_timesteps` reached
+4. Evaluates on test tracks (mini_monaco, etc.)
+
+**Wrapper stack applied to every env:**
+```
+gym.make(track_id)                    ← raw DonkeyEnv
+  → ThrottleClampWrapper              ← ensures minimum throttle (0.2 or 0.5)
+      → StuckTerminationWrapper         ← ends episode if car moves <0.5m in 80 steps
+      → SpeedRewardWrapper            ← replaces reward with v5 formula
+        → DummyVecEnv                 ← SB3 requires vectorised envs
+          → VecTransposeImage         ← SB3 CNN needs (C,H,W) not (H,W,C)
+```
+
+### Key design decisions
+
+- **PPO with CnnPolicy** — raw image input, SB3 handles CNN feature extraction
+- **Continuous actions** — steering [-1,1] and throttle [0,1]; no discretisation
+- **No warm-start** — each trial trains from random weights to avoid bias
+- **Per-segment checkpointing** — model saved after every training segment
+  so timeouts don't lose all progress
+
+---
+
+## Layer 5: Autoresearch (GP+UCB)
+
+**wave4_controller.py** — outer loop:
+1. Proposes hyperparameters (learning_rate, steps_per_switch, total_timesteps)
+   using Gaussian Process + Upper Confidence Bound (GP+UCB)
+2. Launches `multitrack_runner.py` as a subprocess
+3. Parses test track scores from stdout
+4. Updates GP with (hyperparams → score) to improve next proposal
+5. Saves champion model when score improves
+
+**TinyGP** — pure numpy Gaussian Process (no sklearn dependency):
+- Fits a smooth surface over (hyperparams → performance) space
+- UCB = mean + κ×std — balances exploiting known-good regions vs exploring uncertain ones
+
+---
+
+## Data Flow: One Training Step
+
+```
+1. model.predict(obs) → action [steering, throttle]
+2. SpeedRewardWrapper.step(action) → StuckTerminationWrapper.step(action) → pass action inward
+3. ThrottleClampWrapper.step(action) → clamp throttle ≥ 0.2
+4. DonkeyEnv.step(action) → send TCP "control" message to Unity sim
+5. Unity sim → physics tick → send telemetry JSON back
+6. donkey_sim.py → parse JSON → update cte, speed, active_node, track_progress
+7. DonkeyEnv.step() returns (obs=camera_image, reward=sim_reward, done, info)
+8. StuckTerminationWrapper → check if car moved <0.5m in 80 steps
+9. SpeedRewardWrapper → replace sim_reward with v5 reward, check short-lap exploit
+10. SB3 PPO stores (obs, action, v5_reward, done) in rollout buffer
+11. After n_steps=2048: PPO gradient update → policy weights update
+```
+
+---
+
+## What track_progress Tells Us (New)
+
+`info['track_progress']` = `activeNode / totalNodes`
+
+- **0.0** = car is at the start line
+- **0.5** = car is halfway around the track  
+- **near 1.0** = car is about to complete the lap
+
+This is the **first time we have forward progress information** available for reward shaping.
+Previously, CTE only told us "how far sideways from the centreline" — not
+"how far along the track." With track_progress we can reward the model for
+getting further around the track even if it's slow or slightly off-centre.
+This is especially important for mountain_track where the hill blocked learning.
+
+---
+
+## File Quick Reference
+
+| File | What to edit when... |
+|------|---------------------|
+| `agent/reward_wrapper.py` | Changing reward function |
+| `agent/multitrack_runner.py` | Changing training loop, wrappers, track switching |
+| `agent/wave4_controller.py` | Changing GP search, hyperparameter ranges |
+| `gym_donkeycar/envs/donkey_sim.py` | Adding new fields from sim telemetry |
+| `gym_donkeycar/envs/donkey_env.py` | Changing env reset/step behaviour |
+| `sdsandbox/.../TcpCarHandler.cs` | Adding new telemetry fields from Unity |
+| `sdsandbox/.../CarPath.cs` | Changing how CTE / track progress is computed |
+