fix: multitrack_runner must use VecTransposeImage(DummyVecEnv) not plain wrap_env

The short-lap episode termination fix in SpeedRewardWrapper was not
working when multitrack_runner.py ran via command line because the env
was created as a plain gym.Wrapper chain, not VecTransposeImage(DummyVecEnv).

In custom scripts (Exp8, Exp9), env was explicitly:
  VecTransposeImage(DummyVecEnv([make_env]))
This made episode termination work correctly.

In multitrack_runner.py, env was just wrap_env(raw) — a plain gym.Wrapper.
SB3 auto-wraps this internally but the terminated signal from
SpeedRewardWrapper.force_terminate did not propagate correctly,
so circle-exploit episodes were never terminated during training.

Fix: use VecTransposeImage(DummyVecEnv([...])) explicitly in main().

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
Paul Huliganga 2026-04-18 18:33:40 -04:00
parent fecba1dd35
commit de7b9bc302
4 changed files with 43 additions and 1 deletion


@ -350,3 +350,26 @@ not the best.
**Implementation:** See `train_multitrack()` in multitrack_runner.py — the
`best_segment_reward` tracking and `best_model.zip` save logic added 2026-04-17.
## ADR-018: StuckTerminationWrapper is the correct collision fix — NOT OnCollisionStay
**Date:** 2026-04-18
**Status:** Active
**Decision:** Do NOT add OnCollisionStay to the Unity simulator.
Use StuckTerminationWrapper (displacement < 0.5m over N steps → terminate).
**Why OnCollisionStay is wrong:**
The car legitimately rubs against barriers while cornering — this should
be allowed to continue. OnCollisionStay would fire on BOTH rubbing AND
stuck scenarios, terminating valid driving attempts.
**Why StuckTerminationWrapper is right:**
- Rubbing + still moving forward: displacement > 0.5m in 80 steps → continues ✅
- Stuck perpendicular, wheels spinning: displacement < 0.5m in 80 steps → terminates ✅
The distinction between "rubbing" and "stuck" is made by checking
positional progress, not collision contact. This is the correct signal.
**Tuning note:** stuck_steps=80 (~5 seconds at 16 steps/sec). Could be
reduced to 40 (~2.5 seconds) if stuck periods are observably long.
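
The displacement check described in ADR-018 can be sketched as follows. This is a minimal illustration, not the project's StuckTerminationWrapper: the class name `StuckDetector`, its method names, and the 2D `(x, z)` position are assumptions made for the example. Only the 0.5 m threshold, the 80-step window, and the 16 steps/sec rate come from the ADR.

```python
import math
from collections import deque


class StuckDetector:
    """Hypothetical stand-in for the check inside StuckTerminationWrapper."""

    def __init__(self, stuck_steps=80, min_displacement_m=0.5):
        self.stuck_steps = stuck_steps
        self.min_displacement_m = min_displacement_m
        # Holds the current position plus the one from stuck_steps steps ago.
        self.positions = deque(maxlen=stuck_steps + 1)

    def reset(self):
        self.positions.clear()

    def update(self, x, z):
        """Record a position; return True when the episode should terminate."""
        self.positions.append((x, z))
        if len(self.positions) <= self.stuck_steps:
            return False  # not enough history to judge yet
        x0, z0 = self.positions[0]
        # Positional progress, not collision contact, decides termination.
        return math.hypot(x - x0, z - z0) < self.min_displacement_m


def first_stuck_step(detector, step_size_m, max_steps=200):
    """Drive in a straight line; return the step index that triggers termination."""
    detector.reset()
    for step in range(max_steps):
        if detector.update(step * step_size_m, 0.0):
            return step
    return None


# Rubbing but still moving: 0.05 m/step covers 4 m in 80 steps, so no termination.
rubbing = first_stuck_step(StuckDetector(), step_size_m=0.05)
# Stuck with wheels spinning: 0.001 m/step covers 0.08 m in 80 steps.
stuck = first_stuck_step(StuckDetector(), step_size_m=0.001)

# Tuning note arithmetic at 16 steps/sec.
seconds_at_80 = 80 / 16
seconds_at_40 = 40 / 16
```

Comparing the current position against the one from exactly `stuck_steps` steps earlier keeps the check cheap and means a car that creeps forward slowly along a barrier is judged on net progress rather than on contact.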


@ -60,6 +60,7 @@ from stable_baselines3 import PPO
from stable_baselines3.common.utils import get_schedule_fn
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
# ---- Project paths ----
AGENT_DIR = os.path.dirname(os.path.abspath(__file__))
@ -567,7 +568,7 @@ def main():
     env = None
     try:
         raw_env = gym.make(first_env_id)
-        env = wrap_env(raw_env)
+        env = VecTransposeImage(DummyVecEnv([lambda: wrap_env(gym.make(first_env_id))]))
         log(f'[W3 Runner] ✅ Connected to {first_env_id}')
     except Exception as e:
         log(f'[W3 Runner] ❌ Failed to connect to first training track: {e}')
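
One detail in this hunk may be worth a second look: `raw_env = gym.make(first_env_id)` is kept, and the new lambda calls `gym.make(first_env_id)` again, so the environment appears to be constructed twice. The sketch below illustrates the effect with pure-Python stand-ins. `FakeSimEnv` and `TinyVecEnv` are hypothetical names, not SB3 or project classes; the only behaviour borrowed from `DummyVecEnv` is that it calls each factory in its list once.

```python
class FakeSimEnv:
    """Hypothetical stand-in for the env returned by gym.make(first_env_id)."""

    instances = 0

    def __init__(self):
        FakeSimEnv.instances += 1


class TinyVecEnv:
    """Hypothetical stand-in for DummyVecEnv: calls each factory exactly once."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]


def build_as_in_hunk():
    """Mirror the hunk: one env built up front, another built inside the lambda."""
    FakeSimEnv.instances = 0
    raw_env = FakeSimEnv()
    TinyVecEnv([lambda: FakeSimEnv()])
    return FakeSimEnv.instances


def build_reusing_raw_env():
    """Alternative: hand the already-built env to the factory instead."""
    FakeSimEnv.instances = 0
    raw_env = FakeSimEnv()
    vec = TinyVecEnv([lambda: raw_env])
    assert vec.envs[0] is raw_env
    return FakeSimEnv.instances


built_in_hunk = build_as_in_hunk()
built_when_reused = build_reusing_raw_env()
```

If each construction opens its own simulator connection, either dropping the `raw_env` line or passing `lambda: wrap_env(raw_env)` would keep it to one. Whether that matters depends on how the simulator handles a second client, which this diff does not show.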


@ -788,3 +788,16 @@
[2026-04-18 10:41:59] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90}
[2026-04-18 10:41:59] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8}
[2026-04-18 10:41:59] [AutoResearch] Only 1 results — using random proposal.
[2026-04-18 18:33:04] [AutoResearch] GP UCB top-5 candidates:
[2026-04-18 18:33:04] UCB=2.3107 mu=0.3981 sigma=0.9563 params={'n_steer': 9, 'n_throttle': 2, 'learning_rate': 0.001405531880392808, 'timesteps': 26173}
[2026-04-18 18:33:04] UCB=2.3049 mu=0.8602 sigma=0.7224 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.001793493447174312, 'timesteps': 19198}
[2026-04-18 18:33:04] UCB=2.2813 mu=0.4904 sigma=0.8954 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011616192816742616, 'timesteps': 13887}
[2026-04-18 18:33:04] UCB=2.2767 mu=0.5194 sigma=0.8787 params={'n_steer': 9, 'n_throttle': 4, 'learning_rate': 0.0011646447444663046, 'timesteps': 21199}
[2026-04-18 18:33:04] UCB=2.2525 mu=0.6254 sigma=0.8136 params={'n_steer': 9, 'n_throttle': 3, 'learning_rate': 0.0010196345864901517, 'timesteps': 22035}
[2026-04-18 18:33:04] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=50.0000 params={'n_steer': 5}
[2026-04-18 18:33:04] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'n_steer': 7}
[2026-04-18 18:33:04] [Champion] 🏆 NEW BEST! Trial 0: mean_reward=50.0000 params={'r': 50}
[2026-04-18 18:33:04] [Champion] 🏆 NEW BEST! Trial 1: mean_reward=80.0000 params={'r': 80}
[2026-04-18 18:33:04] [Champion] 🏆 NEW BEST! Trial 3: mean_reward=90.0000 params={'r': 90}
[2026-04-18 18:33:04] [Champion] 🏆 NEW BEST! Trial 5: mean_reward=75.0000 params={'n_steer': 8}
[2026-04-18 18:33:04] [AutoResearch] Only 1 results — using random proposal.
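
The five GP UCB candidates logged above are consistent with the usual upper-confidence-bound score `UCB = mu + kappa * sigma` with `kappa = 2`. That value is inferred from the logged numbers, not taken from the AutoResearch source, so treat it as an assumption. A small check against the log lines:

```python
def ucb(mu, sigma, kappa=2.0):
    """Upper confidence bound; kappa=2.0 is inferred from the log, not confirmed."""
    return mu + kappa * sigma


# (logged UCB, mu, sigma) copied from the five candidate lines in the log.
logged_candidates = [
    (2.3107, 0.3981, 0.9563),
    (2.3049, 0.8602, 0.7224),
    (2.2813, 0.4904, 0.8954),
    (2.2767, 0.5194, 0.8787),
    (2.2525, 0.6254, 0.8136),
]

# Largest gap between the recomputed and the logged score (rounding only).
max_error = max(abs(ucb(mu, sigma) - logged) for logged, mu, sigma in logged_candidates)
```

The top-ranked candidate has the lowest `mu` of the five but the highest `sigma`, which is the exploration behaviour this score is meant to produce.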


@ -415,3 +415,8 @@
[2026-04-18 10:42:10] [Wave3] Only 0 results — using random proposal.
[2026-04-18 10:42:10] [Champion] 🏆 NEW BEST! Trial 3: score=1500.00 (mini_monaco=1500.0) params={'learning_rate': 0.0002, 'steps_per_switch': 8000, 'total_timesteps': 150000}
[2026-04-18 10:42:10] [Champion] 🏆 NEW BEST! Trial 1: score=2000.00 (mini_monaco=2000.0) params={}
[2026-04-18 18:33:18] [Wave3] Seed trial 1/2: using hardcoded params.
[2026-04-18 18:33:18] [Wave3] Seed trial 2/2: using hardcoded params.
[2026-04-18 18:33:18] [Wave3] Only 0 results — using random proposal.
[2026-04-18 18:33:18] [Champion] 🏆 NEW BEST! Trial 3: score=1500.00 (mini_monaco=1500.0) params={'learning_rate': 0.0002, 'steps_per_switch': 8000, 'total_timesteps': 150000}
[2026-04-18 18:33:18] [Champion] 🏆 NEW BEST! Trial 1: score=2000.00 (mini_monaco=2000.0) params={}