The short-lap episode-termination fix in SpeedRewardWrapper was not
working when multitrack_runner.py was run from the command line, because
the env was built as a plain gym.Wrapper chain instead of
VecTransposeImage(DummyVecEnv).
In the custom scripts (Exp8, Exp9) the env was wrapped explicitly:
VecTransposeImage(DummyVecEnv([make_env]))
and episode termination worked correctly there.
In multitrack_runner.py the env was just wrap_env(raw), a plain
gym.Wrapper. SB3 auto-wraps such envs internally, but the terminated
signal from SpeedRewardWrapper.force_terminate did not propagate through
that implicit wrapping, so circle-exploit episodes were never cut short
during training.
Fix: construct VecTransposeImage(DummyVecEnv([...])) explicitly in
main(); a sketch follows.
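
A minimal sketch of the explicit wrapping, assuming the repo's wrap_env
helper and a hypothetical create_raw_env factory for the DonkeyCar env:

    from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage

    def make_env():
        raw = create_raw_env()  # hypothetical factory for the raw DonkeyCar env
        # wrap_env applies the project's gym.Wrapper chain, including
        # SpeedRewardWrapper with its force_terminate logic.
        return wrap_env(raw)

    # Wrapping explicitly (instead of letting SB3 auto-wrap) ensures the
    # terminated signal from force_terminate reaches the training loop.
    env = VecTransposeImage(DummyVecEnv([make_env]))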
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
Documents the root cause of losing the mountain_track model that was
doing 20-second laps at step 30k but crashed in the step-90k final eval.
In Phase 2 (13k steps, simple track) the final model was also the best;
that assumption was carried forward incorrectly into Wave 4 (90k steps,
where the policy can drift).
Mandatory rule: every training script must use train_multitrack()'s
best_model tracking OR SB3's EvalCallback. No exceptions. A sketch of
the EvalCallback route follows.
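
A minimal sketch of the EvalCallback route, assuming a separate eval_env
and an already-constructed model; the paths and frequencies here are
illustrative, not the repo's actual values:

    from stable_baselines3.common.callbacks import EvalCallback

    # EvalCallback re-evaluates the policy periodically and saves
    # best_model.zip whenever mean eval reward improves, so a late
    # policy collapse cannot overwrite the best checkpoint.
    eval_cb = EvalCallback(
        eval_env,
        best_model_save_path="./best_model/",
        eval_freq=5_000,        # evaluate every 5k env steps
        n_eval_episodes=5,
        deterministic=True,
    )
    model.learn(total_timesteps=90_000, callback=eval_cb)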
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
ADR-013: Wave 4 train-from-scratch rationale (why no warm-start, why
generated_track+mountain_track; proven by the 1943 overnight result)
ADR-014: Measure throughput before long runs (10+ hours lost to timeouts)
ADR-015: Per-segment checkpointing is non-negotiable
ADR-016: Verify fixes are actually running before walking away
These decisions existed only in conversation and were never written
down, so they were forgotten after context compaction and re-learned
the hard way multiple times.
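
For ADR-015, a minimal sketch of per-segment checkpointing using SB3's
CheckpointCallback; the save frequency, paths, and prefix are
assumptions, and model is assumed to already exist:

    from stable_baselines3.common.callbacks import CheckpointCallback

    # Save a numbered checkpoint every 10k env steps so a crash mid-run
    # loses at most one segment of progress.
    ckpt_cb = CheckpointCallback(
        save_freq=10_000,
        save_path="./checkpoints/",
        name_prefix="wave4",
    )
    model.learn(total_timesteps=90_000, callback=ckpt_cb)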
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
- Rebuilt donkeycar_sb3_runner.py: real PPO/DQN model.learn() + evaluate_policy() + model.save()
- Added SpeedRewardWrapper: reward = speed * (1 - |cte|/max_cte) (sketched after this list)
- Added ChampionTracker: tracks the best model across all trials and writes manifest.json (also sketched after this list)
- Rebuilt autoresearch_controller.py: Phase 1 results separated from random-policy data
- Added timesteps to GP search space
- Added --push-every N flag for automatic git pushes
- Added 37 passing tests: discretize_action, reward_wrapper, autoresearch_controller, runner_integration
- Scaffolded project with agent harness (large mode): PROJECT-SPEC, DECISIONS, IMPLEMENTATION_PLAN, EXECUTION_MASTER
- Fixed: model.save() is no longer called before the model is defined (was the root cause of all prior NameError crashes)
- Fixed: random-policy evaluation replaced with evaluation of the real trained policy
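
A minimal sketch of the SpeedRewardWrapper shaping, written against the
gymnasium 5-tuple step API; the assumption that the DonkeyCar env reports
speed and cross-track error via the info dict, and the max_cte of 4.0,
are both illustrative:

    import gymnasium as gym

    class SpeedRewardWrapper(gym.Wrapper):
        # reward = speed * (1 - |cte|/max_cte): full throttle on the
        # centreline scores highest; reward shrinks (and goes negative
        # past max_cte) as cross-track error grows.
        def __init__(self, env, max_cte=4.0):  # max_cte value is an assumption
            super().__init__(env)
            self.max_cte = max_cte

        def step(self, action):
            obs, _, terminated, truncated, info = self.env.step(action)
            speed = info.get("speed", 0.0)   # assumed info keys
            cte = info.get("cte", 0.0)
            reward = speed * (1 - abs(cte) / self.max_cte)
            return obs, reward, terminated, truncated, info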
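
And a sketch of the ChampionTracker idea; the class shape and manifest
fields are assumptions, only the behaviour (keep the single best model
across trials, write manifest.json) comes from the entry above:

    import json
    import shutil
    from pathlib import Path

    class ChampionTracker:
        def __init__(self, out_dir="champion"):
            self.out_dir = Path(out_dir)
            self.out_dir.mkdir(parents=True, exist_ok=True)
            self.best_reward = float("-inf")

        def update(self, trial_id, mean_reward, model_path):
            # Keep only the best-scoring model seen so far and record
            # where it came from in manifest.json.
            if mean_reward > self.best_reward:
                self.best_reward = mean_reward
                shutil.copy(model_path, self.out_dir / "champion.zip")
                manifest = {"trial": trial_id, "mean_reward": mean_reward}
                (self.out_dir / "manifest.json").write_text(
                    json.dumps(manifest, indent=2))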
Agent: pi/claude-sonnet
Tests: 37/37 passing
Tests-Added: +37
TypeScript: N/A