Samples the car position every 100 steps during eval and computes
macro efficiency = net_displacement / total_sampled_path. If the
efficiency is < 0.3 after >= 500 steps, logs "WARNING: SHUTTLE
EXPLOIT?" together with the efficiency value.
Also logs reward/step per episode so anomalously high-scoring long
episodes can be diagnosed immediately.
This will tell us definitively whether Trials 9 and 14 (1435/1573
scores, 2000 steps each) were genuine driving or back-and-forth
shuttling on a mini_monaco straight.
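A minimal sketch of the detector logic, assuming numpy; car_position()
is a hypothetical helper for pulling (x, y) out of the observation:

    import logging
    import numpy as np

    SAMPLE_EVERY = 100   # steps between position samples
    MIN_STEPS = 500      # don't judge episodes shorter than this
    THRESHOLD = 0.3      # below this, flag possible shuttling

    def macro_efficiency(positions):
        # positions: (x, y) pairs sampled every SAMPLE_EVERY steps
        pts = np.asarray(positions, dtype=float)
        net = np.linalg.norm(pts[-1] - pts[0])                     # net displacement
        path = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()  # total sampled path
        return net / path if path > 0 else 1.0

    # inside the eval loop (illustrative):
    # if step % SAMPLE_EVERY == 0:
    #     positions.append(car_position(obs))
    # if step >= MIN_STEPS and macro_efficiency(positions) < THRESHOLD:
    #     logging.warning("SHUTTLE EXPLOIT? efficiency=%.3f",
    #                     macro_efficiency(positions))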
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
The checkpoint code referenced save_dir inside train_multitrack(),
but save_dir is only defined in main(). Every trial since the
checkpoint fix landed crashed with NameError: name 'save_dir' is not
defined after the first segment, producing rc=101 and no GP data.
Fix: add a save_dir=None parameter to train_multitrack() and pass it
from the main() call site.
This explains why Trials 6-10 in the current run all produced None
results despite appearing to train normally for the first segment.
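A sketch of the fix; train_segment() and SEGMENTS are hypothetical
stand-ins for the real per-segment training code:

    import os

    def train_multitrack(segments, save_dir=None):
        # fix: save_dir is now a parameter, not a main()-scoped name
        for i, segment in enumerate(segments):
            model = train_segment(segment)
            if save_dir is not None:
                model.save(os.path.join(save_dir, f"segment_{i}"))

    def main():
        save_dir = "checkpoints/wave4"   # still defined in main(), as before
        train_multitrack(SEGMENTS, save_dir=save_dir)   # fix: passed explicitly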
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
ADR-013: Wave 4 train-from-scratch rationale (why no warm-start, why
generated_track+mountain_track, proven by 1943 overnight result)
ADR-014: Measure throughput before long runs (10+ hours lost to timeouts)
ADR-015: Per-segment checkpointing is non-negotiable
ADR-016: Verify fixes are running before walking away
These decisions existed only in conversation and were never written
down, so they were forgotten after context compaction and relearned
the hard way multiple times.
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
All previously identified issues, now addressed:
- Controller was never restarted after cap/checkpoint fixes -> they never ran
- Timeout trials (score=0) were polluting GP data -> removed
- Overnight Trial 3 result (1943 mini_monaco) was unknown to GP -> added
GP now has 5 valid data points, including the 1943 score at
lr=0.000685, switch=17499. The GP should converge toward the longer
switching intervals that produced the only great result.
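Illustrative shape of the rebuilt GP dataset (sklearn is a stand-in;
the actual optimizer and trial record format may differ):

    from sklearn.gaussian_process import GaussianProcessRegressor

    # hypothetical trial records: (lr, switch_interval, score); score=0 = timeout
    trials = [
        (0.000685, 17499, 1943),   # overnight Trial 3, previously missing
        # ... the other valid Wave 4 trials ...
    ]
    valid = [t for t in trials if t[2] > 0]   # timeouts no longer pollute the fit
    X = [[lr, switch] for lr, switch, _ in valid]
    y = [score for *_, score in valid]
    gp = GaussianProcessRegressor().fit(X, y)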
Verified before relaunch:
- PARAM_SPACE max total_timesteps = 90000 ✓
- Checkpoint saves after every segment ✓
- Rescue eval on timeout ✓
- 102 tests passing ✓
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
Three changes:
1. Lower total_timesteps cap: 120k → 90k
Actual throughput is 16 steps/sec (not 20 as estimated).
120k steps = 125 min training + 9 min overhead = 134 min > 2hr limit.
90k steps = 94 min + 8 min overhead = 102 min, safely within limit.
2. Per-segment checkpoint saves in multitrack_runner
model.save() called after every segment so the latest weights are
always on disk. If the runner is killed (timeout/crash/Ctrl+C),
training progress is never completely lost.
3. Timeout rescue eval in wave4_controller
If JOB_TIMEOUT fires and a checkpoint exists, immediately runs a
quick mini_monaco eval on the checkpoint so the trial still produces
a GP data point despite the timeout.
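A sketch of changes 2 and 3, assuming stable-baselines3 (PPO stands in
for the actual algorithm; segment_steps, quick_eval and record_gp_point
are hypothetical helpers):

    import os
    from stable_baselines3 import PPO

    # 2. multitrack_runner: latest weights always on disk
    def train_multitrack(model, segments, save_dir):
        for segment in segments:
            model.learn(total_timesteps=segment_steps(segment))
            model.save(os.path.join(save_dir, "latest"))   # per-segment checkpoint

    # 3. wave4_controller: rescue eval when JOB_TIMEOUT fires
    def on_job_timeout(checkpoint_path, trial_params):
        if os.path.exists(checkpoint_path + ".zip"):       # SB3 appends .zip on save
            model = PPO.load(checkpoint_path)
            score = quick_eval(model, track="mini_monaco") # short, step-capped eval
            record_gp_point(trial_params, score)           # trial still yields GP data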
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
Trials 3 and 4 both proposed ~140k steps and hit the 2hr JOB_TIMEOUT,
wasting time and producing no GP data. At ~20 steps/sec, 120k steps
takes ~100 min, safely within the 2hr limit.
Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
StuckTerminationWrapper added to wrap_env stack (between ThrottleClamp
and SpeedReward):
- Terminates episode after stuck_steps=80 steps with <0.5m displacement
- Handles slow barrier contact that Unity hit detection misses
- Handles off-lap-line circles (efficiency→0 gave zero reward but no
termination; now gives -1.0 after 80 steps = ~4s of non-progress)
- Wrapper stack: ThrottleClamp → StuckTermination → SpeedReward
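A sketch of the wrapper, assuming a gymnasium-style API; car_position()
is a hypothetical helper for extracting the car's (x, y):

    from collections import deque
    import gymnasium as gym
    import numpy as np

    class StuckTerminationWrapper(gym.Wrapper):
        def __init__(self, env, stuck_steps=80, min_displacement=0.5):
            super().__init__(env)
            self.stuck_steps = stuck_steps
            self.min_displacement = min_displacement
            self.positions = deque(maxlen=stuck_steps)

        def reset(self, **kwargs):
            self.positions.clear()
            return self.env.reset(**kwargs)

        def step(self, action):
            obs, reward, terminated, truncated, info = self.env.step(action)
            self.positions.append(np.asarray(car_position(obs)))
            if len(self.positions) == self.stuck_steps:
                moved = np.linalg.norm(self.positions[-1] - self.positions[0])
                if moved < self.min_displacement:    # <0.5m over 80 steps (~4s)
                    reward, terminated = -1.0, True  # penalize and end the episode
            return obs, reward, terminated, truncated, info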
Also: a missing deque import in multitrack_runner.py caused a
NameError; the import has been added.
Phase 4 results cleared again (Trial 1 ran without StuckTermination).
Tests: 2 new stuck-termination tests, 102 total.
Agent: pi
Tests: 102 passed
Tests-Added: 2
TypeScript: N/A
Two reward hacking behaviours observed during Wave 4 training:
1. Short-lap circle exploit (reported by user, echoes Toni's guardrail hack):
Model circles at start/finish line completing laps in 1-2 sim-seconds,
accumulating lap_count indefinitely with no genuine track progress.
Fix: SpeedRewardWrapper detects lap_count increment; if last_lap_time
< min_lap_time (5.0s), returns penalty = -10 × (min_lap_time / lap_time).
A 1-second lap gives -50 penalty. Legitimate 12-second laps unaffected.
Window size also increased from 30 to 60 to catch slower circles.
2. Non-terminating segment eval episodes:
evaluate_policy on wide tracks (no barriers) could run indefinitely,
inflating segment_reward to 200k+. Replaced with a manual eval loop
capped at MAX_EVAL_STEPS=3000.
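Sketches of both fixes, assuming stable-baselines3 model.predict and a
gymnasium-style env; the lap-time plumbing is illustrative:

    MIN_LAP_TIME = 5.0
    MAX_EVAL_STEPS = 3000

    # 1. in SpeedRewardWrapper, after detecting a lap_count increment:
    def short_lap_penalty(lap_time, min_lap_time=MIN_LAP_TIME):
        if lap_time < min_lap_time:
            return -10.0 * (min_lap_time / lap_time)  # 1s lap -> -50
        return None  # legitimate lap (e.g. 12s), keep the normal reward

    # 2. manual eval loop replacing evaluate_policy:
    def capped_eval(model, env, max_steps=MAX_EVAL_STEPS):
        obs, _ = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):   # hard cap: no non-terminating episodes
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        return total_reward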
Phase 4 results cleared (Trials 4-6 ran with the exploitable reward).
Tests: 4 new reward wrapper tests, 100 total passing.
Agent: pi
Tests: 100 passed
Tests-Added: 4
TypeScript: N/A
Without verbose=1 set on freshly created models, Wave 4 scratch-trained
models produce no rollout stats in the log, making it impossible to
monitor training progress or spot degenerate policies early.
Warm-start models in Wave 3 showed stats because verbose=1 was baked
into the Phase-2 saved model state; fresh models default to verbose=0.
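The fix amounts to setting verbose at model construction time (sketch;
PPO stands in for the actual algorithm, other kwargs elided):

    from stable_baselines3 import PPO

    # fresh models default to verbose=0; warm-started models inherited
    # verbose=1 from the Phase-2 save, which is why only they printed stats
    model = PPO("MlpPolicy", env, verbose=1)   # rollout/ep_rew_mean etc. now logged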
Agent: pi
Tests: 96 passed
Tests-Added: 0
TypeScript: N/A