donkeycar-rl-autoresearch

Commit Graph

Author	SHA1	Message	Date
Paul Huliganga	84061c01b2	feat: add cross-track warm-start experiments for mountain->generated and generated->mountain	2026-04-20 16:34:24 -04:00
Paul Huliganga	0da04327ef	docs: capture robust mountain finetune winner at 36k and preserve eval comparison	2026-04-20 00:43:27 -04:00
Paul Huliganga	2b90de2fba	fix: import json, use make_env_base in phase switch, and run eval sequentially to avoid second concurrent sim car	2026-04-19 20:37:25 -04:00
Paul Huliganga	f3c89116ee	fix: exp14 finetune eval uses make_env_base (runtime throttle floor) instead of removed make_env	2026-04-19 20:30:51 -04:00
Paul Huliganga	6c5623e881	fix: exp14 finetune load warm-start model without temp env to prevent second spawned car	2026-04-19 20:24:33 -04:00
Paul Huliganga	0c3a37f877	fix: close temporary loaded_env after loading warm-start model to avoid leaving extra TCP vehicle	2026-04-19 20:17:29 -04:00
Paul Huliganga	38dd5e9b1d	fix: ensure lr_schedule callable set when loading warm-start model (use get_schedule_fn) and update optimizer LR	2026-04-19 20:14:35 -04:00
Paul Huliganga	eb92d119f9	fix: keep action-space matching by loading model with base throttle 0.2 and applying runtime throttle_floor wrapper for phase1	2026-04-19 20:10:19 -04:00
Paul Huliganga	41d12dede2	fix: load warm-start with original action space (throttle_min=0.2), then switch env for phase1 throttle	2026-04-19 20:09:08 -04:00
Paul Huliganga	bc23a316e0	exp14 finetune: warm-start mountain champion, throttle schedule 0.4->0.2, LR=2e-4, checkpoints and evals	2026-04-19 20:08:14 -04:00
Paul Huliganga	b1ec14e3cb	fix: exp14 — proper track switch via exit_scene before connecting to mountain_track	2026-04-19 19:18:33 -04:00
Paul Huliganga	1405a88699	feat: Exp 14 — mountain_track, v5 reward, lap-based stopping v5 required for mountain hills (v4 gives zero gradient on hills - documented Exp 1). Same simple approach as Exp 13 which worked: single track, minimal wrappers, lap-based stopping. ThrottleClamp + V5Reward only.	2026-04-19 19:15:00 -04:00
Paul Huliganga	5a1693b4ec	feat: Exp 13 — generated_track, v4 reward, back to basics (no extra heuristics) Return to Wave 4 setup that produced Trial 9 (2000/2000 on generated_track). v4 reward: base x efficiency x speed. Circles give ~0 reward naturally. No StuckTerminationWrapper, no CTE patience, no progress terminator. Just ThrottleClamp + V4Reward. Lap-based stopping criterion.	2026-04-19 17:33:17 -04:00
Paul Huliganga	9ffe1c5d40	fix: efficiency gate now TERMINATES after 20 low-efficiency steps (was zero-reward only) Previously circles ran 20+ seconds because the efficiency gate only returned 0 reward without terminating. After 20 consecutive steps of efficiency < 0.15 (~0.7 seconds at 27 steps/sec), episode now terminates with -1.0. Also confirmed from telemetry diagnostic: CTE does report correctly when car goes off-track (rises steadily to 6.2m before tree collision). The grass exploit runs long only when the open grass area has no obstacles. Efficiency gate termination is the most reliable catch for both circles and open-grass driving (straight-line grass = high efficiency, but active_node progress terminator catches that case).	2026-04-19 17:26:38 -04:00
Paul Huliganga	813f888502	fix: reward v6.1 — active_node progress terminator kills circle/stuck exploits User's insight: a circling car stays near the same track waypoints, so active_node (sim's track progress indicator) never advances. Track the maximum active_node reached this episode. If it hasn't increased in progress_patience=60 steps (~3.3s), terminate. This catches: - Circular driving (active_node oscillates, max never increases) - Stuck on cone/barrier (active_node frozen) - NOT triggered by: legitimate cornering, slow forward progress, lap resets On lap completion, active_node wraps to 0 — reset max_node_seen and counter. Also: Exp 12 — single track mountain training with lap-based stopping criterion. Train until 3 consecutive laps in eval, not fixed step count.	2026-04-19 17:01:41 -04:00
Paul Huliganga	8b84409e58	fix: StuckTerminationWrapper — wall-clock timeout (12s) prevents 1min+ stuck episodes When both DummyVecEnv cars get stuck against walls simultaneously, Unity physics slows to 1-2 FPS (heavy collision computation). At that speed, stuck_steps=40 takes 1+ minute of wall-clock time — observed twice by user. Fix: add max_stuck_seconds=12.0 wall-clock timeout. Timer resets whenever car moves >= min_displacement. Fires regardless of step count if car hasn't moved in 12 real-world seconds. Both triggers preserved (step count OR time).	2026-04-19 16:30:50 -04:00
Paul Huliganga	dc563e2b6c	fix: exp11d remove progress_patience — grass fix only per ADR-020	2026-04-19 16:18:17 -04:00
Paul Huliganga	e95c33c1bf	fix: reward v6.1 — grass exploit only (CTE patience terminator) Removed the progress_patience (active_node) terminator that was added without sufficient evidence. Per ADR-020, mountain rollback is a learning issue not a termination issue. Removed code should not be re-added without specific evidence it is needed. Only confirmed fix: CTE patience terminator catches grass exploit BEFORE CTE exceeds 16m (the sim's determine_episode_over pass threshold). - max_cte_terminate=4.0m - cte_patience=20 steps	2026-04-19 16:15:39 -04:00
Paul Huliganga	f730a2e0ba	docs: ADR-020/021 + session log — throttle/hill history and grass exploit root cause Critical facts documented permanently: - throttle_min=0.5 bakes into action space (too fast for corners) - throttle_min=0.2 + v5 reward CAN learn hill (proved Exp 9, mountain only 90k) - Mountain failure in parallel is contamination from grass exploit, not throttle - Grass exploit root cause: sim determine_episode_over() passes when CTE>16m - DO NOT confuse mountain rollback with stuck issue - DO NOT change throttle_min as first response to mountain failure	2026-04-19 16:14:28 -04:00
Paul Huliganga	16bd379e95	feat: Exp 11c — parallel DummyVecEnv + v6 reward, extended to 250k steps	2026-04-19 13:27:38 -04:00
Paul Huliganga	0993d4f1e7	docs: Exp 11 + 11b results — parallel envs work, v6 prevents circles, but plateaus at ~194 steps Exp 11 (v5 reward): aborted at 66k — circular driving returned without efficiency term Exp 11b (v6 reward): completed 90k — no circles but plateaus at 170-195 steps All 4 tracks eval: remarkably consistent ~194 steps (including zero-shot) Parallel DummyVecEnv infrastructure proven stable. Next: increase training budget (90k may be insufficient for 2 parallel envs).	2026-04-19 13:26:29 -04:00
Paul Huliganga	91ce8fc1fa	feat: Exp 11b — parallel DummyVecEnv + v6 reward (anti-circle gate) + built-in eval	2026-04-19 12:03:46 -04:00
Paul Huliganga	beb04f3ebe	fix: reward v6 — efficiency gate prevents circular driving, stuck_steps 80→40 v5 dropped the efficiency term to get gradient signal on hills, but this re-enabled circular driving (observed in Exp 11). v6 adds efficiency back as a GATE (not multiplier): if efficiency < 0.15, reward = 0. Otherwise reward = speed × CTE_quality (same as v5). Gate vs multiplier: v4 used efficiency as a multiplier which killed gradient on hills (all terms → 0 simultaneously). v6's gate passes when efficiency is above threshold (car moving forward, even slowly on hill) and only blocks when car is truly circling. Also reduced stuck_steps from 80 to 40 (~2.5s vs ~5s) — user reported car stuck against barriers for ~10s which is too long with DummyVecEnv.	2026-04-19 12:02:55 -04:00
Paul Huliganga	21addf268e	feat: Exp 11 — parallel DummyVecEnv multi-track training (two sim instances)	2026-04-19 11:05:22 -04:00
Paul Huliganga	86357622e3	docs: session log + ADR-019 — parallel DummyVecEnv for multi-track training	2026-04-19 10:50:11 -04:00
Paul Huliganga	db1274174f	docs: Exp10 vs Exp9 vs Wave4 Trial 9 root cause analysis — random seed lottery	2026-04-19 10:29:16 -04:00
Paul Huliganga	3d04b53a86	docs: Exp10 eval results — total failure, crashes on all tracks (massive regression from Exp9/W4T9)	2026-04-19 10:19:16 -04:00
Paul Huliganga	6e9546cd22	save: all experiment scripts moved from /tmp to agent/experiments/ Scripts in /tmp are lost on reboot and not reproducible. All experiment scripts now committed to git with README. Exp5 script was already gone (lost before this fix). All others (Exp6-Exp10, overnight, wave5, etc.) now preserved. Rule going forward: scripts saved to agent/experiments/ and committed BEFORE running, not after. Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-18 21:30:08 -04:00
Paul Huliganga	de7b9bc302	fix: multitrack_runner must use VecTransposeImage(DummyVecEnv) not plain wrap_env The short-lap episode termination fix in SpeedRewardWrapper was not working when multitrack_runner.py ran via command line because the env was created as a plain gym.Wrapper chain, not VecTransposeImage(DummyVecEnv). In custom scripts (Exp8, Exp9), env was explicitly: VecTransposeImage(DummyVecEnv([make_env])) This made episode termination work correctly. In multitrack_runner.py, env was just wrap_env(raw) — a plain gym.Wrapper. SB3 auto-wraps this internally but the terminated signal from SpeedRewardWrapper.force_terminate did not propagate correctly, so circle-exploit episodes were never terminated during training. Fix: use VecTransposeImage(DummyVecEnv([...])) explicitly in main(). Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-18 18:33:40 -04:00
Paul Huliganga	fecba1dd35	docs: TEST_HISTORY Exp10 plan added Exp10: generated_track + mountain_track, v5 reward, throttle_min=0.2 Same as Exp9 but with visual diversity from second track. Agent: pi	2026-04-18 17:59:07 -04:00
Paul Huliganga	b19dcc8b80	feat: run_eval.py — standard eval runner with persistent logging Every test run now saves to agent/test-results/YYYY-MM-DD_HH-MM_<model>.log so results are never lost. Also added 3-set Exp9 eval results to TEST_HISTORY. Usage: python3 agent/run_eval.py --model models/exp9-.../best_model.zip --sets 3 Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-18 15:32:36 -04:00
Paul Huliganga	eb4fd39056	docs: TEST_HISTORY updated with Exp8 results and Exp9 plan Exp8 results: 567 reward peak at step 60k, policy diverged after. Best_model correctly saved. mini_monaco crashed at 91 steps (mean) at same corner every time — throttle min=0.5 baked into action space. Exp9 plan: throttle_min=0.2, v5 reward unchanged. Tests hypothesis that v5 gradient is sufficient for hill without forced 0.5 minimum. Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-18 13:40:45 -04:00
Paul Huliganga	041481916d	docs: TEST_HISTORY.md — comprehensive record of all experiments Every mountain track experiment (Exp1-8) and Wave 4 trials documented: - What was changed from previous test - Key observation from simulator - Root cause of failure - What was learned Also documents: what we keep, open problems, next steps. Exp 8 currently running (PID 2941877). Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-18 11:18:53 -04:00
Paul Huliganga	47d8e5b346	fix: short-lap exploit now TERMINATES the episode, not just penalises The circle exploit persisted because the penalty alone (-100 per short lap) was insufficient. The model stayed alive between laps accumulating small positive rewards, making circling a viable strategy despite the penalty. Fix: _compute_reward_and_done() returns (reward, force_terminate). When a short lap is detected, force_terminate=True is returned and step() sets terminated=True immediately. The episode ends on the spot — no more rewards possible. This makes the circle exploit strictly worse than any forward driving behaviour. Tests updated: _compute_reward → _compute_reward_and_done, short-lap test now asserts force_terminate=True. Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-18 10:42:23 -04:00
Paul Huliganga	10719b4ff6	fix: save numbered checkpoint every segment, never overwrite Every training segment now saves checkpoint_NNNNNNN.zip so the full training history is preserved on disk. No checkpoint is ever overwritten. model.zip still updated for crash recovery. After a 90k-step run with 13 segments you now have: checkpoint_0006851.zip <- step 6,851 checkpoint_0013702.zip <- step 13,702 ... checkpoint_0090000.zip <- step 90,000 best_model.zip <- highest scoring segment (reloaded at end) model.zip <- latest weights (crash recovery) This means we can NEVER again lose a good mid-training model. If the model was driving at step 30k, checkpoint_0030000.zip exists. Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-17 22:10:37 -04:00
Paul Huliganga	fc01057c14	docs: ADR-017 — always save best model, never just latest Documents the root cause of losing the mountain_track model that was doing 20-second laps at step 30k but crashed at step 90k final eval. Phase 2 (13k steps, simple track): final = best. Assumption carried forward incorrectly into Wave 4 (90k steps, policy can drift). Mandatory rule: every training script uses train_multitrack() best_model tracking OR SB3 EvalCallback. No exceptions. Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-17 16:03:59 -04:00
Paul Huliganga	4f77b8a468	fix: always save and return the BEST model, not the last one This was the root cause of losing good models during training. The model could learn to lap at step 30k then drift to a worse policy by step 90k, and we only ever saved the final weights. Changes to train_multitrack(): - Tracks best_segment_reward across all segments - Saves best_model.zip whenever a new high score is achieved - At end of training, RELOADS best_model.zip before returning so the caller always gets the best policy found, not the drift Both files saved per trial: model.zip <- latest checkpoint (crash recovery) best_model.zip <- best policy seen during training (used for eval) Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-17 14:45:37 -04:00
Paul Huliganga	0b5ce6ab7e	docs: ARCHITECTURE.md — complete system architecture guide Explains all 5 layers: 1. sdsandbox (Unity C# simulator) 2. TCP socket (JSON protocol) 3. gym_donkeycar (Python gymnasium wrapper) 4. Our training code (reward_wrapper, multitrack_runner) 5. Autoresearch (GP+UCB controller) Includes data flow, file quick reference, key design decisions, and explanation of the new track_progress field. Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-17 14:06:38 -04:00
Paul Huliganga	b8a13dea81	feat: v5 reward — speed × CTE-quality, drop efficiency term Problem with v4 on mountain_track: CTE × efficiency × speed all collapse to zero simultaneously when the car slows on the hill, giving no gradient signal for 'apply more throttle'. v5: reward = (speed / 10) × (1 - |CTE| / max_cte) - Directly rewards going fast while staying centred - Hill: car slows → reward drops → clear gradient toward more throttle - Circling protection now entirely handled by lap-time penalty + StuckTerminationWrapper (not by the reward formula) Tests updated to reflect v5 semantics (102 passing). Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-17 13:25:38 -04:00
Paul Huliganga	a6831459dd	docs: STATE.md updated with April 16 test results Key findings: - Trial 9: drives generated_track (3/3) AND mini_monaco zero-shot (40s laps) - Trial 19: drives generated_track (2/3) - Trial 3: corrupted, policy-only recovery still crashes at ~104 steps - Generated_track lighting variation per episode may be key to generalisation - Phase 2 champion: confirmed still drives generated_road perfectly Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-16 20:45:45 -04:00
Paul Huliganga	792b6734f7	docs: STATE.md — full project state as of April 16 end of Wave 4 Documents all 25 trial results, known models, what is confirmed vs unknown, and the 6 pending verification tests agreed with user. Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-16 20:17:41 -04:00
Paul Huliganga	619188bf17	wave3: autoresearch trial 25 results Agent: pi Tests: N/A Tests-Added: 0 TypeScript: N/A	2026-04-16 20:01:55 -04:00
Paul Huliganga	c8c17e2e46	wave3: autoresearch trial 25 results Agent: pi Tests: N/A Tests-Added: 0 TypeScript: N/A	2026-04-16 20:01:51 -04:00
Paul Huliganga	a3a49fbcaf	feat: eval_on_track.py — proper zero-shot eval on any track The goal is a model that generalises to ANY road-surface track, not specifically mini_monaco. mini_monaco (tight barriers, hairpins) was a bad proxy for this. Generated_road is a much better zero-shot test: same visual category, never seen during Wave 4 training. eval_on_track.py lets us run the Wave 4 champion on any track with the same wrappers used during training, plus shuttle-exploit detection. Run after Trial 25 finishes: python3 agent/eval_on_track.py --model agent/models/wave4-champion/model.zip --track donkey-generated-roads-v0 --episodes 3 --max-steps 3000 Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-16 19:47:56 -04:00
Paul Huliganga	a5577fb3e7	feat: shuttle-exploit detection in mini_monaco eval Samples car position every 100 steps during eval. Computes macro efficiency = net_displacement / total_sampled_path. If < 0.3 with >= 500 steps, logs WARNING: SHUTTLE EXPLOIT? with the efficiency value. Also logs reward/step per episode so anomalously high-scoring long episodes can be diagnosed immediately. This will tell us definitively whether Trials 9 and 14 (1435/1573 scores, 2000 steps each) were genuine driving or back-and-forth shuttling on a mini_monaco straight. Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-16 17:29:30 -04:00
Paul Huliganga	96c49dd057	wave3: autoresearch trial 20 results Agent: pi Tests: N/A Tests-Added: 0 TypeScript: N/A	2026-04-16 14:10:06 -04:00
Paul Huliganga	45b057e9c1	wave3: autoresearch trial 15 results Agent: pi Tests: N/A Tests-Added: 0 TypeScript: N/A	2026-04-16 08:43:17 -04:00
Paul Huliganga	0505de7e63	wave3: autoresearch trial 10 results Agent: pi Tests: N/A Tests-Added: 0 TypeScript: N/A	2026-04-16 03:31:41 -04:00
Paul Huliganga	b00f63dfbc	fix: save_dir not in scope inside train_multitrack — crashed every trial Checkpoint code added save_dir inside train_multitrack() but save_dir is defined in main(). Every trial since the checkpoint fix was added crashed with 'name save_dir is not defined' after the first segment, producing rc=101 and no GP data. Fix: add save_dir=None parameter to train_multitrack() and pass it from the main() call site. This explains why Trials 6-10 in the current run all produced None results despite appearing to train normally for the first segment. Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-15 22:47:29 -04:00
Paul Huliganga	ff8bdd8b8a	docs: ADR-013 through ADR-016 — decisions that were lost to context compaction ADR-013: Wave 4 train-from-scratch rationale (why no warm-start, why generated_track+mountain_track, proven by 1943 overnight result) ADR-014: Measure throughput before long runs (10+ hours lost to timeouts) ADR-015: Per-segment checkpointing is non-negotiable ADR-016: Verify fixes are running before walking away These decisions existed in conversation but were never written down, causing them to be forgotten after context compaction and re-learned the hard way multiple times. Agent: pi Tests: 102 passed Tests-Added: 0 TypeScript: N/A	2026-04-15 22:34:48 -04:00
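
For readers comparing the reward changes described in commits b8a13dea81 (v5) and beb04f3ebe (v6), the two formulas can be sketched roughly as below. This is an illustrative reconstruction from the commit messages only, not the repository's actual reward_wrapper code. The function names, the argument names, and the clamp of the CTE term at zero are assumptions; how efficiency is computed is not stated in those commits, so it is taken here as an input.

```python
def reward_v5(speed: float, cte: float, max_cte: float) -> float:
    """v5 as described in the commit message: (speed / 10) * (1 - |CTE| / max_cte)."""
    # Assumption: clamp the CTE-quality term at 0 so the reward never goes
    # negative when |CTE| exceeds max_cte. The commit message does not say this.
    cte_quality = max(0.0, 1.0 - abs(cte) / max_cte)
    return (speed / 10.0) * cte_quality


def reward_v6(speed: float, cte: float, max_cte: float,
              efficiency: float, gate: float = 0.15) -> float:
    """v6 as described: efficiency acts as a gate, not a multiplier."""
    # Below the threshold the car is treated as circling and earns nothing.
    if efficiency < gate:
        return 0.0
    # Above the threshold the reward is the same as v5, so a car that is
    # moving forward slowly on a hill still gets a speed-dependent signal.
    return reward_v5(speed, cte, max_cte)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the gate behaviour.
    print(reward_v5(speed=5.0, cte=4.0, max_cte=8.0))
    print(reward_v6(speed=5.0, cte=4.0, max_cte=8.0, efficiency=0.10))
    print(reward_v6(speed=5.0, cte=4.0, max_cte=8.0, efficiency=0.50))
```

With a gate, the efficiency value never scales the reward, which matches the rationale given in beb04f3ebe: a v4-style multiplier drives every term toward zero together on hills, while the gate only blocks reward when efficiency falls below the 0.15 threshold.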
