donkeycar-rl-autoresearch

Commit Graph

Author	SHA1	Message	Date
Paul Huliganga	7fdfbacaee	fix: exp18 — fix circular exploit in parallel training (window=200, min_lap=12s). Exp 17 post-mortem: the efficiency gate's window of 30 steps covers only ~40% of a 3.5s exploit circle at 22fps, giving a partial-arc efficiency of ~0.77 (the gate fires only below 0.15). The car earned positive reward while circling, outweighing the -10 lap penalty. Performance peaked at 80k, then collapsed. Exp 18 fixes: window_size 30→200 (covers 2+ full exploit circles, driving efficiency→0); min_lap_time 5s→12s (genuine laps are 13-16s on gentrack and 27-29s on mountain; anything under 12s is an exploit and terminates immediately). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 09:00:42 -04:00
Paul Huliganga	b504b89b2a	feat: add exp17 parallel DummyVecEnv 450k training + strategy docs. exp17_parallel_450k.py: parallel two-track training (generated_track:9091, mountain_track:9093), 450k steps, v6 reward, HOST=localhost; DECISIONS.md: ADR-025 (parallel strategy) and ADR-026 (mountain friction fix); docs/STATE.md: updated to April 2026 state with current champions and strategy; docs/TEST_HISTORY.md: mountain friction fix notes + Exp 17 full design; outerloop-results: exp14 finetune logs and robust mountain eval results. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 02:42:20 -04:00
Paul Huliganga	a8aef52f00	fix: force scene reset before exp15 generated-track warm-start so sim actually loads generated_track	2026-04-20 16:36:00 -04:00
Paul Huliganga	84061c01b2	feat: add cross-track warm-start experiments for mountain->generated and generated->mountain	2026-04-20 16:34:24 -04:00
Paul Huliganga	2b90de2fba	fix: import json, use make_env_base in phase switch, and run eval sequentially to avoid second concurrent sim car	2026-04-19 20:37:25 -04:00
Paul Huliganga	f3c89116ee	fix: exp14 finetune eval uses make_env_base (runtime throttle floor) instead of removed make_env	2026-04-19 20:30:51 -04:00
Paul Huliganga	6c5623e881	fix: exp14 finetune load warm-start model without temp env to prevent second spawned car	2026-04-19 20:24:33 -04:00
Paul Huliganga	0c3a37f877	fix: close temporary loaded_env after loading warm-start model to avoid leaving extra TCP vehicle	2026-04-19 20:17:29 -04:00
Paul Huliganga	38dd5e9b1d	fix: ensure lr_schedule callable set when loading warm-start model (use get_schedule_fn) and update optimizer LR	2026-04-19 20:14:35 -04:00
Paul Huliganga	eb92d119f9	fix: keep action-space matching by loading model with base throttle 0.2 and applying runtime throttle_floor wrapper for phase1	2026-04-19 20:10:19 -04:00
Paul Huliganga	41d12dede2	fix: load warm-start with original action space (throttle_min=0.2), then switch env for phase1 throttle	2026-04-19 20:09:08 -04:00
Paul Huliganga	bc23a316e0	exp14 finetune: warm-start mountain champion, throttle schedule 0.4->0.2, LR=2e-4, checkpoints and evals	2026-04-19 20:08:14 -04:00
Paul Huliganga	b1ec14e3cb	fix: exp14 — proper track switch via exit_scene before connecting to mountain_track	2026-04-19 19:18:33 -04:00
Paul Huliganga	1405a88699	feat: Exp 14 — mountain_track, v5 reward, lap-based stopping. v5 is required for mountain hills (v4 gives zero gradient on hills, as documented in Exp 1). Same simple approach as Exp 13, which worked: single track, minimal wrappers, lap-based stopping. ThrottleClamp + V5Reward only.	2026-04-19 19:15:00 -04:00
Paul Huliganga	5a1693b4ec	feat: Exp 13 — generated_track, v4 reward, back to basics (no extra heuristics). Return to the Wave 4 setup that produced Trial 9 (2000/2000 on generated_track). v4 reward: base × efficiency × speed. Circles give ~0 reward naturally. No StuckTerminationWrapper, no CTE patience, no progress terminator. Just ThrottleClamp + V4Reward. Lap-based stopping criterion.	2026-04-19 17:33:17 -04:00
Paul Huliganga	813f888502	fix: reward v6.1 — active_node progress terminator kills circle/stuck exploits. User's insight: a circling car stays near the same track waypoints, so active_node (the sim's track progress indicator) never advances. Track the maximum active_node reached this episode; if it hasn't increased in progress_patience=60 steps (~3.3s), terminate. This catches circular driving (active_node oscillates, max never increases) and being stuck on a cone/barrier (active_node frozen). It is NOT triggered by legitimate cornering, slow forward progress, or lap resets. On lap completion, active_node wraps to 0, so max_node_seen and the counter are reset. Also adds Exp 12: single-track mountain training with a lap-based stopping criterion (train until 3 consecutive laps in eval, not a fixed step count).	2026-04-19 17:01:41 -04:00
Paul Huliganga	dc563e2b6c	fix: exp11d remove progress_patience — grass fix only per ADR-020	2026-04-19 16:18:17 -04:00
Paul Huliganga	f730a2e0ba	docs: ADR-020/021 + session log — throttle/hill history and grass exploit root cause. Critical facts documented permanently: throttle_min=0.5 bakes into action space (too fast for corners); throttle_min=0.2 + v5 reward CAN learn hill (proved Exp 9, mountain only 90k); mountain failure in parallel is contamination from grass exploit, not throttle; grass exploit root cause: sim determine_episode_over() passes when CTE>16m; DO NOT confuse mountain rollback with stuck issue; DO NOT change throttle_min as first response to mountain failure.	2026-04-19 16:14:28 -04:00
Paul Huliganga	16bd379e95	feat: Exp 11c — parallel DummyVecEnv + v6 reward, extended to 250k steps	2026-04-19 13:27:38 -04:00
Paul Huliganga	91ce8fc1fa	feat: Exp 11b — parallel DummyVecEnv + v6 reward (anti-circle gate) + built-in eval	2026-04-19 12:03:46 -04:00
Paul Huliganga	beb04f3ebe	fix: reward v6 — efficiency gate prevents circular driving, stuck_steps 80→40. v5 dropped the efficiency term to get gradient signal on hills, but this re-enabled circular driving (observed in Exp 11). v6 adds efficiency back as a GATE (not a multiplier): if efficiency < 0.15, reward = 0; otherwise reward = speed × CTE_quality (same as v5). Gate vs multiplier: v4 used efficiency as a multiplier, which killed the gradient on hills (all terms → 0 simultaneously). v6's gate passes when efficiency is above the threshold (car moving forward, even slowly on a hill) and only blocks when the car is truly circling. Also reduced stuck_steps from 80 to 40 (~2.5s vs ~5s): the user reported the car stuck against barriers for ~10s, which is too long with DummyVecEnv.	2026-04-19 12:02:55 -04:00
Paul Huliganga	21addf268e	feat: Exp 11 — parallel DummyVecEnv multi-track training (two sim instances)	2026-04-19 11:05:22 -04:00
Paul Huliganga	6e9546cd22	save: all experiment scripts moved from /tmp to agent/experiments/. Scripts in /tmp are lost on reboot and not reproducible. All experiment scripts are now committed to git with a README. The Exp5 script was already gone (lost before this fix). All others (Exp6-Exp10, overnight, wave5, etc.) are now preserved. Rule going forward: scripts are saved to agent/experiments/ and committed BEFORE running, not after. Agent: pi; Tests: 102 passed; Tests-Added: 0; TypeScript: N/A	2026-04-18 21:30:08 -04:00
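
The progress terminator described in commit 813f888502 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the class name `ProgressTerminator`, the `update` method, and the `lap_completed` flag are hypothetical, and only `active_node`, `max_node_seen`, and `progress_patience=60` come from the commit message.

```python
class ProgressTerminator:
    """Hypothetical sketch: end the episode when track progress stalls."""

    def __init__(self, progress_patience=60):
        self.progress_patience = progress_patience
        self.reset()

    def reset(self):
        # -1 so that the first observed node always counts as progress.
        self.max_node_seen = -1
        self.steps_without_progress = 0

    def update(self, active_node, lap_completed=False):
        """Return True when the episode should be terminated."""
        if lap_completed:
            # active_node wraps to 0 on a new lap, so start tracking afresh.
            self.reset()
        if active_node > self.max_node_seen:
            self.max_node_seen = active_node
            self.steps_without_progress = 0
        else:
            self.steps_without_progress += 1
        return self.steps_without_progress >= self.progress_patience


def first_termination_step(nodes, patience=60):
    """Feed a node sequence; return the step index that terminates, or None."""
    tracker = ProgressTerminator(patience)
    for step, node in enumerate(nodes):
        if tracker.update(node):
            return step
    return None
```

A circling car whose `active_node` oscillates around the same waypoints never raises the maximum, so the counter reaches the patience limit. A car that advances even one node every few steps keeps resetting the counter.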

23 Commits
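
The window arithmetic in the exp18 commit (7fdfbacaee) can be checked with a short sketch. It assumes efficiency is defined as net displacement divided by path length over the window, and that the exploit is a perfect circle; neither is confirmed by the log, though the ~0.77 figure quoted there is consistent with that definition. The function names are hypothetical. The 3.5s period, 22fps rate, and 0.15 threshold are taken from the commit messages.

```python
import math


def circle_window_efficiency(window_steps, circle_period_s=3.5, fps=22.0):
    """Chord/arc ratio for a car driving a perfect circle (assumed model).

    Over a window covering an arc of angle theta, the net displacement is the
    chord 2*r*sin(theta/2) and the path length is r*theta, so r cancels out.
    """
    steps_per_circle = circle_period_s * fps  # about 77 steps per circle
    theta = 2.0 * math.pi * window_steps / steps_per_circle
    return abs(2.0 * math.sin(theta / 2.0)) / theta


def gated_reward(speed, cte_quality, efficiency, threshold=0.15):
    """v6-style gate as described in the log: zero reward below the threshold."""
    if efficiency < threshold:
        return 0.0
    return speed * cte_quality


for window in (30, 200):
    eff = circle_window_efficiency(window)
    print(window, round(eff, 3), gated_reward(1.0, 1.0, eff))
```

Under these assumptions the 30-step window sees less than half a circle, so the chord is still long relative to the arc and the gate passes. The 200-step window spans more than two full circles, which pushes the ratio under the 0.15 threshold.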