donkeycar-rl-autoresearch

Commit Graph


Author

SHA1

Message

Date

Paul Huliganga

e61ebc5b38

fix: prevent trial timeouts losing all data

Three changes:

1. Lower total_timesteps cap: 120k → 90k
   Actual throughput is 16 steps/sec (not 20 as estimated).
   120k steps = 126 min training + 9 min overhead = 135 min > 2hr limit.
   90k steps = 94 min + 8 min overhead = 102 min, safely within limit.

2. Per-segment checkpoint saves in multitrack_runner
   model.save() is called after every segment so the latest weights are
   always on disk.  If the runner is killed (timeout/crash/Ctrl+C),
   training progress is never completely lost.

3. Timeout rescue eval in wave4_controller
   If JOB_TIMEOUT fires and a checkpoint exists, immediately runs a
   quick mini_monaco eval on the checkpoint so the trial still produces
   a GP data point despite the timeout.
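A minimal sketch of how per-segment checkpointing and a timeout rescue eval could fit together. It is not the code from multitrack_runner or wave4_controller: DummyModel, train_in_segments, rescue_eval, fake_eval and the checkpoint file name are all hypothetical, and the only thing taken from the commit message is the idea of calling save() after every segment and evaluating the latest checkpoint if one exists.

```python
import os
import tempfile


class DummyModel:
    """Hypothetical stand-in for an RL model exposing learn() and save()."""

    def __init__(self):
        self.steps_trained = 0

    def learn(self, total_timesteps):
        self.steps_trained += total_timesteps

    def save(self, path):
        # A real model would serialize its weights; here we store a step count.
        with open(path, "w") as f:
            f.write(str(self.steps_trained))


def train_in_segments(model, tracks, steps_per_segment, n_segments, checkpoint_path):
    """Train segment by segment, overwriting the checkpoint after each one."""
    for i in range(n_segments):
        track = tracks[i % len(tracks)]  # a real runner would switch env to this track
        model.learn(total_timesteps=steps_per_segment)
        model.save(checkpoint_path)  # latest weights are on disk before the next segment
    return model


def rescue_eval(checkpoint_path, evaluate):
    """On timeout: evaluate the latest checkpoint if one exists, else return None."""
    if not os.path.exists(checkpoint_path):
        return None
    return evaluate(checkpoint_path)


def fake_eval(path):
    """Hypothetical evaluation: just reads back the stored step count."""
    with open(path) as f:
        return int(f.read())


with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "latest.ckpt")
    before = rescue_eval(ckpt, fake_eval)  # no checkpoint yet
    train_in_segments(DummyModel(), ["generated_track", "mountain_track"], 10_000, 3, ckpt)
    after = rescue_eval(ckpt, fake_eval)  # checkpoint from the last finished segment
```

Because the checkpoint is overwritten after each segment, a kill during segment N still leaves the weights from segment N-1 available for the rescue eval.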

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A

2026-04-15 21:54:50 -04:00

Paul Huliganga

c10e56d894

fix: cap total_timesteps at 120k to prevent 2hr timeout

Trials 3+4 both proposed ~140k steps and hit the 2hr JOB_TIMEOUT,
wasting time and producing no GP data.  At ~20 steps/sec, 120k steps
takes ~100 min, safely within the 2hr limit.
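The estimate can be checked with a few lines of arithmetic. This is an illustrative sketch only: the function name and the JOB_TIMEOUT_MIN constant are assumptions, and the 16 steps/sec rate is the measured throughput reported in the follow-up commit e61ebc5b38, not something known when this commit was written.

```python
JOB_TIMEOUT_MIN = 120  # the 2hr limit named in the commit message


def training_minutes(total_timesteps, steps_per_sec):
    """Pure training time in minutes, ignoring eval and startup overhead."""
    return total_timesteps / steps_per_sec / 60.0


for steps in (140_000, 120_000, 90_000):
    for rate in (20.0, 16.0):
        minutes = training_minutes(steps, rate)
        verdict = "over" if minutes > JOB_TIMEOUT_MIN else "under"
        print(f"{steps:>7} steps at {rate:.0f} steps/sec: {minutes:6.1f} min ({verdict} limit)")
```

At the estimated 20 steps/sec, 120k steps come to 100 min of training. At 16 steps/sec the same budget needs about 125 min before any overhead is added, which is why the cap was later lowered.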

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A

2026-04-15 16:30:07 -04:00

Paul Huliganga

7534527722

Wave 4: scratch training on generated_track + mountain_track, zero-shot mini_monaco

Strategy change driven by Trial 1 data analysis:
- generated_road removed: too similar to generated_track, and Phase-2
  warm-start caused catastrophic forgetting (reward 2388→37 in one rotation)
- mountain_track mean reward was only 17 — model never converged there
- mini_monaco score 24.9 (37 steps) — model was outputting degenerate actions

Wave 4 approach:
- NO warm-start: fresh random weights every trial
- Train: generated_track + mountain_track (visually distinct backgrounds,
  both have road markings — forces model to learn general mark-following)
- Test (zero-shot): mini_monaco only (never seen during training)
- Wider LR search: [1e-4, 2e-3] (scratch model needs different range)
- Larger step budgets: 60k-250k total (fresh model needs more time)
- Seed params: lr=0.0003 and lr=0.001 (diverse from the start)
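The search space above can be written down as a small sketch. Only the bounds and the two seed learning rates come from the commit message. The names, the log-uniform sampling of the learning rate, and the kappa value are assumptions for illustration, and the UCB score shown is the textbook form (mean plus kappa times standard deviation), not necessarily what wave4_controller.py computes.

```python
import math
import random

LR_BOUNDS = (1e-4, 2e-3)         # from the commit message
STEP_BOUNDS = (60_000, 250_000)  # from the commit message
SEED_LRS = [3e-4, 1e-3]          # from the commit message


def sample_candidate(rng):
    """Draw one hypothetical candidate: log-uniform LR, uniform step budget."""
    lo, hi = LR_BOUNDS
    lr = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    lr = min(max(lr, lo), hi)  # guard against floating-point drift at the edges
    steps = rng.randint(*STEP_BOUNDS)
    return {"learning_rate": lr, "total_timesteps": steps}


def ucb(mean, std, kappa=2.0):
    """Upper confidence bound: prefer high predicted score or high uncertainty."""
    return mean + kappa * std


rng = random.Random(0)
candidates = [sample_candidate(rng) for _ in range(1000)]
```

Sampling the learning rate on a log scale spreads candidates evenly across orders of magnitude, which is a common choice when the range spans more than a decade.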

Files:
- multitrack_runner.py: 2 training tracks, no warm-start auto-detection
- wave4_controller.py: new Wave 4 GP+UCB controller
- tests updated: TRAINING_TRACKS assertion, seed param tests → wave4
- 96 tests passing

ADR-013 to follow.

Agent: pi
Tests: 96 passed
Tests-Added: 0
TypeScript: N/A

2026-04-14 22:40:38 -04:00

3 Commits