diff --git a/DECISIONS.md b/DECISIONS.md
index 66ae767..2e37b68 100644
--- a/DECISIONS.md
+++ b/DECISIONS.md
@@ -256,3 +256,62 @@
 inputs. This is far more efficient than teaching driving from scratch.
 
 **Risk:** If the champion's policy is over-specialised (e.g., relies on very specific pixel features of desert background), warm-starting could hinder generalisation. This is why the GP tunes learning_rate — a higher LR will more aggressively overwrite specialised features.
+
+## ADR-013: Wave 4 — Train From Scratch on 2 Visually Distinct Tracks
+
+**Date:** 2026-04-14
+**Status:** Active
+
+**Decision:** Remove generated_road from the training set. Train from random weights (no warm-start) on generated_track + mountain_track only. Test zero-shot on mini_monaco.
+
+**Why generated_road was removed:**
+- Too visually similar to generated_track — it doesn't force generalisation
+- The Phase 2 champion (trained only on generated_road) was used as the warm-start in Wave 3
+- Warm-starting caused catastrophic forgetting: generated_road reward fell 2388→37 between rotations as the model forgot it while learning the other tracks
+- The warm-start weights were a local minimum the model couldn't escape
+
+**Why no warm-start:**
+- Phase 2 CNN features were specialised for generated_road visual patterns
+- 30–90k steps of multi-track training were insufficient to overcome that prior
+- Starting from random weights lets the CNN build features useful for both tracks simultaneously
+
+**Why generated_track + mountain_track:**
+- Both outdoor, asphalt, yellow/white lane markings — same task category as mini_monaco
+- Visually distinct backgrounds (trees vs mountain/barriers) — the model must learn to ignore the background and follow the road markings, not recognise specific scenes
+- If it can drive both, the learned features should generalise to mini_monaco (same visual category, never seen during training)
+
+**Proven result:** Overnight Wave 4 Trial 3 (lr=0.000685, switch=17,499, total=157,743 steps) scored mini_monaco=1943 (full 2000-step eval, never crashed). Model saved at agent/models/wave4-trial-0003/model.zip.
+
+---
+
+## ADR-014: Always Measure Throughput Before Launching Long Runs
+
+**Date:** 2026-04-15
+**Status:** Active (learned the hard way)
+
+**Decision:** Before launching any autoresearch campaign, run a 5-minute timing benchmark to measure actual steps/sec. Set the total_timesteps cap = (time_limit_minutes - overhead_minutes) × 60 × steps_per_sec × 0.85 (safety margin).
+
+**Why:** We assumed 20 steps/sec based on Phase 2. Actual Wave 4 throughput is 16 steps/sec (mountain_track physics is heavier). This caused Trials 3, 4, 7, 8, and 9 to time out, wasting 10+ hours of compute.
+
+---
+
+## ADR-015: Per-Segment Model Checkpointing is Non-Negotiable
+
+**Date:** 2026-04-15
+**Status:** Active
+
+**Decision:** Save model.zip after every training segment, not just at the end. If the runner is killed (timeout, crash, Ctrl+C), the latest checkpoint is on disk and training is never completely lost.
+
+**Why:** Five trials timed out with no saved model. Hours of gradient updates existed only in RAM and were lost on SIGKILL.
+
+---
+
+## ADR-016: Verify Fixes Are Running Before Walking Away
+
+**Date:** 2026-04-15
+**Status:** Active
+
+**Decision:** After committing any fix, verify that the running process is actually using the new code (check the PID, log output, and parameter values) before declaring it fixed. Commit + push ≠ running.
+
+**Why:** Multiple fixes (90k step cap, checkpointing, rescue eval) were committed but the controller was never restarted. The fixes never ran. This caused several more timeouts that the fixes were meant to prevent.
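The cap formula in ADR-014 can be sketched as a small helper. This is a sketch only: `timestep_cap` and its parameter names are mine, not part of the runner's actual API; the numbers in the usage comments simply follow from the formula.

```python
def timestep_cap(time_limit_minutes: float,
                 overhead_minutes: float,
                 steps_per_sec: float,
                 safety_margin: float = 0.85) -> int:
    """Cap total_timesteps so a trial finishes inside the wall-clock budget.

    Subtract overhead, convert the usable minutes to seconds, multiply by
    the *measured* throughput, then keep a safety margin for eval/IO jitter.
    """
    usable_seconds = (time_limit_minutes - overhead_minutes) * 60
    return int(usable_seconds * steps_per_sec * safety_margin)

# Assumed vs measured throughput from ADR-014 (20 vs 16 steps/sec),
# for a hypothetical 2-hour slot with 10 minutes of overhead:
old_cap = timestep_cap(120, 10, 20)  # 112200 — what the assumed rate allows
new_cap = timestep_cap(120, 10, 16)  # 89760 — what actually fits
```

At 16 steps/sec, the gap between the two caps is roughly 23 minutes of training the old cap would have scheduled past the deadline, which is exactly the failure mode the timed-out trials hit.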
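ADR-015's per-segment checkpointing can be sketched as below. Assumptions to note: the model is assumed to expose Stable-Baselines3-style `.learn()` and `.save()` (SB3 saves checkpoints as `.zip`), and `train_in_segments` / `switch_track` are hypothetical names, not the actual functions in `multitrack_runner.py`.

```python
from pathlib import Path

def train_in_segments(model, total_timesteps: int, steps_per_switch: int,
                      save_dir: str, switch_track) -> None:
    """Train in segments, checkpointing after every segment (ADR-015).

    `switch_track` is a hypothetical callback that rotates the training
    track between segments.
    """
    save_path = Path(save_dir) / "model.zip"
    trained = 0
    while trained < total_timesteps:
        segment = min(steps_per_switch, total_timesteps - trained)
        model.learn(total_timesteps=segment, reset_num_timesteps=False)
        trained += segment
        model.save(save_path)  # checkpoint BEFORE switching tracks
        switch_track()
```

The design point is that `model.save()` sits inside the loop: a SIGKILL between segments loses at most one segment of gradient updates instead of the whole run.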
+
diff --git a/agent/outerloop-results/autoresearch_phase4_log.txt b/agent/outerloop-results/autoresearch_phase4_log.txt
index da81bb0..54ff63d 100644
--- a/agent/outerloop-results/autoresearch_phase4_log.txt
+++ b/agent/outerloop-results/autoresearch_phase4_log.txt
@@ -341,3 +341,26 @@
 [2026-04-15 22:23:23] [Wave4] Proposed params: {'learning_rate': 0.00192547022313727, 'steps_per_switch': 3237, 'total_timesteps': 124659}
 [2026-04-15 22:23:25] [Wave4] Launching trial 10: {'learning_rate': 0.00192547022313727, 'steps_per_switch': 3237, 'total_timesteps': 124659}
 [2026-04-15 22:23:25] [Wave4] Command: python3 /home/paulh/projects/donkeycar-rl-autoresearch/agent/multitrack_runner.py --total-timesteps 124659 --steps-per-switch 3237 --learning-rate 0.00192547022313727 --eval-episodes 3 --save-dir /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/wave4-trial-0010
+[2026-04-15 22:26:54] =================================================================
+[2026-04-15 22:26:54] [Wave4] Multi-Track Autoresearch — GP+UCB Generalization Search
+[2026-04-15 22:26:54] [Wave4] Training tracks : generated_track, mountain_track (no generated_road, no warm-start)
+[2026-04-15 22:26:54] [Wave4] Test tracks : mini_monaco only (zero-shot; warren removed — broken done condition)
+[2026-04-15 22:26:54] [Wave4] Max trials : 25 | kappa=2.0 | push every 5
+[2026-04-15 22:26:54] [Wave4] Results file : /home/paulh/projects/donkeycar-rl-autoresearch/agent/outerloop-results/autoresearch_results_phase4.jsonl
+[2026-04-15 22:26:54] [Wave4] Champion dir : /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/wave4-champion
+[2026-04-15 22:26:54] [Wave4] Warm start : NONE (training from scratch each trial)
+[2026-04-15 22:26:54] =================================================================
+[2026-04-15 22:26:54] [Wave4] Loaded 5 existing Phase 3 results.
+[2026-04-15 22:26:54] [Wave4] Wave4 Champion: trial=3 score=1943.10 params={'learning_rate': 0.0006852550685205609, 'steps_per_switch': 17499, 'total_timesteps': 157743}
+[2026-04-15 22:26:54] [Wave4] Starting from trial 6.
+[2026-04-15 22:26:54]
+[Wave4] ========== Trial 6/25 ==========
+[2026-04-15 22:26:54] [Wave4] GP UCB top-5 proposals:
+[2026-04-15 22:26:54] UCB=2.8029 mu=1.3217 σ=0.7406 params={'learning_rate': 0.0009434282949002715, 'steps_per_switch': 14966, 'total_timesteps': 83094}
+[2026-04-15 22:26:54] UCB=2.7637 mu=1.4556 σ=0.6540 params={'learning_rate': 0.001016649027182601, 'steps_per_switch': 14757, 'total_timesteps': 85809}
+[2026-04-15 22:26:54] UCB=2.7344 mu=1.1173 σ=0.8085 params={'learning_rate': 0.000525489856531106, 'steps_per_switch': 14503, 'total_timesteps': 81150}
+[2026-04-15 22:26:54] UCB=2.7210 mu=1.0163 σ=0.8523 params={'learning_rate': 0.000448503297396427, 'steps_per_switch': 14723, 'total_timesteps': 80477}
+[2026-04-15 22:26:54] UCB=2.6726 mu=0.9116 σ=0.8805 params={'learning_rate': 0.0011227428004033503, 'steps_per_switch': 14832, 'total_timesteps': 81442}
+[2026-04-15 22:26:54] [Wave4] Proposed params: {'learning_rate': 0.0009434282949002715, 'steps_per_switch': 14966, 'total_timesteps': 83094}
+[2026-04-15 22:26:56] [Wave4] Launching trial 6: {'learning_rate': 0.0009434282949002715, 'steps_per_switch': 14966, 'total_timesteps': 83094}
+[2026-04-15 22:26:56] [Wave4] Command: python3 /home/paulh/projects/donkeycar-rl-autoresearch/agent/multitrack_runner.py --total-timesteps 83094 --steps-per-switch 14966 --learning-rate 0.0009434282949002715 --eval-episodes 3 --save-dir /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/wave4-trial-0006
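The ranking in the trial-6 log is consistent with the standard GP-UCB acquisition, UCB = μ + κσ with the logged κ = 2.0 (e.g. 1.3217 + 2.0 × 0.7406 = 2.8029 for the top proposal). A minimal check, not the controller's code:

```python
def ucb(mu: float, sigma: float, kappa: float = 2.0) -> float:
    """GP-UCB acquisition: posterior mean plus kappa standard deviations."""
    return mu + kappa * sigma

# Top proposal from the trial-6 log: mu=1.3217, sigma=0.7406
round(ucb(1.3217, 0.7406), 4)  # 2.8029, matching the logged UCB
```

The remaining proposals match to within the 4-decimal rounding of the logged μ and σ values.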