From 6e2427571a2ec4be3426b36f7774887fbf67be93 Mon Sep 17 00:00:00 2001
From: Paul Huliganga
Date: Mon, 20 Apr 2026 20:18:08 -0400
Subject: [PATCH] docs: record failed cross-track warm-start transfer experiments exp15 and exp16
---
 DECISIONS.md                   | 29 ++++++++++++++++
 docs/SESSION_LOG_2026-04-19.md | 41 ++++++++++++++++++++++
 docs/TEST_HISTORY.md           | 63 ++++++++++++++++++++++++++++++++++
 3 files changed, 133 insertions(+)

diff --git a/DECISIONS.md b/DECISIONS.md
index 6712aa0..41c125a 100644
--- a/DECISIONS.md
+++ b/DECISIONS.md
@@ -547,3 +547,32 @@ and likely patch the Unity scene on branch:
 - `investigate-mountain-friction`
 
 This should be prioritized over adding more reward heuristics.
+
+---
+
+## ADR-024: Direct Cross-Track Warm Starts Are Not Currently Helpful
+
+**Date:** 2026-04-19
+**Status:** Accepted
+
+**Context:** After recovering strong single-track champions for generated and mountain,
+we tested direct single-track transfer in both directions using tracked experiments:
+- mountain robust champion → generated_track (`exp15_gentrack_from_mountain.py`)
+- generated_track champion → mountain_track (`exp16_mountain_from_gentrack.py`)
+
+These tests were designed specifically to avoid the old broken multi-track setup,
+so failure here is more meaningful than the earlier contaminated transfer attempts.
+
+**Observed outcomes:**
+- mountain → generated failed early and showed exploit-like behavior near the start
+- generated → mountain plateaued around ~193-195 steps with no laps even past 200k steps
+
+**Decision:** Do not assume direct warm-start transfer between generated_track and
+mountain_track is useful. Treat the current single-track champions as specialized
+experts, not as obviously reusable initializations for the other track.
+
+**Consequence:**
+- Prefer clean single-track training / finetuning over cross-track warm starts
+- If transfer is revisited, it likely needs a more careful method than naive direct
+  warm-starting on the other track
+- Mountain physics issues should be addressed before revisiting transfer conclusions
diff --git a/docs/SESSION_LOG_2026-04-19.md b/docs/SESSION_LOG_2026-04-19.md
index fc34c6f..be10b51 100644
--- a/docs/SESSION_LOG_2026-04-19.md
+++ b/docs/SESSION_LOG_2026-04-19.md
@@ -173,6 +173,47 @@ We created a dedicated Unity investigation branch before changing anything:
 - repo: `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
 - branch: `investigate-mountain-friction`
 
+### Cross-track warm-start transfer tests (Exp 15 / Exp 16)
+We tested whether the best single-track champions could be re-used as warm starts on the other track.
+
+#### Exp 15 — mountain → generated
+- Script: `agent/experiments/exp15_gentrack_from_mountain.py`
+- Warm start:
+  - `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
+- Target track:
+  - `generated_track`
+- Result: **failed**
+- User-observed behavior:
+  - exploit-like behavior near start / first corner
+  - not driving proper laps
+- Log evidence by ~25k steps:
+  - `[20,000] reward=45.0 steps=47 laps=0`
+  - `[25,000] reward=23.4 steps=30 laps=0`
+  - short exploit laps appeared in log (`6.5s`, `4.91s`)
+- Conclusion:
+  - mountain policy prior does **not** transfer cleanly to generated-track in this setup
+
+#### Exp 16 — generated → mountain
+- Script: `agent/experiments/exp16_mountain_from_gentrack.py`
+- Warm start:
+  - `agent/models/exp13-gentrack-v4/best_model.zip`
+- Target track:
+  - `mountain_track`
+- Result: **failed**
+- Behavior:
+  - no meaningful hill learning
+  - repeated short crash pattern
+- Log evidence deep into run:
+  - `[210,000] reward=10.2 steps=195 laps=0`
+  - `[215,000] reward=10.1 steps=193 laps=0`
+- Conclusion:
+  - generated-track champion does **not** bootstrap mountain learning effectively in the current setup
+
+Overall takeaway:
+- Direct cross-track warm starts failed in **both** directions.
+- This suggests the source policies are too specialized, or that mountain physics / reward differences are too large for naive transfer.
+- For now, single-track champions remain useful as champions, but not as obvious warm-start initializations for the other track.
+
 ## Critical Known Facts (DO NOT LOSE)
 
 ### throttle_min history (from Exp 1-9)
diff --git a/docs/TEST_HISTORY.md b/docs/TEST_HISTORY.md
index 724fc3c..c05494f 100644
--- a/docs/TEST_HISTORY.md
+++ b/docs/TEST_HISTORY.md
@@ -445,3 +445,66 @@ Promoted copy saved as:
 - The best mountain finetune model is the **36k checkpoint after switching back to 0.2 floor**, not the later checkpoints.
 - Later finetune checkpoints collapsed badly, matching the user's visual observation of wheelspin / poor driving.
 
+---
+
+## Exp 15 — Generated track warm-start from mountain champion (2026-04-19)
+
+- **Script:** `agent/experiments/exp15_gentrack_from_mountain.py`
+- **Warm start:** `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
+- **Target track:** `generated_track`
+- **Target setup:** Exp 13-style v4 generated-track training
+- **Result:** ❌ Failed
+
+**Observed behavior:**
+- Model tried exploit-like behavior near the start / first corner
+- Did not learn clean generated-track driving
+- By ~25k steps, it was clearly far behind the known-good scratch run
+
+**Log evidence:**
+- `[20,000] reward=45.0 steps=47 laps=0`
+- `[25,000] reward=23.4 steps=30 laps=0`
+- Short exploit laps appeared in the log (`6.5s`, `4.91s`)
+
+**Conclusion:**
+- Mountain → generated warm-start transfer is poor in this direct setup
+- The mountain policy prior seems to bias the agent toward bad local behavior instead of helping generated-track learning
+
+---
+
+## Exp 16 — Mountain track warm-start from generated champion (2026-04-19)
+
+- **Script:** `agent/experiments/exp16_mountain_from_gentrack.py`
+- **Warm start:** `agent/models/exp13-gentrack-v4/best_model.zip`
+- **Target track:** `mountain_track`
+- **Target setup:** Exp 14-style v5 mountain training
+- **Result:** ❌ Failed
+
+**Observed behavior:**
+- No meaningful mountain learning
+- Repeated short crash pattern
+- Never developed lap-completing mountain behavior
+
+**Log evidence:**
+- `[210,000] reward=10.2 steps=195 laps=0`
+- `[215,000] reward=10.1 steps=193 laps=0`
+
+**Conclusion:**
+- Generated → mountain warm-start transfer is also poor in this direct setup
+- The generated-track champion does not bootstrap mountain hill learning effectively here
+
+---
+
+## Transfer-learning takeaway (current evidence)
+
+Direct cross-track warm starts failed in **both** directions:
+- mountain → generated: failed / exploit-prone
+- generated → mountain: failed / short-crash plateau
+
+Current interpretation:
+- the single-track policies are too specialized for naive direct transfer, and/or
+- the mountain sim physics differences are large enough to break transfer
+
+For now:
+- keep the single-track champions as separate specialists
+- do **not** assume direct cross-track warm starts are beneficial
+
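
Reviewer note: the "failed / short-crash plateau" verdict for Exp 16 rests on evaluation lines of the form `[210,000] reward=10.2 steps=195 laps=0`. A minimal sketch of how that check could be automated is given below. It assumes the log format is exactly as quoted in the patch; the names `parse_eval_lines` and `is_zero_lap_plateau`, and the thresholds `min_step=200_000` and `max_spread=5`, are hypothetical illustrations and not part of the repository.

```python
import re
from typing import NamedTuple

# Matches evaluation lines such as "[210,000] reward=10.2 steps=195 laps=0".
# The format is assumed from the log excerpts quoted in the patch.
LOG_LINE = re.compile(
    r"\[(?P<step>[\d,]+)\]\s+reward=(?P<reward>-?\d+(?:\.\d+)?)"
    r"\s+steps=(?P<steps>\d+)\s+laps=(?P<laps>\d+)"
)


class EvalPoint(NamedTuple):
    step: int      # training step at which the evaluation was logged
    reward: float  # episode reward
    steps: int     # episode length in environment steps
    laps: int      # completed laps


def parse_eval_lines(text: str) -> list[EvalPoint]:
    """Extract every evaluation point found in a block of log text."""
    points = []
    for m in LOG_LINE.finditer(text):
        points.append(
            EvalPoint(
                step=int(m["step"].replace(",", "")),
                reward=float(m["reward"]),
                steps=int(m["steps"]),
                laps=int(m["laps"]),
            )
        )
    return points


def is_zero_lap_plateau(points, min_step=200_000, max_spread=5):
    """Hypothetical rule: late in training, no laps and near-constant episode length."""
    late = [p for p in points if p.step >= min_step]
    if len(late) < 2:
        return False  # not enough late evidence to call it a plateau
    spread = max(p.steps for p in late) - min(p.steps for p in late)
    return all(p.laps == 0 for p in late) and spread <= max_spread


EXP16_LOG = """
[210,000] reward=10.2 steps=195 laps=0
[215,000] reward=10.1 steps=193 laps=0
"""

if __name__ == "__main__":
    print(is_zero_lap_plateau(parse_eval_lines(EXP16_LOG)))
```

The same helper does not classify the Exp 15 excerpt as a plateau, because those lines stop at 25k steps; the early exploit behavior recorded there would need a different check.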