From 6e2427571a2ec4be3426b36f7774887fbf67be93 Mon Sep 17 00:00:00 2001
From: Paul Huliganga <paje0101@gmail.com>
Date: Mon, 20 Apr 2026 20:18:08 -0400
Subject: [PATCH] docs: record failed cross-track warm-start transfer
 experiments exp15 and exp16

---
 DECISIONS.md                   | 29 ++++++++++++++++
 docs/SESSION_LOG_2026-04-19.md | 41 ++++++++++++++++++++++
 docs/TEST_HISTORY.md           | 63 ++++++++++++++++++++++++++++++++++
 3 files changed, 133 insertions(+)

diff --git a/DECISIONS.md b/DECISIONS.md
index 6712aa0..41c125a 100644
--- a/DECISIONS.md
+++ b/DECISIONS.md
@@ -547,3 +547,32 @@ and likely patch the Unity scene on branch:
 - `investigate-mountain-friction`
 
 This should be prioritized over adding more reward heuristics.
+
+---
+
+## ADR-024: Direct Cross-Track Warm Starts Are Not Currently Helpful
+
+**Date:** 2026-04-19
+**Status:** Accepted
+
+**Context:** After recovering strong single-track champions for generated_track and mountain_track,
+we tested direct single-track transfer in both directions using tracked experiments:
+- mountain robust champion → generated_track (`exp15_gentrack_from_mountain.py`)
+- generated_track champion → mountain_track (`exp16_mountain_from_gentrack.py`)
+
+These tests were designed specifically to avoid the old broken multi-track setup,
+so failure here is more meaningful than the earlier contaminated transfer attempts.
+
+**Observed outcomes:**
+- mountain → generated failed early and showed exploit-like behavior near the start
+- generated → mountain plateaued at ~193-195 steps per episode with no laps, even past 200k training steps
+
+**Decision:** Do not assume direct warm-start transfer between generated_track and
+mountain_track is useful. Treat the current single-track champions as specialized
+experts, not as obviously reusable initializations for the other track.
+
+**Consequence:**
+- Prefer clean single-track training / finetuning over cross-track warm starts
+- If transfer is revisited, it likely needs a more careful method than naive direct
+  warm-starting on the other track
+- Mountain physics issues should be addressed before revisiting transfer conclusions
diff --git a/docs/SESSION_LOG_2026-04-19.md b/docs/SESSION_LOG_2026-04-19.md
index fc34c6f..be10b51 100644
--- a/docs/SESSION_LOG_2026-04-19.md
+++ b/docs/SESSION_LOG_2026-04-19.md
@@ -173,6 +173,47 @@ We created a dedicated Unity investigation branch before changing anything:
 - repo: `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
 - branch: `investigate-mountain-friction`
 
+### Cross-track warm-start transfer tests (Exp 15 / Exp 16)
+We tested whether the best single-track champions could be re-used as warm starts on the other track.
+
+#### Exp 15 — mountain → generated
+- Script: `agent/experiments/exp15_gentrack_from_mountain.py`
+- Warm start:
+  - `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
+- Target track:
+  - `generated_track`
+- Result: **failed**
+- User-observed behavior:
+  - exploit-like behavior near start / first corner
+  - not driving proper laps
+- Log evidence by ~25k steps:
+  - `[20,000] reward=45.0 steps=47 laps=0`
+  - `[25,000] reward=23.4 steps=30 laps=0`
+  - short exploit laps appeared in the log (`6.5s`, `4.91s`)
+- Conclusion:
+  - mountain policy prior does **not** transfer cleanly to generated-track in this setup
+
+#### Exp 16 — generated → mountain
+- Script: `agent/experiments/exp16_mountain_from_gentrack.py`
+- Warm start:
+  - `agent/models/exp13-gentrack-v4/best_model.zip`
+- Target track:
+  - `mountain_track`
+- Result: **failed**
+- Behavior:
+  - no meaningful hill learning
+  - repeated short crash pattern
+- Log evidence deep into run:
+  - `[210,000] reward=10.2 steps=195 laps=0`
+  - `[215,000] reward=10.1 steps=193 laps=0`
+- Conclusion:
+  - generated-track champion does **not** bootstrap mountain learning effectively in the current setup
+
+Overall takeaway:
+- Direct cross-track warm starts failed in **both** directions.
+- This suggests the source policies are too specialized, or that mountain physics / reward differences are too large for naive transfer.
+- For now, single-track champions remain useful as champions, but not as obvious warm-start initializations for the other track.
+
 ## Critical Known Facts (DO NOT LOSE)
 
 ### throttle_min history (from Exp 1-9)
diff --git a/docs/TEST_HISTORY.md b/docs/TEST_HISTORY.md
index 724fc3c..c05494f 100644
--- a/docs/TEST_HISTORY.md
+++ b/docs/TEST_HISTORY.md
@@ -445,3 +445,66 @@ Promoted copy saved as:
 - The best mountain finetune model is the **36k checkpoint after switching back to 0.2 floor**, not the later checkpoints.
 - Later finetune checkpoints collapsed badly, matching the user's visual observation of wheelspin / poor driving.
 
+---
+
+## Exp 15 — Generated track warm-start from mountain champion (2026-04-19)
+
+- **Script:** `agent/experiments/exp15_gentrack_from_mountain.py`
+- **Warm start:** `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
+- **Target track:** `generated_track`
+- **Target setup:** Exp 13-style v4 generated-track training
+- **Result:** ❌ Failed
+
+**Observed behavior:**
+- Model tried exploit-like behavior near the start / first corner
+- Did not learn clean generated-track driving
+- By ~25k steps, it was clearly far behind the known-good from-scratch run
+
+**Log evidence:**
+- `[20,000] reward=45.0 steps=47 laps=0`
+- `[25,000] reward=23.4 steps=30 laps=0`
+- Short exploit laps appeared in the log (`6.5s`, `4.91s`)
+
+**Conclusion:**
+- Mountain → generated warm-start transfer is poor in this direct setup
+- The mountain policy prior seems to bias the agent toward bad local behavior instead of helping generated-track learning
+
+---
+
+## Exp 16 — Mountain track warm-start from generated champion (2026-04-19)
+
+- **Script:** `agent/experiments/exp16_mountain_from_gentrack.py`
+- **Warm start:** `agent/models/exp13-gentrack-v4/best_model.zip`
+- **Target track:** `mountain_track`
+- **Target setup:** Exp 14-style v5 mountain training
+- **Result:** ❌ Failed
+
+**Observed behavior:**
+- No meaningful mountain learning
+- Repeated short crash pattern
+- Never developed lap-completing mountain behavior
+
+**Log evidence:**
+- `[210,000] reward=10.2 steps=195 laps=0`
+- `[215,000] reward=10.1 steps=193 laps=0`
+
+**Conclusion:**
+- Generated → mountain warm-start transfer is also poor in this direct setup
+- The generated-track champion does not bootstrap mountain hill learning effectively here
+
+---
+
+## Transfer-learning takeaway (current evidence)
+
+Direct cross-track warm starts failed in **both** directions:
+- mountain → generated: failed / exploit-prone
+- generated → mountain: failed / short-crash plateau
+
+Current interpretation:
+- the single-track policies are too specialized for naive direct transfer, and/or
+- the mountain sim physics differences are large enough to break transfer
+
+For now:
+- keep the single-track champions as separate specialists
+- do **not** assume direct cross-track warm starts are beneficial
+