docs: record failed cross-track warm-start transfer experiments exp15 and exp16
parent a8aef52f00
commit 6e2427571a
29
DECISIONS.md
@@ -547,3 +547,32 @@ and likely patch the Unity scene on branch:
- `investigate-mountain-friction`

This should be prioritized over adding more reward heuristics.

---

## ADR-024: Direct Cross-Track Warm Starts Are Not Currently Helpful

**Date:** 2026-04-19
**Status:** Accepted

**Context:** After recovering strong single-track champions for generated and mountain,
we tested direct single-track transfer in both directions using tracked experiments:
- mountain robust champion → generated_track (`exp15_gentrack_from_mountain.py`)
- generated_track champion → mountain_track (`exp16_mountain_from_gentrack.py`)

These tests were designed specifically to avoid the old broken multi-track setup,
so failure here is more meaningful than the earlier contaminated transfer attempts.

**Observed outcomes:**
- mountain → generated failed early and showed exploit-like behavior near the start
- generated → mountain plateaued at around 193-195 steps with no laps, even past 200k steps

**Decision:** Do not assume direct warm-start transfer between generated_track and
mountain_track is useful. Treat the current single-track champions as specialized
experts, not as obviously reusable initializations for the other track.

**Consequence:**
- Prefer clean single-track training / finetuning over cross-track warm starts
- If transfer is revisited, it likely needs a more careful method than naive direct
  warm-starting on the other track
- Mountain physics issues should be addressed before revisiting transfer conclusions

@@ -173,6 +173,47 @@ We created a dedicated Unity investigation branch before changing anything:
- repo: `/mnt/c/Users/Paul/Documents/projects/sdsandbox`
- branch: `investigate-mountain-friction`

### Cross-track warm-start transfer tests (Exp 15 / Exp 16)
We tested whether the best single-track champions could be re-used as warm starts on the other track.

#### Exp 15: mountain → generated
- Script: `agent/experiments/exp15_gentrack_from_mountain.py`
- Warm start:
  - `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
- Target track:
  - `generated_track`
- Result: **failed**
- User-observed behavior:
  - exploit-like behavior near start / first corner
  - not driving proper laps
- Log evidence by ~25k steps:
  - `[20,000] reward=45.0 steps=47 laps=0`
  - `[25,000] reward=23.4 steps=30 laps=0`
  - short exploit laps appeared in log (`6.5s`, `4.91s`)
- Conclusion:
  - mountain policy prior does **not** transfer cleanly to generated-track in this setup

#### Exp 16: generated → mountain
- Script: `agent/experiments/exp16_mountain_from_gentrack.py`
- Warm start:
  - `agent/models/exp13-gentrack-v4/best_model.zip`
- Target track:
  - `mountain_track`
- Result: **failed**
- Behavior:
  - no meaningful hill learning
  - repeated short crash pattern
- Log evidence deep into run:
  - `[210,000] reward=10.2 steps=195 laps=0`
  - `[215,000] reward=10.1 steps=193 laps=0`
- Conclusion:
  - generated-track champion does **not** bootstrap mountain learning effectively in the current setup

Overall takeaway:
- Direct cross-track warm starts failed in **both** directions.
- This suggests the source policies are too specialized, or that mountain physics / reward differences are too large for naive transfer.
- For now, single-track champions remain useful as champions, but not as obvious warm-start initializations for the other track.

## Critical Known Facts (DO NOT LOSE)

### throttle_min history (from Exp 1-9)

@@ -445,3 +445,66 @@ Promoted copy saved as:
- The best mountain finetune model is the **36k checkpoint after switching back to 0.2 floor**, not the later checkpoints.
- Later finetune checkpoints collapsed badly, matching the user's visual observation of wheelspin / poor driving.

---

## Exp 15: Generated track warm-start from mountain champion (2026-04-19)

- **Script:** `agent/experiments/exp15_gentrack_from_mountain.py`
- **Warm start:** `agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip`
- **Target track:** `generated_track`
- **Target setup:** Exp 13-style v4 generated-track training
- **Result:** ❌ Failed

**Observed behavior:**
- The model attempted exploit-like behavior near the start / first corner
- It did not learn clean generated-track driving
- By ~25k steps, it was clearly far behind the known-good scratch run

**Log evidence:**
- `[20,000] reward=45.0 steps=47 laps=0`
- `[25,000] reward=23.4 steps=30 laps=0`
- Short exploit laps appeared in the log (`6.5s`, `4.91s`)
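
The log evidence above can be checked mechanically. The following is a minimal sketch, not the project's actual tooling: it assumes evaluation lines always follow the `[step] reward=… steps=… laps=…` shape quoted here, and that lap times have already been extracted as seconds. `parse_eval_line`, `suspicious_laps`, and the `min_plausible_s` threshold are hypothetical names and values introduced only for illustration.

```python
import re

# Matches evaluation lines such as "[20,000] reward=45.0 steps=47 laps=0".
# The exact log format is an assumption based on the lines quoted in these notes.
EVAL_LINE = re.compile(
    r"\[(?P<step>[\d,]+)\]\s+reward=(?P<reward>-?\d+(?:\.\d+)?)"
    r"\s+steps=(?P<steps>\d+)\s+laps=(?P<laps>\d+)"
)


def parse_eval_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = EVAL_LINE.search(line)
    if match is None:
        return None
    return {
        "step": int(match.group("step").replace(",", "")),
        "reward": float(match.group("reward")),
        "steps": int(match.group("steps")),
        "laps": int(match.group("laps")),
    }


def suspicious_laps(lap_times_s, min_plausible_s):
    """Return lap times shorter than a caller-chosen plausibility threshold.

    min_plausible_s is hypothetical: it should come from a known-good run
    on the same track, which these notes do not state.
    """
    return [t for t in lap_times_s if t < min_plausible_s]


exp15_lines = [
    "[20,000] reward=45.0 steps=47 laps=0",
    "[25,000] reward=23.4 steps=30 laps=0",
]
records = [parse_eval_line(line) for line in exp15_lines]
```

With a threshold taken from a known-good scratch run, `suspicious_laps([6.5, 4.91], min_plausible_s=...)` would surface the exploit laps without anyone having to watch the run.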

**Conclusion:**
- Mountain → generated warm-start transfer is poor in this direct setup
- The mountain policy prior seems to bias the agent toward bad local behavior instead of helping generated-track learning

---

## Exp 16: Mountain track warm-start from generated champion (2026-04-19)

- **Script:** `agent/experiments/exp16_mountain_from_gentrack.py`
- **Warm start:** `agent/models/exp13-gentrack-v4/best_model.zip`
- **Target track:** `mountain_track`
- **Target setup:** Exp 14-style v5 mountain training
- **Result:** ❌ Failed

**Observed behavior:**
- No meaningful mountain learning
- Repeated short crash pattern
- Never developed lap-completing mountain behavior

**Log evidence:**
- `[210,000] reward=10.2 steps=195 laps=0`
- `[215,000] reward=10.1 steps=193 laps=0`
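
The "short-crash plateau" seen in these lines can be expressed as a simple check. This is a sketch under stated assumptions: the bracketed number is taken to be the training step and `steps=` the episode length, and both the `band` width and the `min_train_step` cutoff are hypothetical values chosen for illustration, not settings from the experiment scripts.

```python
def is_short_crash_plateau(evals, band=5, min_train_step=200_000):
    """Flag a run whose late evaluations sit in a narrow episode-length band with no laps.

    evals: list of (train_step, episode_steps, laps) tuples.
    band and min_train_step are illustrative defaults, not project settings.
    """
    late = [e for e in evals if e[0] >= min_train_step]
    if len(late) < 2:
        # Not enough late evaluations to call anything a plateau.
        return False
    lengths = [e[1] for e in late]
    no_laps = all(e[2] == 0 for e in late)
    return no_laps and (max(lengths) - min(lengths)) <= band


# The two Exp 16 evaluations quoted in the log evidence.
exp16_evals = [(210_000, 195, 0), (215_000, 193, 0)]
plateaued = is_short_crash_plateau(exp16_evals)
```

For the two Exp 16 evaluations, `plateaued` is `True`: both fall past the cutoff, episode lengths differ by only 2 steps, and no laps were completed.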

**Conclusion:**
- Generated → mountain warm-start transfer is also poor in this direct setup
- The generated-track champion does not bootstrap mountain hill learning effectively here

---

## Transfer-learning takeaway (current evidence)

Direct cross-track warm starts failed in **both** directions:
- mountain → generated: failed / exploit-prone
- generated → mountain: failed / short-crash plateau

Current interpretation:
- the single-track policies are too specialized for naive direct transfer, and/or
- the mountain sim physics differences are large enough to break transfer

For now:
- keep the single-track champions as separate specialists
- do **not** assume direct cross-track warm starts are beneficial
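
One way to keep this rule from being forgotten is to encode it where warm-start checkpoints are chosen. The sketch below is illustrative only: `CHAMPIONS` and `pick_warm_start` are hypothetical names, not part of the existing experiment scripts, and the only values taken from these notes are the two checkpoint paths.

```python
# Champion checkpoints as recorded in these notes.
CHAMPIONS = {
    "mountain_track": "agent/models/exp14-mountain-v5-finetune/best_robust_model_0036000.zip",
    "generated_track": "agent/models/exp13-gentrack-v4/best_model.zip",
}


def pick_warm_start(target_track, source_track=None, allow_cross_track=False):
    """Return the checkpoint path to warm-start from (hypothetical helper).

    Defaults to the target track's own champion. A cross-track warm start
    must be requested explicitly, since Exp 15 and Exp 16 both failed.
    """
    source = target_track if source_track is None else source_track
    if source not in CHAMPIONS:
        raise KeyError(f"no recorded champion for track {source!r}")
    if source != target_track and not allow_cross_track:
        raise ValueError(
            f"cross-track warm start {source} -> {target_track} is disabled by default"
        )
    return CHAMPIONS[source]


same_track_path = pick_warm_start("mountain_track")
```

Passing `allow_cross_track=True` keeps the door open for a deliberate revisit of transfer while making the default path the same-track specialist.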