"""
== DATA ANALYSIS: Circular Driving Detection (2026-04-13) ==

FINDINGS from Phase 1 data (autoresearch_results_phase1.jsonl):
(rps = reward per step; cv% = std / mean_rwd × 100)

  Trial  mean_rwd     std    rps    cv%  verdict
      1    270.56   0.143  0.086   0.1%  ⚠️ LOW STD, suspicious: possibly circling
      4    627.69    2.35  0.147   0.4%  OK: low variance, moderate reward
      5   4582.80   0.485  0.957   0.0%  🚨 CIRCULAR: 74% of theoretical max, cv=0.0%
      6    454.06    2.73  0.092   0.6%  OK: consistent, plausible
     10    682.74  420.91  0.153  61.7%  ⚠️ UNSTABLE: extremely high variance
     11    404.52   14.47  0.084   3.6%  OK: reasonable variance

KEY SIGNATURES OF CIRCULAR DRIVING:
  1. cv (coefficient of variation) < 1% with mean_reward > 200 → very CONSISTENT circling
     - Trial 5: cv=0.0%, mean=4582 → textbook circular motion
     - Trial 1: cv=0.1%, mean=270 → likely also circling, but slower

  2. reward/step approaching the theoretical max → car collects near-optimal reward continuously
     - Trial 5: 0.957/step ≈ 74% of max (1.3/step at speed ≈ 3 m/s) → sustained fast on-track motion
     - This is achievable by circling at the starting line!

  3. User visual confirmation → car going left in circles at the starting position

WHY OUR REWARD WRAPPER v2 STILL ALLOWS CIRCLING:
  The v2 fix was correct for the ADDITIVE formula (speed × f(cte)).
  The MULTIPLICATIVE formula prevents off-track hacking.
  BUT: a car circling ON-TRACK still gets the full speed bonus!
    - Car circles at start (CTE ≈ 0) → original_reward > 0
    - Car has speed 3 → shaped = 1.0 × (1 + 0.1 × 3) = 1.3/step
    - Over 4787 steps: max = 4787 × 1.3 ≈ 6223, actual = 4582 → 74% of max (car is on track most of the time!)

THE FUNDAMENTAL PROBLEM:
  Neither CTE nor speed can distinguish FORWARD driving from CIRCULAR driving.
  Both have: low CTE (car is centered), positive speed (car is moving).

  We need a reward component that is ZERO for circular motion and POSITIVE for forward progress.

SOLUTION: Path Efficiency Reward
  efficiency = net_displacement / path_length (over a sliding window)
    - Forward driving: efficiency ≈ 1.0 (all movement is productive)
    - Circular driving: efficiency ≈ 0.0 once the window spans a full loop (lots of movement, no net advance)
    - Shaped reward: original × (1 + speed_scale × speed × efficiency)
"""
print(__doc__)
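
# The two numeric signatures from the docstring can be checked mechanically.
# This is a minimal sketch, not the project's actual detector: the rows are
# copied from the Phase 1 table, MAX_RPS follows the docstring's worked example
# 1.0 × (1 + 0.1 × 3), and RPS_FRACTION_THRESHOLD plus all function and
# variable names below are hypothetical choices made for illustration.

# (trial, mean_reward, std, reward_per_step)
PHASE1_TRIALS = [
    (1, 270.56, 0.143, 0.086),
    (4, 627.69, 2.35, 0.147),
    (5, 4582.80, 0.485, 0.957),
    (6, 454.06, 2.73, 0.092),
    (10, 682.74, 420.91, 0.153),
    (11, 404.52, 14.47, 0.084),
]

MAX_RPS = 1.0 * (1 + 0.1 * 3)   # 1.3 reward/step at speed 3 with CTE ≈ 0
CV_THRESHOLD_PCT = 1.0          # signature 1: cv < 1%
MIN_MEAN_REWARD = 200.0         # signature 1: mean_reward > 200
RPS_FRACTION_THRESHOLD = 0.5    # assumed cutoff for "approaching the max"


def cv_percent(mean_reward, std):
    """Coefficient of variation in percent: std / mean × 100."""
    return 100.0 * std / mean_reward


def low_cv_trials(trials):
    """Trial ids matching signature 1 (low cv with a non-trivial mean reward)."""
    return [
        trial
        for trial, mean, std, _ in trials
        if mean > MIN_MEAN_REWARD and cv_percent(mean, std) < CV_THRESHOLD_PCT
    ]


def near_max_trials(trials):
    """Trial ids matching signature 2 (reward/step close to MAX_RPS)."""
    return [
        trial
        for trial, _, _, rps in trials
        if rps / MAX_RPS >= RPS_FRACTION_THRESHOLD
    ]


low_cv = low_cv_trials(PHASE1_TRIALS)
near_max = near_max_trials(PHASE1_TRIALS)
print("signature 1 (low cv):  ", low_cv)
print("signature 2 (near max):", near_max)
print("both signatures:       ", sorted(set(low_cv) & set(near_max)))
# On this table signature 1 alone also matches trials 4 and 6, which the
# verdict column rates OK, so it is probably best read together with
# signature 2 and the visual check rather than on its own.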
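
# A minimal sketch of the path-efficiency idea described in the docstring,
# not the actual reward wrapper. Assumptions: positions are (x, y) tuples in
# metres, one per step; WINDOW, path_efficiency and shaped_reward are
# hypothetical names and values; SPEED_SCALE reuses the 0.1 from the
# docstring's worked example. Requires Python 3.8+ for math.dist.
import math
from collections import deque

WINDOW = 50         # assumed sliding-window length in steps
SPEED_SCALE = 0.1   # speed_scale from the worked example above


def path_efficiency(positions):
    """Return net_displacement / path_length, or 0.0 if the car did not move."""
    pts = list(positions)
    path_length = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    if path_length == 0.0:
        return 0.0
    return math.dist(pts[0], pts[-1]) / path_length


def shaped_reward(original, speed, efficiency, speed_scale=SPEED_SCALE):
    """original × (1 + speed_scale × speed × efficiency)."""
    return original * (1.0 + speed_scale * speed * efficiency)


# Usage: a straight run versus exactly one lap of a unit circle. A deque with
# maxlen=WINDOW keeps only the most recent WINDOW positions.
straight = deque(((0.1 * i, 0.0) for i in range(WINDOW)), maxlen=WINDOW)
circle = deque(
    (
        (math.cos(2 * math.pi * i / (WINDOW - 1)),
         math.sin(2 * math.pi * i / (WINDOW - 1)))
        for i in range(WINDOW)
    ),
    maxlen=WINDOW,
)

eff_straight = path_efficiency(straight)
eff_circle = path_efficiency(circle)
print("straight efficiency:", round(eff_straight, 3))
print("circle efficiency:  ", round(eff_circle, 3))
print("shaped (forward): ", round(shaped_reward(1.0, 3.0, eff_straight), 3))
print("shaped (circling):", round(shaped_reward(1.0, 3.0, eff_circle), 3))
# A window shorter than one lap leaves a circling car with nonzero net
# displacement, so WINDOW has to cover at least one full loop for the
# speed bonus to vanish.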