""" == DATA ANALYSIS: Circular Driving Detection (2026-04-13) == FINDINGS from Phase 1 data (autoresearch_results_phase1.jsonl): Trial mean_rwd std rps cv% verdict 1 270.56 0.143 0.086 0.1% ⚠️ LOW STD suspicious — possibly circling 4 627.69 2.35 0.147 0.4% OK — low variance, moderate reward 5 4582.80 0.485 0.957 0.0% 🚨 CIRCULAR — 74% of theoretical max, cv=0.0% 6 454.06 2.73 0.092 0.6% OK — consistent, plausible 10 682.74 420.91 0.153 61.7% ⚠️ UNSTABLE — extremely high variance 11 404.52 14.47 0.084 3.6% OK — reasonable variance KEY SIGNATURES OF CIRCULAR DRIVING: 1. cv (coefficient of variation) < 1% with mean_reward > 200 → very CONSISTENT circling - Trial 5: cv=0.0%, mean=4582 → textbook circular motion - Trial 1: cv=0.1%, mean=270 → likely also circling but slower 2. reward/step approaching theoretical max → car is getting near-optimal reward continuously - Trial 5: 0.957/step ≈ 74% of max (speed≈3 m/s) → sustained on-track fast motion - This is achievable by circling at the starting line! 3. User visual confirmation → car going left in circles at starting position WHY OUR REWARD WRAPPER v2 STILL ALLOWS CIRCLING: The fix was correct for the ADDITIVE formula (speed × f(cte)). The MULTIPLICATIVE formula prevents off-track hacking. BUT: a car circling ON-TRACK still gets full speed bonus! - Car circles at start (CTE ≈ 0) → original_reward > 0 - Car has speed 3 → shaped = 1.0 × (1 + 0.1 × 3) = 1.3/step - Over 4787 steps: max = 6223, actual = 4582 → 74% efficiency (car is on track most of time!) THE FUNDAMENTAL PROBLEM: Neither CTE nor speed can distinguish FORWARD driving from CIRCULAR driving. Both have: low CTE (car is centered), positive speed (car is moving). We need a reward component that is ZERO for circular motion and POSITIVE for forward progress. 
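
SKETCH: re-deriving the numbers above
  The snippet below is a minimal sketch, not the project's actual analysis
  code. It recomputes cv% and the fraction of the per-step ceiling from the
  table rows. The names TRIALS, MAX_RPS, cv_percent and signatures are
  hypothetical, and the 1.3/step ceiling assumes original=1.0, speed_scale=0.1
  and speed=3 as in the arithmetic above. Note that signature 1, taken
  literally, also matches trials 4 and 6, which the table marks OK, so the
  cv test alone is probably not sufficient; only trial 5 is also close to
  the per-step ceiling.

```python
# Phase 1 rows copied from the findings table: (trial, mean_rwd, std, rps).
TRIALS = [
    (1, 270.56, 0.143, 0.086),
    (4, 627.69, 2.35, 0.147),
    (5, 4582.80, 0.485, 0.957),
    (6, 454.06, 2.73, 0.092),
    (10, 682.74, 420.91, 0.153),
    (11, 404.52, 14.47, 0.084),
]

# Assumed per-step ceiling: original=1.0, speed_scale=0.1, speed=3.
MAX_RPS = 1.0 * (1 + 0.1 * 3)


def cv_percent(mean_rwd, std):
    # Coefficient of variation as a percentage of the mean reward.
    return 100.0 * std / mean_rwd


def signatures(mean_rwd, std, rps, cv_limit=1.0, min_mean=200.0):
    # Signature 1: very low cv combined with a non-trivial mean reward.
    low_cv = cv_percent(mean_rwd, std) < cv_limit and mean_rwd > min_mean
    # Signature 2: reward per step as a fraction of the assumed ceiling.
    frac_of_max = rps / MAX_RPS
    return low_cv, frac_of_max


for trial, mean_rwd, std, rps in TRIALS:
    low_cv, frac = signatures(mean_rwd, std, rps)
    print(f"trial {trial:>2}: cv={cv_percent(mean_rwd, std):6.2f}%  "
          f"low_cv={low_cv}  rps/max={frac:.0%}")
```

  Combining both signatures (low cv AND a high fraction of the ceiling) is
  one way to narrow the flag down; the right cut-off for "high" is a
  judgment call that this data does not settle.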

SOLUTION: Path Efficiency Reward
  efficiency = net_displacement / path_length  (over sliding window)
  - Forward driving:  efficiency ≈ 1.0 (all movement is productive)
  - Circular driving: efficiency ≈ 0.0 (lots of movement, no net advance)
  - Shaped reward: original × (1 + speed_scale × speed × efficiency)
"""

print(__doc__)
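
# A minimal sketch of the path-efficiency idea from the docstring, not the
# project's actual wrapper. It assumes the environment exposes the car's (x, y)
# position each step; the names PathEfficiency, shaped_reward and the default
# window of 50 steps are hypothetical choices for illustration only.

```python
import math
from collections import deque


class PathEfficiency:
    """Sliding-window ratio of net displacement to path length."""

    def __init__(self, window=50):
        # Only the most recent `window` positions are kept.
        self.positions = deque(maxlen=window)

    def update(self, x, y):
        # Record the newest position and return the current efficiency.
        self.positions.append((x, y))
        return self.value()

    def value(self):
        pts = list(self.positions)
        if len(pts) < 2:
            return 0.0
        # Total distance travelled along the recorded path.
        path = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
        if path < 1e-9:
            return 0.0  # car has not moved; avoid division by zero
        # Straight-line distance between window start and window end.
        return math.dist(pts[0], pts[-1]) / path


def shaped_reward(original, speed, efficiency, speed_scale=0.1):
    # original × (1 + speed_scale × speed × efficiency), as in the docstring.
    return original * (1 + speed_scale * speed * efficiency)
```

# With efficiency = 1.0 and speed = 3 this gives the same 1.3/step as before;
# with efficiency = 0.0 the speed bonus vanishes and the reward is `original`.
# The window length matters: it has to span at least one full circle for the
# circling efficiency to approach 0, yet stay shorter than a lap, because on a
# closed track a full lap also has near-zero net displacement.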