# donkeycar-rl-autoresearch/agent/analysis_circular_driving.py

"""
== DATA ANALYSIS: Circular Driving Detection (2026-04-13) ==

FINDINGS from Phase 1 data (autoresearch_results_phase1.jsonl):

Trial  mean_rwd       std  rwd/step     cv%  verdict
    1    270.56     0.143     0.086    0.1%  ⚠️ LOW STD, suspicious: possibly circling
    4    627.69      2.35     0.147    0.4%  OK: low variance, moderate reward
    5   4582.80     0.485     0.957    0.0%  🚨 CIRCULAR: 74% of theoretical max, cv=0.0%
    6    454.06      2.73     0.092    0.6%  OK: consistent, plausible
   10    682.74    420.91     0.153   61.7%  ⚠️ UNSTABLE: extremely high variance
   11    404.52     14.47     0.084    3.6%  OK: reasonable variance
(rwd/step = mean reward per step; cv = std / mean_rwd)
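The cv% column is std / mean_rwd × 100. A minimal sketch that recomputes it; the rows are copied by hand from the table, since the field names inside autoresearch_results_phase1.jsonl are not shown here (PHASE1 and cv_percent are hypothetical names):

```python
# Rows copied from the Phase 1 table: trial -> (mean_rwd, std)
PHASE1 = {
    1: (270.56, 0.143),
    4: (627.69, 2.35),
    5: (4582.80, 0.485),
    6: (454.06, 2.73),
    10: (682.74, 420.91),
    11: (404.52, 14.47),
}


def cv_percent(mean_reward: float, std_reward: float) -> float:
    # Coefficient of variation in percent: spread relative to the mean.
    # Undefined (ZeroDivisionError) for a zero mean.
    return 100.0 * std_reward / mean_reward


for trial, (mean_rwd, std) in PHASE1.items():
    print(f'{trial:>3}  cv = {cv_percent(mean_rwd, std):.1f}%')
```

Rounded to one decimal, the output matches the cv% column.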

KEY SIGNATURES OF CIRCULAR DRIVING:
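1. cv (coefficient of variation) < 1% with mean_reward > 200 → suspiciously CONSISTENT returns, typical of steady circling
   - Trial 5: cv=0.0%, mean=4582 → textbook circular motion
   - Trial 1: cv=0.1%, mean=270 → likely also circling, but slower
   - Caveat: trials 4 and 6 (cv 0.4% and 0.6%) also pass this test yet are judged OK, so low cv alone is a warning sign, not proof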

2. reward/step approaching theoretical max → car is getting near-optimal reward continuously
   - Trial 5: 0.957/step ≈ 74% of the 1.3/step max (at speed ≈ 3 m/s) → sustained, fast, on-track motion
   - This is achievable by circling at the starting line!

3. User visual confirmation → car observed circling to the left at the starting position
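Signatures 1 and 2 can be checked together per trial. A minimal sketch, assuming the thresholds from signature 1 (cv < 1%, mean_reward > 200) and a theoretical maximum of 1.3 reward/step (original_reward 1.0 at speed 3 with a 0.1 speed scale, as worked out in the v2 analysis below); TrialStats and circling_signals are hypothetical names:

```python
from dataclasses import dataclass

# Assumed theoretical max per step: original 1.0 * (1 + 0.1 * 3) = 1.3
MAX_REWARD_PER_STEP = 1.0 * (1 + 0.1 * 3)


@dataclass
class TrialStats:
    trial: int
    mean_reward: float
    std_reward: float
    reward_per_step: float


def circling_signals(t: TrialStats) -> dict:
    cv = 100.0 * t.std_reward / t.mean_reward
    return {
        # Signature 1: very consistent returns with a non-trivial mean
        'low_cv': cv < 1.0 and t.mean_reward > 200,
        # Signature 2: share of the theoretical per-step maximum
        'frac_of_max': t.reward_per_step / MAX_REWARD_PER_STEP,
    }


trials = [
    TrialStats(1, 270.56, 0.143, 0.086),
    TrialStats(4, 627.69, 2.35, 0.147),
    TrialStats(5, 4582.80, 0.485, 0.957),
    TrialStats(6, 454.06, 2.73, 0.092),
    TrialStats(10, 682.74, 420.91, 0.153),
    TrialStats(11, 404.52, 14.47, 0.084),
]
for t in trials:
    s = circling_signals(t)
    print(t.trial, s['low_cv'], round(s['frac_of_max'], 2))
```

On the table's rows, frac_of_max is about 0.74 for trial 5 and at most about 0.12 for the others, so the per-step ratio separates the circling trial more sharply than cv does.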

WHY OUR REWARD WRAPPER v2 STILL ALLOWS CIRCLING:
   The v2 fix was correct as far as it goes: the MULTIPLICATIVE formula that
   replaced the ADDITIVE one (speed × f(cte)) prevents off-track hacking.
   BUT: a car circling ON-TRACK still gets the full speed bonus!
   - Car circles at start (CTE ≈ 0) → original_reward > 0 (≈ 1.0, its maximum)
   - Car has speed 3 → shaped = 1.0 × (1 + 0.1 × 3) = 1.3/step
   - Over 4787 steps: max = 4787 × 1.3 ≈ 6223, actual = 4582 → 74% of max (the car is on track most of the time!)
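   The arithmetic above can be reproduced directly. A minimal sketch of the v2 multiplicative shaping, assuming original_reward tops out at 1.0 on track and is negative off track; shaped_reward_v2 and SPEED_SCALE are hypothetical names:

```python
SPEED_SCALE = 0.1  # bonus coefficient used in the v2 numbers above


def shaped_reward_v2(original_reward: float, speed: float,
                     speed_scale: float = SPEED_SCALE) -> float:
    # Multiplicative shaping: the speed bonus scales the original reward,
    # so a negative (off-track) reward only gets more negative with speed.
    return original_reward * (1 + speed_scale * speed)


# On-track circling at the start line: CTE ~ 0, original at its max of 1.0
per_step = shaped_reward_v2(1.0, 3.0)
episode_max = 4787 * per_step
print(f'per step: {per_step:.2f}, episode max: {episode_max:.0f}')
print(f'trial 5 share of max: {4582.80 / episode_max:.0%}')

# Off-track: a negative original reward is not rescued by speed
print(shaped_reward_v2(-1.0, 3.0))
```

   It prints 1.30 per step, an episode maximum of 6223 and a 74% share for trial 5, while the off-track case stays negative (-1.3). Nothing in the formula looks at where the car is going, which is why circling scores well.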

THE FUNDAMENTAL PROBLEM:
   Neither CTE nor speed can distinguish FORWARD driving from CIRCULAR driving.
   Both behaviors show low CTE (the car is centered) and positive speed (the car is moving).

   We need a reward component that is ZERO for circular motion and POSITIVE for forward progress.

SOLUTION: Path Efficiency Reward
   efficiency = net_displacement / path_length  (over sliding window)
   - Forward driving: efficiency ≈ 1.0 (all movement is productive)
   - Circular driving: efficiency ≈ 0.0 once the window covers a full loop (lots of movement, no net advance)
   - Shaped reward: original × (1 + speed_scale × speed × efficiency)
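   A minimal sketch of the path-efficiency term, assuming the wrapper can read the car's (x, y) position every step. The PathEfficiency class, the shaped_reward and run helpers, the 50-position window and returning 0.0 for a car that has not moved are all illustrative choices, not settled parts of the design:

```python
import math
from collections import deque


class PathEfficiency:
    # Hypothetical helper: net displacement / path length over the last
    # `window` positions (the window counts positions, not steps).
    def __init__(self, window: int = 50):
        self.positions = deque(maxlen=window)

    def update(self, x: float, y: float) -> float:
        self.positions.append((x, y))
        pts = list(self.positions)
        path = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
        if path < 1e-9:
            return 0.0  # no movement yet: give no speed bonus
        return math.dist(pts[0], pts[-1]) / path


def shaped_reward(original: float, speed: float, efficiency: float,
                  speed_scale: float = 0.1) -> float:
    # original * (1 + speed_scale * speed * efficiency)
    return original * (1 + speed_scale * speed * efficiency)


def run(points, window=50):
    # Feed a trajectory through the tracker and return the final efficiency
    tracker = PathEfficiency(window)
    eff = 0.0
    for x, y in points:
        eff = tracker.update(x, y)
    return eff


straight = [(0.1 * i, 0.0) for i in range(60)]
# 49 steps per loop, so the last 50 positions span exactly one full circle
circle = [(math.cos(2 * math.pi * i / 49), math.sin(2 * math.pi * i / 49))
          for i in range(60)]
print(round(run(straight), 3), round(run(circle), 3))
```

   A straight run scores 1.0 and a window spanning exactly one loop scores about 0.0. A window covering only half a loop would still report roughly 2/π ≈ 0.64 (chord over arc), so the window needs to be at least as long as one circle for circling to lose its speed bonus.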
"""

if __name__ == "__main__":
    print(__doc__)