# donkeycar-rl-autoresearch/agent/analysis_circular_driving.py

"""
== DATA ANALYSIS: Circular Driving Detection (2026-04-13) ==
FINDINGS from Phase 1 data (autoresearch_results_phase1.jsonl):
Trial  mean_rwd     std    rps    cv%  verdict
    1    270.56   0.143  0.086   0.1%  ⚠️ LOW STD, suspicious: possibly circling
    4    627.69    2.35  0.147   0.4%  OK: low variance, moderate reward
    5   4582.80   0.485  0.957   0.0%  🚨 CIRCULAR: 74% of theoretical max, cv=0.0%
    6    454.06    2.73  0.092   0.6%  OK: consistent, plausible
   10    682.74  420.91  0.153  61.7%  ⚠️ UNSTABLE: extremely high variance
   11    404.52   14.47  0.084   3.6%  OK: reasonable variance
(mean_rwd = mean episode reward, std = its standard deviation,
 rps = mean reward per step, cv% = 100 × std / mean_rwd)
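The cv% column can be re-derived from the other columns. The sketch below recomputes cv% = 100 × std / mean_rwd, and the implied episode length mean_rwd / rps, from the values in the table. TRIALS and cv_percent are hypothetical names. Because rps is rounded, the step counts are only approximate.

```python
# Hypothetical helper: re-derive cv% as 100 * std / mean_rwd and the implied
# steps per episode as mean_rwd / rps (values copied from the table above).
TRIALS = {
    1: (270.56, 0.143, 0.086),
    4: (627.69, 2.35, 0.147),
    5: (4582.80, 0.485, 0.957),
    6: (454.06, 2.73, 0.092),
    10: (682.74, 420.91, 0.153),
    11: (404.52, 14.47, 0.084),
}


def cv_percent(mean_rwd: float, std: float) -> float:
    # Coefficient of variation, expressed as a percentage of the mean.
    return 100.0 * std / mean_rwd


for trial, (mean_rwd, std, rps) in TRIALS.items():
    steps = mean_rwd / rps  # approximate, since rps is rounded to 3 decimals
    print(f"trial {trial:>2}: cv={cv_percent(mean_rwd, std):.1f}%  steps~{steps:.0f}")
```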
KEY SIGNATURES OF CIRCULAR DRIVING:
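1. Near-zero cv (coefficient of variation; ≤ 0.1% in this data) with mean_reward > 200 → very CONSISTENT circling
   (cv < 1% alone is not enough: trials 4 and 6 sit at 0.4-0.6% and are judged OK)
- Trial 5: cv=0.0%, mean=4582 → textbook circular motion
- Trial 1: cv=0.1%, mean=270 → likely also circling, but more slowly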
2. reward/step approaching the theoretical max → car collects near-maximal reward on every step
- Trial 5: 0.957/step ≈ 74% of the 1.3/step max (speed ≈ 3 m/s) → sustained, fast, on-track motion
- This is achievable by circling at the starting line!
3. User visual confirmation → car drives in left-hand circles at the starting position
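Signatures 1 and 2 can be turned into a simple automatic check. This is a hypothetical sketch. The function name circling_signatures and its thresholds (cv below 0.2%, mean above 200, reward/step at least 50% of a 1.3/step ceiling) are assumptions picked to reproduce the verdicts in the table, not tuned values. Signature 3 (visual confirmation) has no numeric equivalent here.

```python
from typing import List

# Hypothetical detector for signatures 1 and 2. Thresholds are assumptions
# chosen to reproduce this table's verdicts, not validated values.
MAX_REWARD_PER_STEP = 1.3  # v2 ceiling at speed 3: 1.0 * (1 + 0.1 * 3)


def circling_signatures(mean_rwd: float, std: float, rps: float,
                        cv_max_pct: float = 0.2,
                        min_mean: float = 200.0,
                        rps_frac: float = 0.5) -> List[str]:
    found = []
    cv_pct = 100.0 * std / mean_rwd
    if cv_pct < cv_max_pct and mean_rwd > min_mean:
        found.append("near-zero cv")            # signature 1
    if rps / MAX_REWARD_PER_STEP >= rps_frac:
        found.append("near-max reward/step")    # signature 2
    return found


print(circling_signatures(4582.80, 0.485, 0.957))  # trial 5 -> ['near-zero cv', 'near-max reward/step']
print(circling_signatures(270.56, 0.143, 0.086))   # trial 1 -> ['near-zero cv']
print(circling_signatures(627.69, 2.35, 0.147))    # trial 4 -> []
```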
WHY OUR REWARD WRAPPER v2 STILL ALLOWS CIRCLING:
The v2 fix was right for the problem it targeted, the ADDITIVE formula (speed × f(cte)):
the MULTIPLICATIVE formula prevents off-track hacking.
BUT: a car circling ON-TRACK still gets the full speed bonus!
- Car circles at the start (CTE ≈ 0) → original_reward > 0 (≈ 1.0)
- Car has speed 3 → shaped = original × (1 + speed_scale × speed) = 1.0 × (1 + 0.1 × 3) = 1.3/step
- Over 4787 steps: max = 1.3 × 4787 ≈ 6223, actual = 4582 → 74% of max (car is on track most of the time!)
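The arithmetic above can be reproduced with a small sketch of the v2 shaping. It assumes the form shaped = original × (1 + speed_scale × speed) that the 1.3/step example implies. shaped_reward_v2 is a hypothetical name, not the wrapper's actual function.

```python
# Sketch of the v2 multiplicative shaping, assuming
# shaped = original * (1 + speed_scale * speed) with speed_scale = 0.1.
def shaped_reward_v2(original: float, speed: float, speed_scale: float = 0.1) -> float:
    return original * (1.0 + speed_scale * speed)


per_step_max = shaped_reward_v2(original=1.0, speed=3.0)  # centered car at speed 3
episode_max = per_step_max * 4787                          # episode length of trial 5
efficiency = 4582.80 / episode_max                         # fraction of the ceiling reached
print(f"{per_step_max:.2f}/step, max={episode_max:.0f}, efficiency={efficiency:.0%}")
# prints: 1.30/step, max=6223, efficiency=74%
```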
THE FUNDAMENTAL PROBLEM:
Neither CTE nor speed can distinguish FORWARD driving from CIRCULAR driving.
Both show low CTE (car is centered) and positive speed (car is moving).
We need a reward component that is ZERO for circular motion and POSITIVE for forward progress.
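To make the problem concrete, the sketch below samples two hypothetical trajectories at the same 3 m/s: a straight line and one full circle. It compares the per-step speed (the signal v2 rewards) with the net displacement. DT, N and the helper names are assumptions. CTE is not modeled; the car is simply assumed to stay centered (CTE ≈ 0) in both cases.

```python
import math

# Hypothetical 2-D trajectories sampled at a fixed time step, both at 3 m/s.
DT = 0.05     # assumed seconds per step
SPEED = 3.0   # m/s, matching the speed in the v2 example
N = 200       # steps


def straight(n: int) -> list:
    return [(SPEED * DT * i, 0.0) for i in range(n + 1)]


def circle(n: int) -> list:
    # Radius chosen so that n steps cover exactly one full loop.
    radius = SPEED * DT * n / (2 * math.pi)
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n + 1)]


def mean_speed(path: list) -> float:
    # Average speed estimated from consecutive positions.
    steps = [math.dist(a, b) / DT for a, b in zip(path, path[1:])]
    return sum(steps) / len(steps)


def net_displacement(path: list) -> float:
    return math.dist(path[0], path[-1])


for name, path in (("straight", straight(N)), ("circle", circle(N))):
    print(f"{name:8s} speed={mean_speed(path):.2f} m/s  net={net_displacement(path):.2f} m")
```

Both runs report the same speed. Only the net displacement separates forward progress (30 m) from circling (0 m).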
SOLUTION: Path Efficiency Reward
efficiency = net_displacement / path_length (over a sliding window of recent positions)
- Forward driving: efficiency ≈ 1.0 (all movement is productive)
- Circular driving: efficiency ≈ 0.0 (lots of movement, no net advance)
- Shaped reward: original × (1 + speed_scale × speed × efficiency)
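The sketch below shows the proposed path-efficiency term. It assumes 2-D (x, y) car positions are available on every step. PathEfficiency, shaped_reward_v3, the default window of 100 steps and the 0.15 m step length are hypothetical choices, not the project's implementation.

```python
import math
from collections import deque
from typing import List, Tuple


class PathEfficiency:
    # Hypothetical tracker: net_displacement / path_length over the last `window` steps.
    def __init__(self, window: int = 100):
        self.positions = deque(maxlen=window + 1)  # window steps need window + 1 points

    def update(self, x: float, y: float) -> float:
        self.positions.append((x, y))
        pts = list(self.positions)
        path_length = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
        if path_length < 1e-6:
            return 0.0  # no movement yet: no progress bonus
        return math.dist(pts[0], pts[-1]) / path_length


def shaped_reward_v3(original: float, speed: float, efficiency: float,
                     speed_scale: float = 0.1) -> float:
    # original * (1 + speed_scale * speed * efficiency), as proposed above
    return original * (1.0 + speed_scale * speed * efficiency)


def final_efficiency(path: List[Tuple[float, float]], window: int = 100) -> float:
    tracker = PathEfficiency(window)
    eff = 0.0
    for x, y in path:
        eff = tracker.update(x, y)
    return eff


STEP = 0.15   # assumed metres per step (3 m/s at 0.05 s per step)
LOOP = 100    # assumed steps per full circle
radius = STEP * LOOP / (2 * math.pi)
straight_path = [(STEP * i, 0.0) for i in range(200)]
circle_path = [(radius * math.cos(2 * math.pi * i / LOOP),
                radius * math.sin(2 * math.pi * i / LOOP)) for i in range(200)]

for name, path in (("straight", straight_path), ("circle", circle_path)):
    eff = final_efficiency(path)
    print(f"{name:8s} efficiency={eff:.2f} reward={shaped_reward_v3(1.0, 3.0, eff):.2f}")
# prints: straight efficiency=1.00 reward=1.30
#         circle   efficiency=0.00 reward=1.00
```

The window length matters. For a window shorter than one loop, the efficiency is chord/arc, i.e. sin(θ/2)/(θ/2) for an arc of θ radians. Half a loop therefore still scores 2/π ≈ 0.64. The window has to span roughly one full loop or more before circling is pushed toward 0.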
"""
print(__doc__)