Problem with v4 on mountain_track: CTE × efficiency × speed all collapse to zero simultaneously when the car slows on the hill, giving no gradient signal for 'apply more throttle'.

v5: reward = (speed / 10) × (1 - |CTE| / max_cte)
- Directly rewards going fast while staying centred
- Hill: car slows → reward drops → clear gradient toward more throttle
- Circling protection now entirely handled by lap-time penalty + StuckTerminationWrapper (not by the reward formula)

Tests updated to reflect v5 semantics (102 passing).

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A

| File | | |
|---|---|---|
| __init__.py | ||
| test_autoresearch_controller.py | ||
| test_behavioral_wrappers.py | ||
| test_discretize_action.py | ||
| test_end_to_end.py | ||
| test_reward_wrapper.py | ||
| test_runner_integration.py | ||
| test_wave3.py | ||
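
The v5 formula from the commit message can be sketched in a few lines of Python. This is only an illustration of the stated expression, not the repository's actual wrapper code: the function name `v5_reward` and the `speed_scale` parameter are hypothetical, the units of `speed` are not given in the source, and the behaviour when |CTE| exceeds `max_cte` (the reward simply goes negative here) is an assumption.

```python
def v5_reward(speed: float, cte: float, max_cte: float,
              speed_scale: float = 10.0) -> float:
    """Sketch of: reward = (speed / 10) * (1 - |CTE| / max_cte).

    `speed_scale` is a hypothetical name for the constant 10 in the
    commit message. No clamping is applied (assumption): if |cte| is
    larger than max_cte, the centring term, and so the reward, is negative.
    """
    if max_cte <= 0:
        raise ValueError("max_cte must be positive")
    speed_term = speed / speed_scale           # grows with speed
    centring_term = 1.0 - abs(cte) / max_cte   # 1 on the centre line, 0 at max_cte
    return speed_term * centring_term


if __name__ == "__main__":
    # Same cross-track error, but the car has slowed on the hill.
    flat = v5_reward(speed=10.0, cte=0.5, max_cte=2.0)
    hill = v5_reward(speed=5.0, cte=0.5, max_cte=2.0)
    print(f"flat: {flat:.3f}  hill: {hill:.3f}")
```

With cross-track error held fixed, the reward is linear in speed, so a slowdown on the hill lowers it in proportion rather than flattening it to zero. Nothing in this expression penalises circling, which matches the note that circling protection is left to the lap-time penalty and StuckTerminationWrapper.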