31 Commits

Author SHA1 Message Date
Paul Huliganga f784fdebd1 feat(exp25): wheel OverlapSphere collision fix + auto-transition
Car.cs (sdsandbox): per-wheel OverlapSphereNonAlloc in FixedUpdate catches barrier
contact from any angle at any throttle; the forward raycast only covered nose-first
contact. Built and rsync'd; sim restart is pending exp24 completion.

exp25 script: identical to exp24 params, fresh weights, patched Unity binary.
Auto-transition monitor armed: kills sim, restarts with new binary, launches exp25
when exp24 finishes (~22:00 EST).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 20:28:31 -04:00
Paul Huliganga c6a18e7fee chore(exp24): launch exp24, fix logging setup, update handoff
- Exp23 complete (mean 2000 steps / 408.6r, high variance confirms nose-first stuck issue)
- Unity 6000.4.4f1 rebuild done: Assembly-CSharp.dll updated with Car.cs raycast fix
- Rsync'd to runtime folder, sim restarted on port 9091
- Exp24 launched (PID 733053) — discrete(7), speed stuck, road regen
- Fix logging.basicConfig no-op: use file_log.addHandler() directly
- Monitor via /tmp/exp24.out (log file was 0 bytes with old approach)
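
The basicConfig no-op above can be reproduced in isolation. This is a minimal sketch, assuming the cause was the documented behaviour that `logging.basicConfig` does nothing once the root logger already has a handler; the commit does not say which import installed one. The paths and the `exp_demo` logger name are hypothetical.

```python
import logging
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
ignored_path = os.path.join(tmp_dir, "basicconfig.log")  # hypothetical path
log_path = os.path.join(tmp_dir, "exp.log")              # hypothetical path

# Simulate a library that configured the root logger before our script ran.
logging.getLogger().addHandler(logging.NullHandler())

# With a handler already on the root logger, basicConfig silently does nothing
# (unless force=True is passed), so this file is never created.
logging.basicConfig(filename=ignored_path, level=logging.INFO)

# Attaching the FileHandler directly does not depend on root-logger state.
file_log = logging.getLogger("exp_demo")  # hypothetical logger name
file_log.setLevel(logging.INFO)
file_handler = logging.FileHandler(log_path)
file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
file_log.addHandler(file_handler)

file_log.info("training started")
file_handler.close()
```

Passing `force=True` to `basicConfig` would be another way out, at the cost of removing whatever handlers the root logger already had.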

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 20:12:36 -04:00
Paul Huliganga 0d1acf8cdc feat(exp24): road regeneration between segments + fix Car.cs raycast
exp24: reconnect to sim after each 10k-step checkpoint.  Reconnecting reloads
the scene → sdsandbox generates a new random road.  Each training segment and
each checkpoint eval now runs on a different road layout, preventing overfitting
to a single road and giving meaningful generalization metrics in the eval logs.

Car.cs: add a short forward raycast in FixedUpdate to detect barriers the front
wheels are pressing against.  WheelColliders do not fire OnCollisionEnter/Stay on
the car's MonoBehaviour, so nose-first barrier contact was invisible to Car.cs
collision callbacks.  The raycast fires when throttle > 0.05 and a collider is
within 0.8m forward — registers the collision the same way OnCollisionStay does.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 17:58:33 -04:00
Paul Huliganga 924615ca60 feat(exp24): discrete steering + speed-based stuck detection
StuckTerminationWrapper: add low_speed_threshold + max_low_speed_seconds params.
A car pinned against a barrier has speed≈0 even while sliding laterally, but the
lateral drift kept resetting the position-based displacement timer, leaving the car
stuck for up to max_episode_seconds. The speed-based check terminates after 2s at speed<0.5.
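
The speed-based check could look roughly like the sketch below. Only the parameter names `low_speed_threshold` and `max_low_speed_seconds` and the values 0.5 and 2s come from this commit; the `LowSpeedTimer` class and its `update` method are hypothetical and are not the actual StuckTerminationWrapper code.

```python
from typing import Optional


class LowSpeedTimer:
    """Signals termination once speed stays below a threshold for too long."""

    def __init__(self, low_speed_threshold: float = 0.5,
                 max_low_speed_seconds: float = 2.0) -> None:
        self.low_speed_threshold = low_speed_threshold
        self.max_low_speed_seconds = max_low_speed_seconds
        self._low_since: Optional[float] = None  # when speed first dropped

    def update(self, speed: float, now: float) -> bool:
        """Return True when the episode should be terminated."""
        if speed >= self.low_speed_threshold:
            # Only real speed clears the timer; lateral drift does not.
            self._low_since = None
            return False
        if self._low_since is None:
            self._low_since = now
        return (now - self._low_since) >= self.max_low_speed_seconds


timer = LowSpeedTimer()
# Speed samples at t = 0s, 1s, 2s while pinned against a barrier.
results = [timer.update(0.1, t) for t in (0.0, 1.0, 2.0)]
```

Because the timer depends only on speed, a car sliding along a wall at near-zero speed cannot reset it the way it reset the displacement timer.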

Exp24: 7-bin discrete steering (DiscretizedActionWrapper) eliminates Gaussian policy
noise that caused rapid oscillation in exp23. max_episode_seconds reduced to 30s
since speed-based stuck detection now handles the barrier-contact cases.
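
As an illustration of 7-bin discrete steering, the sketch below maps a discrete action index to an evenly spaced steering value. The steering range of [-1, 1] and both helper names are assumptions; the real DiscretizedActionWrapper may choose its bins differently.

```python
from typing import List


def make_steering_bins(n_bins: int = 7, low: float = -1.0,
                       high: float = 1.0) -> List[float]:
    """Evenly spaced steering values, including both endpoints."""
    if n_bins < 2:
        raise ValueError("need at least two bins")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def action_to_steering(action: int, bins: List[float]) -> float:
    """Look up the steering command for a discrete action index."""
    return bins[action]


bins = make_steering_bins()
straight = action_to_steering(3, bins)  # middle bin of seven
```

With a fixed set of steering values the policy picks one bin per step, so there is no continuous sampling noise around the chosen angle.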

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 17:41:42 -04:00
Paul Huliganga c05e79d30c fix(exp23): invisible barriers + single-instance guard
- generated_road.unity + generated_track.unity: showBarrierMeshes 1→0.
  Visible barrier meshes would appear in the camera observation and let the
  policy learn from an artificial visual cue that won't exist at eval time.
- exp23: add PID-file guard — aborts immediately if another instance is
  already running, preventing multiple cars from spawning in the sim.
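
A PID-file guard of this kind can be sketched as follows. The function name and file location are hypothetical, the check is POSIX-only (signal 0 probes for a process without affecting it), and the sketch is not race-free; the actual exp23 guard may differ.

```python
import os
import tempfile


def acquire_pid_file(path: str) -> bool:
    """Return True if this process now owns the PID file, else False."""
    if os.path.exists(path):
        try:
            with open(path) as f:
                old_pid = int(f.read().strip())
            os.kill(old_pid, 0)  # POSIX: raises if no such process
            return False         # another live instance holds the file
        except (ValueError, ProcessLookupError):
            pass                 # stale or corrupt PID file: take it over
        except PermissionError:
            return False         # process exists but belongs to another user
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    return True


pid_path = os.path.join(tempfile.mkdtemp(), "exp_demo.pid")  # hypothetical path
first = acquire_pid_file(pid_path)   # no file yet, so this instance wins
second = acquire_pid_file(pid_path)  # file names a live PID, so this is refused
```

A script would call this at startup and exit immediately on False, before connecting to the sim.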

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 16:07:32 -04:00
Paul Huliganga 2d52bb4ffc fix(core): replace exploit bandaids with solid physics barriers + clean reward
Root cause: barriers were zero-thickness MeshCollider planes with no CCD on the
car. The car tunnelled through between frames. Every Python patch was trying to
catch in code what physics should enforce.

Unity (source only — build in progress):
- RoadBuilder.cs: CreateBarrier() now makes BoxCollider-per-segment with real 3D
  volume (barrierThickness=1.0m default) + half-thickness overlap at corners to
  seal gaps. CreateEndCap() seals open ends of non-looping tracks (generated_road).
- Car.cs: rb.collisionDetectionMode = Continuous in Awake() — prevents tunneling.

Python:
- reward_wrapper.py v7: removed CTE-patience termination, high-CTE negative
  reward, solid_hit monitoring, low-speed/wedge detection. Kept: efficiency gate,
  no-progress (active_node) termination, lap exploit guard. Reward = speed×CTE_quality.
- exp23_generated_road_clean.py: single track, no warm-start, 200k steps, clean
  reward, MAX_EPISODE_SECONDS=120 as safety net only.
- tests: 17 tests covering clean reward properties.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 15:56:00 -04:00
Paul Huliganga 138c65270f feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments
- reward_wrapper: detect barrier/wall/tree solid hits, terminate on head-on impact
  or 4 sustained solid-hit frames; prevents car wedging against invisible barriers
- reward_wrapper: add low-speed/wedge termination — kills episode when car is pinned
  motionless (below threshold, no displacement) after grace period
- reward_wrapper: high-CTE exploit fix — return -0.25 immediately when CTE >
  max_cte_terminate (not after patience), so PPO cannot collect positive speed
  rewards while driving the large outside-road circle
- tests: 23 passing unit tests covering all new termination paths
- exp20/21/22: add parallel DummyVecEnv experiments on generated_road+generated_track
  with warm-start from champion model; exp22 is current active run
- SESSION_HANDOFF.md: live handoff doc for next session continuity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 14:46:13 -04:00
Paul Huliganga 04d5a10992 fix: exp19 — hard episode time limit to stop minutes-long stuck cars
The StuckTerminationWrapper wall-clock timer could be reset by barrier-sliding:
a car drifting 0.5m along a wall repeatedly resets the 12s timer. At low sim
fps (1-2fps when both cars are stuck), the 40-step check also takes minutes.

Fix: added max_episode_seconds=30 — hard wall-clock limit per episode,
independent of position or sim fps. No episode can run longer than 30s.

Also adds monitor_training.sh: independent shell process that checks every
5 minutes and appends status to /tmp/training_monitor.log — works without
Claude being active.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 09:18:04 -04:00
Paul Huliganga 7fdfbacaee fix: exp18 — fix circular exploit in parallel training (window=200, min_lap=12s)
Exp 17 post-mortem: the efficiency gate window of 30 steps covers only ~40% of a
3.5s exploit circle at 22fps, giving a partial-arc efficiency of ~0.77 (the gate only
fires below 0.15). The car earned positive reward while circling, outweighing the -10
lap penalty. Performance peaked at 80k then collapsed.

Exp 18 fixes:
- window_size 30→200: covers 2+ full exploit circles, driving efficiency→0
- min_lap_time 5s→12s: genuine laps are 13-16s (gentrack) and 27-29s (mountain);
  anything under 12s is an exploit and terminates immediately
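
The window arithmetic can be checked with a short calculation. This sketch assumes efficiency means net displacement divided by path length over the window, and that the exploit is a perfect circle at constant speed; under those assumptions the ratio for an arc of angle theta is 2·|sin(theta/2)| / theta. The function name is hypothetical.

```python
import math


def arc_efficiency(window_steps: int, fps: float, circle_seconds: float) -> float:
    """Chord length over arc length for the part of a circle a window covers."""
    steps_per_circle = fps * circle_seconds          # 22 * 3.5 = 77 steps
    theta = 2.0 * math.pi * window_steps / steps_per_circle
    return abs(2.0 * math.sin(theta / 2.0)) / theta


eff_30 = arc_efficiency(30, fps=22.0, circle_seconds=3.5)    # about 0.77
eff_200 = arc_efficiency(200, fps=22.0, circle_seconds=3.5)  # about 0.12
```

A 30-step window sees only 30/77 of a circle, so the chord is still most of the arc and the 0.15 gate never fires. A 200-step window spans about 2.6 circles and falls below the gate.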

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 09:00:42 -04:00
Paul Huliganga b504b89b2a feat: add exp17 parallel DummyVecEnv 450k training + strategy docs
- exp17_parallel_450k.py: parallel two-track training (generated_track:9091,
  mountain_track:9093), 450k steps, v6 reward, HOST=localhost
- DECISIONS.md: ADR-025 (parallel strategy) and ADR-026 (mountain friction fix)
- docs/STATE.md: updated to April 2026 state with current champions and strategy
- docs/TEST_HISTORY.md: mountain friction fix notes + Exp 17 full design
- outerloop-results: exp14 finetune logs and robust mountain eval results

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 02:42:20 -04:00
Paul Huliganga a8aef52f00 fix: force scene reset before exp15 generated-track warm-start so sim actually loads generated_track
2026-04-20 16:36:00 -04:00
Paul Huliganga 84061c01b2 feat: add cross-track warm-start experiments for mountain->generated and generated->mountain
2026-04-20 16:34:24 -04:00
Paul Huliganga 2b90de2fba fix: import json, use make_env_base in phase switch, and run eval sequentially to avoid second concurrent sim car
2026-04-19 20:37:25 -04:00
Paul Huliganga f3c89116ee fix: exp14 finetune eval uses make_env_base (runtime throttle floor) instead of removed make_env
2026-04-19 20:30:51 -04:00
Paul Huliganga 6c5623e881 fix: exp14 finetune load warm-start model without temp env to prevent second spawned car
2026-04-19 20:24:33 -04:00
Paul Huliganga 0c3a37f877 fix: close temporary loaded_env after loading warm-start model to avoid leaving extra TCP vehicle
2026-04-19 20:17:29 -04:00
Paul Huliganga 38dd5e9b1d fix: ensure lr_schedule callable set when loading warm-start model (use get_schedule_fn) and update optimizer LR
2026-04-19 20:14:35 -04:00
Paul Huliganga eb92d119f9 fix: keep action-space matching by loading model with base throttle 0.2 and applying runtime throttle_floor wrapper for phase1
2026-04-19 20:10:19 -04:00
Paul Huliganga 41d12dede2 fix: load warm-start with original action space (throttle_min=0.2), then switch env for phase1 throttle
2026-04-19 20:09:08 -04:00
Paul Huliganga bc23a316e0 exp14 finetune: warm-start mountain champion, throttle schedule 0.4->0.2, LR=2e-4, checkpoints and evals
2026-04-19 20:08:14 -04:00
Paul Huliganga b1ec14e3cb fix: exp14 — proper track switch via exit_scene before connecting to mountain_track
2026-04-19 19:18:33 -04:00
Paul Huliganga 1405a88699 feat: Exp 14 — mountain_track, v5 reward, lap-based stopping
v5 is required for mountain hills (v4 gives zero gradient on hills, as documented in Exp 1).
Same simple approach as Exp 13, which worked: single track, minimal wrappers,
lap-based stopping. ThrottleClamp + V5Reward only.
2026-04-19 19:15:00 -04:00
Paul Huliganga 5a1693b4ec feat: Exp 13 — generated_track, v4 reward, back to basics (no extra heuristics)
Return to Wave 4 setup that produced Trial 9 (2000/2000 on generated_track).
v4 reward: base × efficiency × speed. Circles give ~0 reward naturally.
No StuckTerminationWrapper, no CTE patience, no progress terminator.
Just ThrottleClamp + V4Reward. Lap-based stopping criterion.
2026-04-19 17:33:17 -04:00
Paul Huliganga 813f888502 fix: reward v6.1 — active_node progress terminator kills circle/stuck exploits
User's insight: a circling car stays near the same track waypoints, so
active_node (sim's track progress indicator) never advances. Track the
maximum active_node reached this episode. If it hasn't increased in
progress_patience=60 steps (~3.3s), terminate.

This catches:
  - Circular driving (active_node oscillates, max never increases)
  - Stuck on cone/barrier (active_node frozen)
  - NOT triggered by: legitimate cornering, slow forward progress, lap resets

On lap completion, active_node wraps to 0 — reset max_node_seen and counter.
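
The terminator logic described above can be sketched as a small state machine. `progress_patience` and `max_node_seen` are named in this commit; the class, the `steps_without_progress` counter, and the `lap_completed` flag are hypothetical, since the commit does not show how the wrapper detects a lap wrap.

```python
class ProgressTerminator:
    """Terminates when the best active_node seen stops increasing."""

    def __init__(self, progress_patience: int = 60) -> None:
        self.progress_patience = progress_patience
        self.max_node_seen = -1
        self.steps_without_progress = 0

    def reset(self) -> None:
        self.max_node_seen = -1
        self.steps_without_progress = 0

    def step(self, active_node: int, lap_completed: bool = False) -> bool:
        """Return True when the episode should be terminated."""
        if lap_completed:
            # active_node wraps to 0 after a lap, so start tracking afresh.
            self.reset()
            return False
        if active_node > self.max_node_seen:
            self.max_node_seen = active_node
            self.steps_without_progress = 0
            return False
        self.steps_without_progress += 1
        return self.steps_without_progress >= self.progress_patience


term = ProgressTerminator(progress_patience=3)
# A circling car oscillates between nodes 4 and 5 without passing node 5.
circling = [term.step(n) for n in (5, 4, 5, 4)]
```

Tracking the maximum rather than the latest node is what makes oscillation count as no progress: revisiting node 5 does not reset the counter.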

Also: Exp 12 — single track mountain training with lap-based stopping criterion.
Train until 3 consecutive laps in eval, not fixed step count.
2026-04-19 17:01:41 -04:00
Paul Huliganga dc563e2b6c fix: exp11d remove progress_patience — grass fix only per ADR-020
2026-04-19 16:18:17 -04:00
Paul Huliganga f730a2e0ba docs: ADR-020/021 + session log — throttle/hill history and grass exploit root cause
Critical facts documented permanently:
- throttle_min=0.5 bakes into action space (too fast for corners)
- throttle_min=0.2 + v5 reward CAN learn hill (proved Exp 9, mountain only 90k)
- Mountain failure in parallel is contamination from grass exploit, not throttle
- Grass exploit root cause: sim determine_episode_over() passes when CTE>16m
- DO NOT confuse mountain rollback with stuck issue
- DO NOT change throttle_min as first response to mountain failure
2026-04-19 16:14:28 -04:00
Paul Huliganga 16bd379e95 feat: Exp 11c — parallel DummyVecEnv + v6 reward, extended to 250k steps
2026-04-19 13:27:38 -04:00
Paul Huliganga 91ce8fc1fa feat: Exp 11b — parallel DummyVecEnv + v6 reward (anti-circle gate) + built-in eval
2026-04-19 12:03:46 -04:00
Paul Huliganga beb04f3ebe fix: reward v6 — efficiency gate prevents circular driving, stuck_steps 80→40
v5 dropped the efficiency term to get gradient signal on hills, but this
re-enabled circular driving (observed in Exp 11). v6 adds efficiency back
as a GATE (not multiplier): if efficiency < 0.15, reward = 0. Otherwise
reward = speed × CTE_quality (same as v5).

Gate vs multiplier: v4 used efficiency as a multiplier which killed gradient
on hills (all terms → 0 simultaneously). v6's gate passes when efficiency
is above threshold (car moving forward, even slowly on hill) and only
blocks when car is truly circling.
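
The gate-versus-multiplier distinction can be shown with two small functions. This is a sketch: the 0.15 threshold and the speed × CTE_quality form come from this commit, while the function names are hypothetical and `reward_multiplier` is a simplified stand-in for the v4 style, not the actual v4 code.

```python
def reward_multiplier(speed: float, cte_quality: float, efficiency: float) -> float:
    """Multiplier style: low efficiency scales the whole reward down."""
    return speed * cte_quality * efficiency


def reward_v6(speed: float, cte_quality: float, efficiency: float,
              gate: float = 0.15) -> float:
    """Gate style: efficiency only decides whether any reward is paid."""
    if efficiency < gate:
        return 0.0
    return speed * cte_quality


# Slow but genuine forward progress on a hill: efficiency just above the gate.
hill_gate = reward_v6(2.0, 0.5, efficiency=0.2)          # full speed × CTE reward
hill_mult = reward_multiplier(2.0, 0.5, efficiency=0.2)  # scaled down to a fifth
# Circling: efficiency below the gate, so no reward at all.
circle_gate = reward_v6(2.0, 0.5, efficiency=0.1)
```

Above the threshold the gated reward does not depend on efficiency, so a slow hill climb keeps its full speed and CTE signal.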

Also reduced stuck_steps from 80 to 40 (~2.5s vs ~5s) — user reported
car stuck against barriers for ~10s which is too long with DummyVecEnv.
2026-04-19 12:02:55 -04:00
Paul Huliganga 21addf268e feat: Exp 11 — parallel DummyVecEnv multi-track training (two sim instances)
2026-04-19 11:05:22 -04:00
Paul Huliganga 6e9546cd22 save: all experiment scripts moved from /tmp to agent/experiments/
Scripts in /tmp are lost on reboot and not reproducible.
All experiment scripts now committed to git with README.

Exp5 script was already gone (lost before this fix).
All others (Exp6-Exp10, overnight, wave5, etc.) now preserved.

Rule going forward: scripts saved to agent/experiments/ and committed
BEFORE running, not after.

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-18 21:30:08 -04:00