Commit Graph

9 Commits

Author SHA1 Message Date
Paul Huliganga 45b057e9c1 wave3: autoresearch trial 15 results
Agent: pi
Tests: N/A
Tests-Added: 0
TypeScript: N/A
2026-04-16 08:43:17 -04:00
Paul Huliganga 0505de7e63 wave3: autoresearch trial 10 results
Agent: pi
Tests: N/A
Tests-Added: 0
TypeScript: N/A
2026-04-16 03:31:41 -04:00
Paul Huliganga b00f63dfbc fix: save_dir not in scope inside train_multitrack — crashed every trial
The checkpoint code references save_dir inside train_multitrack(), but
save_dir is defined only in main(). Every trial since the checkpoint fix
was added crashed with NameError: name 'save_dir' is not defined after
the first segment, producing rc=101 and no GP data.

Fix: add save_dir=None parameter to train_multitrack() and pass it
from the main() call site.

This explains why Trials 6-10 in the current run all produced None
results despite appearing to train normally for the first segment.
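
A minimal sketch of the scope fix described above, assuming a much simplified training loop. Only train_multitrack(), main(), and save_dir come from the commit message; the segment names, the checkpoint file naming, and the return value are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def train_multitrack(segments, save_dir: Optional[Path] = None) -> List[Path]:
    """Hypothetical stand-in for the real training loop."""
    checkpoints = []
    for index, _segment in enumerate(segments):
        # ... train on the segment here ...
        if save_dir is not None:
            # save_dir is a parameter now, so this lookup cannot raise NameError.
            checkpoints.append(Path(save_dir) / f"segment_{index}.zip")
    return checkpoints


def main() -> List[Path]:
    save_dir = Path("checkpoints")  # still defined in main(), as before the fix
    return train_multitrack(["track_a", "track_b"], save_dir=save_dir)
```

Defaulting save_dir to None keeps any caller that does not want checkpoints working unchanged.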

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-15 22:47:29 -04:00
Paul Huliganga a9eed2faa3 fix: restart with verified config + seed GP with overnight 1943 result
All previous issues:
- Controller was never restarted after cap/checkpoint fixes -> they never ran
- Timeout trials (score=0) were polluting GP data -> removed
- Overnight Trial 3 result (1943 mini_monaco) was unknown to GP -> added

GP now has 5 valid data points including the 1943 score at
lr=0.000685, switch=17499. GP should converge toward longer
switching intervals which produced the only great result.
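
The data cleanup above could look roughly like this, assuming trials are stored as plain dicts. The field names (score, timed_out, lr, switch) and the function name are hypothetical; only the overnight values are taken from this message.

```python
# Overnight Trial 3 result quoted in the commit message.
OVERNIGHT_TRIAL = {"lr": 0.000685, "switch": 17499, "score": 1943, "timed_out": False}


def clean_gp_data(trials, extra=()):
    """Drop timed-out trials, then append known-good results the GP never saw."""
    kept = [trial for trial in trials if not trial["timed_out"]]
    return kept + list(extra)
```

Filtering on an explicit timeout flag rather than on score == 0 avoids discarding a trial that genuinely scored zero.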

Verified before relaunch:
- PARAM_SPACE max total_timesteps = 90000 ✓
- Checkpoint saves after every segment ✓
- Rescue eval on timeout ✓
- 102 tests passing ✓

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-15 22:26:53 -04:00
Paul Huliganga e61ebc5b38 fix: prevent trial timeouts losing all data
Three changes:

1. Lower total_timesteps cap: 120k → 90k
   Actual throughput is 16 steps/sec (not 20 as estimated).
   120k steps = 126 min training + 9 min overhead = 135 min > 2hr limit.
   90k steps = 94 min + 8 min overhead = 102 min, safely within limit.

2. Per-segment checkpoint saves in multitrack_runner
   model.save() called after every segment so the latest weights are
   always on disk.  If the runner is killed (timeout/crash/Ctrl+C),
   training data is never completely lost.

3. Timeout rescue eval in wave4_controller
   If JOB_TIMEOUT fires and a checkpoint exists, immediately runs a
   quick mini_monaco eval on the checkpoint so the trial still produces
   a GP data point despite the timeout.
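
A rough sketch of changes 2 and 3, assuming training, saving, and evaluation are passed in as callables. The function names and signatures here are hypothetical and are not the actual multitrack_runner or wave4_controller code.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def train_segments(
    train_one: Callable[[str], None],
    save: Callable[[Path], None],
    segments: Iterable[str],
    checkpoint: Path,
) -> None:
    """Train segment by segment, overwriting the checkpoint after each one."""
    for segment in segments:
        train_one(segment)
        save(checkpoint)  # latest weights are on disk before the next segment starts


def score_after_timeout(
    checkpoint: Path, quick_eval: Callable[[Path], float]
) -> Optional[float]:
    """On timeout, evaluate the last checkpoint if one exists, else return None."""
    if not checkpoint.exists():
        return None
    return quick_eval(checkpoint)
```

In this sketch a trial killed during its first segment still yields None, because no checkpoint has been written yet.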

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-15 21:54:50 -04:00
Paul Huliganga 5714a96bfb wave3: autoresearch trial 5 results
Agent: pi
Tests: N/A
Tests-Added: 0
TypeScript: N/A
2026-04-15 17:08:50 -04:00
Paul Huliganga c10e56d894 fix: cap total_timesteps at 120k to prevent 2hr timeout
Trials 3+4 both proposed ~140k steps and hit the 2hr JOB_TIMEOUT,
wasting time and producing no GP data.  At ~20 steps/sec, 120k steps
takes ~100 min, safely within the 2hr limit.
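
The estimate above is a plain throughput calculation. A small sketch, with a hypothetical helper name and an optional overhead term that this message does not use:

```python
def estimated_minutes(total_timesteps: int, steps_per_sec: float,
                      overhead_min: float = 0.0) -> float:
    """Wall-clock estimate in minutes for a run at a constant step rate."""
    return total_timesteps / steps_per_sec / 60.0 + overhead_min


# 120k steps at the estimated 20 steps/sec
print(estimated_minutes(120_000, 20))  # 100.0
```

The result is only as good as the steps/sec figure, so a measured rate should replace the estimate once one is available.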

Agent: pi
Tests: 102 passed
Tests-Added: 0
TypeScript: N/A
2026-04-15 16:30:07 -04:00
Paul Huliganga 5d1227833d fix: close short-lap circle exploit and cap segment eval episode length
Two reward hacking behaviours observed during Wave 4 training:

1. Short-lap circle exploit (reported by user, echoes Toni's guardrail hack):
   Model circles at start/finish line completing laps in 1-2 sim-seconds,
   accumulating lap_count indefinitely with no genuine track progress.
   Fix: SpeedRewardWrapper detects a lap_count increment; if last_lap_time
   < min_lap_time (5.0s), it returns penalty = -10 × (min_lap_time / last_lap_time).
   A 1-second lap gives -50 penalty. Legitimate 12-second laps unaffected.
   Window size also increased from 30 → 60 to catch slower circles.

2. Non-terminating segment eval episodes:
   evaluate_policy on wide tracks (no barriers) could run indefinitely,
   inflating segment_reward to 200k+. Replaced with manual eval loop
   capped at MAX_EVAL_STEPS=3000 steps.
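
Both fixes can be sketched in a few lines, assuming a simplified environment interface. The function names and the reset/step callables are hypothetical; the constants (5.0s, -10, 3000) are the ones quoted above. How the penalty is combined with the base reward inside SpeedRewardWrapper is not shown here.

```python
MIN_LAP_TIME = 5.0      # seconds, from the commit message
MAX_EVAL_STEPS = 3000   # step cap, from the commit message


def short_lap_penalty(last_lap_time: float, min_lap_time: float = MIN_LAP_TIME) -> float:
    """Return a negative penalty for laps faster than min_lap_time, else 0.0."""
    if last_lap_time <= 0:
        raise ValueError("last_lap_time must be positive")
    if last_lap_time >= min_lap_time:
        return 0.0
    return -10.0 * (min_lap_time / last_lap_time)


def capped_episode_reward(reset, step, policy, max_steps: int = MAX_EVAL_STEPS) -> float:
    """Run one eval episode, stopping at `done` or after max_steps steps.

    reset() returns an observation; step(action) returns (obs, reward, done).
    """
    obs = reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done = step(policy(obs))
        total += reward
        if done:
            break
    return total
```

With these values a 1-second lap yields -50.0 and a 12-second lap yields 0.0, and an episode that never terminates contributes at most max_steps rewards.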

Phase 4 results cleared (trials 4-6 ran with exploitable reward).

Tests: 4 new reward wrapper tests, 100 total passing.

Agent: pi
Tests: 100 passed
Tests-Added: 4
TypeScript: N/A
2026-04-15 09:06:25 -04:00
Paul Huliganga 1be95b7c82 wave3: autoresearch trial 5 results
Agent: pi
Tests: N/A
Tests-Added: 0
TypeScript: N/A
2026-04-15 07:15:57 -04:00