feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments

- reward_wrapper: detect barrier/wall/tree solid hits, terminate on head-on impact
  or 4 sustained solid-hit frames; prevents car wedging against invisible barriers
- reward_wrapper: add low-speed/wedge termination — kills episode when car is pinned
  motionless (below threshold, no displacement) after grace period
- reward_wrapper: high-CTE exploit fix — return -0.25 immediately when CTE >
  max_cte_terminate (not after patience), so PPO cannot collect positive speed
  rewards while driving the large outside-road circle
- tests: 23 passing unit tests covering all new termination paths
- exp20/21/22: add parallel DummyVecEnv experiments on generated_road+generated_track
  with warm-start from champion model; exp22 is current active run
- SESSION_HANDOFF.md: live handoff doc for next session continuity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Paul Huliganga 2026-05-05 14:46:13 -04:00
parent 04d5a10992
commit 138c65270f
22 changed files with 3980 additions and 8 deletions


@ -13,6 +13,69 @@ You have full access to the codebase, can run commands, and can modify any file.
---
## Donkeycar RL Simulator Startup Rules
This project repeatedly runs into a Windows Unity PlayerPrefs port collision.
Treat this as a standing instruction for every new session that starts or restarts
the simulator.
- Always run two simulator instances from two separate runtime folders:
  - `C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin`
  - `C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy`
- Always set the Unity PlayerPrefs registry port before launching each instance,
and also pass explicit ports on launch. Do not rely on the simulator default
port or saved UI settings.
- Launch the main folder with `--port 9091`.
- Launch the copy folder with `--port 9093`.
- Preferred runtime layout:
  - main process: `9091`, private API `9092`
  - copy process: `9093`, private API `9094`
- After launch, verify sockets from WSL/Linux before running diagnostics or RL:
```bash
python3 - <<'PY'
import socket
for p in (9091, 9093):
    s = socket.socket()
    s.settimeout(2)
    try:
        s.connect(("127.0.0.1", p))
        print(f"PORT {p}: OK")
    except Exception as e:
        print(f"PORT {p}: FAIL {e}")
    finally:
        s.close()
PY
```
Correct PowerShell launch sequence:
```powershell
$key = 'HKCU:\Software\DonkeyCar\donkey_sim'
Get-Process donkey_sim -ErrorAction SilentlyContinue | Stop-Process -Force
Start-Sleep -Seconds 1
Set-ItemProperty -Path $key -Name 'port_h2088097884' -Value 9091 -Type DWord
Set-ItemProperty -Path $key -Name 'portPrivateAPI_h1325370089' -Value 9092 -Type DWord
Start-Process -FilePath 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin\donkey_sim.exe' -ArgumentList '--port','9091' -WorkingDirectory 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin'
Start-Sleep -Seconds 4
Set-ItemProperty -Path $key -Name 'port_h2088097884' -Value 9093 -Type DWord
Set-ItemProperty -Path $key -Name 'portPrivateAPI_h1325370089' -Value 9094 -Type DWord
Start-Process -FilePath 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy\donkey_sim.exe' -ArgumentList '--port','9093' -WorkingDirectory 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy'
```
Why: Unity stores the simulator port in Windows PlayerPrefs/registry under the
shared `DonkeyCar/donkey_sim` product key, so both copied simulator folders can
inherit the same saved port. Command-line `--port` binds the server correctly,
but the in-sim UI can still display the saved PlayerPrefs value. Setting
PlayerPrefs before each launch makes both the displayed port and the bound port
line up.
---
## Core Loop
Every time you start, follow this exact sequence:

agent/SESSION_HANDOFF.md Normal file

@ -0,0 +1,249 @@
# RL Donkeycar Session Handoff
Last updated: 2026-05-05 America/Toronto
## Autonomy Instruction
Use this as the standing instruction for follow-on sessions:
`Continue the Donkeycar RL/sim work autonomously. Rebuild, sync, relaunch, run diagnostics, patch code, and restart experiments as needed. Keep going until you either have a verified fix and a running experiment, or a concrete blocker that truly requires the user. Do not stop just to ask for permission on ordinary reversible steps. Only pause for real risk of data loss, destructive actions, missing credentials/access, or major strategy tradeoffs that require a user decision.`
If the user says only `continue`, interpret it using the instruction above.
## Current Goal
Stabilize the Unity simulator geometry and collision behavior enough that:
- `generated_road` and `generated_track` both run without bad invisible barrier placement
- barrier contacts terminate episodes appropriately
- RL can restart from a trustworthy simulator build
## Important Paths
Project:
- `/home/paulh/projects/donkeycar-rl-autoresearch`
Unity source project:
- `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim`
Unity build output:
- `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Builds/DonkeySimWin`
Current runtime simulator folders in use:
- `/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin`
- `/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin - Copy`
## Current RL Experiment Files
- `agent/experiments/exp21_generated_pair_warm_v4.py`
- `agent/experiments/exp22_generated_pair_warm_v6.py`
Latest model/output folder:
- `agent/models/exp22-generated-pair-warm-v6`
Current training run:
- launched `agent/experiments/exp22_generated_pair_warm_v6.py`
- PID file: `agent/models/exp22-generated-pair-warm-v6/current.pid`
- current PID at launch time: `609054`
- log: `agent/models/exp22-generated-pair-warm-v6/run_2026-05-05_141929_strictcte.log`
- startup verified: connected to `localhost:9091` and `localhost:9093`, loaded `generated_road` and `generated_track`, attached warm-start model, reached `Starting training...`
Latest urgent exploit fix:
- User observed generated_road still doing the large outside circle exploit.
- Stopped the previous run immediately.
- Patched `agent/reward_wrapper.py` so high CTE receives negative reward immediately during the patience window instead of falling through to positive speed reward.
- Patched `agent/experiments/exp22_generated_pair_warm_v6.py`:
  - `MAX_CTE_TERMINATE = 2.5`
  - `CTE_PATIENCE = 3`
- Added regression test `test_high_cte_never_gets_positive_speed_reward_before_termination`.
- Verified `python3 -m pytest -q tests/test_reward_wrapper.py`: `21 passed`.
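The high-CTE rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in `agent/reward_wrapper.py`: the function name `high_cte_step`, its signature, and the choice to terminate once the counter reaches `cte_patience` are assumptions. The `-0.25` penalty is the value quoted in the commit message.

```python
# Hypothetical sketch of the high-CTE exploit fix; names and the exact
# termination comparison are assumptions, not the real wrapper code.
HIGH_CTE_PENALTY = -0.25  # value quoted in the commit message


def high_cte_step(cte, high_cte_steps, speed_reward,
                  max_cte_terminate=2.5, cte_patience=3):
    """Return (reward, terminated, new_high_cte_steps) for one env step."""
    if abs(cte) > max_cte_terminate:
        # Penalize immediately, even inside the patience window, so the
        # policy never collects positive speed reward while far off track.
        high_cte_steps += 1
        terminated = high_cte_steps >= cte_patience
        return HIGH_CTE_PENALTY, terminated, high_cte_steps
    # Back within bounds: reset the counter and pass the normal reward through.
    return speed_reward, False, 0


reward, terminated, count = high_cte_step(cte=3.0, high_cte_steps=0, speed_reward=1.0)
print(reward, terminated, count)
```

The point of the sketch is that the penalty branch is taken on the first out-of-bounds step, while the patience counter only controls when the episode ends.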
## What Was Learned
### Training status
The latest meaningful `exp22` run was poor and should not be resumed as-is.
From `agent/models/exp22-generated-pair-warm-v6/run_2026-04-28_2132_openfix.log`:
- best `generated_track` eval reached only about `92` steps
- run was not trustworthy due to ongoing barrier-placement concerns
### Simulator behavior
- Invisible barriers are collider-only by default, so the user cannot see them in the standalone player
- Diagnostic probe showed both tracks could advance from the start before hitting `left_barrier`, so there was no obvious full-width blocker across the road start
- User screenshot suggested the car was getting trapped near the shoulder/edge, consistent with barrier corridor too close to the drivable edge
- User also reported that barrier contact sometimes blocks the car without promptly ending the episode
### Collision semantics
The user does **not** want every barrier brush to terminate the episode.
Desired behavior:
- light brush: can continue
- sustained contact: terminate
- head-on / abrupt stop: terminate quickly
## Code Changes Already Made
### Unity / simulator side
`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/RoadBuilder.cs`
Implemented structural refactor:
- explicit `closeLoop` support
- explicit road-edge generation
- barrier edges derived from left/right road edges instead of guessed centerline offset
- open tracks do not force wraparound
- debug polyline support via gizmos
Added runtime-visible debug barrier support:
- `showBarrierMeshes`
- `barrierDebugColor`
- barrier objects now include `MeshFilter`
- optional `MeshRenderer` added for visible translucent barriers
`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scenes/generated_road.unity`
- `closeLoop = 0`
- `doAddBarriers = 1`
- `showBarrierMeshes = 1`
- pinned road variation arrays to one entry
- `roadOffsets.Array.data[0] = 2.2`
`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scenes/generated_track.unity`
- `showBarrierMeshes = 1`
- `roadOffsetW = 2.2`
- barriers still enabled
### Python / RL side
`/home/paulh/projects/donkeycar-rl-autoresearch/agent/reward_wrapper.py`
Latest intent:
- do **not** terminate instantly on every barrier hit
- terminate on sustained obstacle contact
- terminate on head-on style stop
Current patch in file:
- tracks `_solid_hit_steps`
- tracks `_prev_speed`
- classifies solid hits via `hit` containing `barrier`, `wall`, or `tree`
- immediate terminate on abrupt speed collapse while colliding
- terminate after several consecutive solid-hit frames
This was meant to replace the too-aggressive “any barrier hit = immediate death” logic.
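A rough sketch of the intended collision semantics is shown below. It is an illustration under stated assumptions, not the actual patch: the class name `SolidHitTracker`, the `speed_collapse_ratio` and `min_prev_speed` thresholds, and the `update` signature are hypothetical. The solid-hit tags and the 4-frame sustained-contact limit come from this document and the commit message.

```python
# Hypothetical sketch of sustained-contact / head-on termination.
# Thresholds other than the 4-frame limit are illustrative assumptions.
SOLID_TAGS = ("barrier", "wall", "tree")


class SolidHitTracker:
    def __init__(self, max_hit_frames=4, speed_collapse_ratio=0.5,
                 min_prev_speed=1.0):
        self.max_hit_frames = max_hit_frames
        self.speed_collapse_ratio = speed_collapse_ratio
        self.min_prev_speed = min_prev_speed
        self._solid_hit_steps = 0
        self._prev_speed = 0.0

    def update(self, hit, speed):
        """Return True if the episode should terminate on this frame."""
        solid = any(tag in str(hit).lower() for tag in SOLID_TAGS)
        prev = self._prev_speed
        self._prev_speed = speed
        if not solid:
            # A light brush that ends resets the sustained-contact counter.
            self._solid_hit_steps = 0
            return False
        self._solid_hit_steps += 1
        # Head-on: the car was moving and its speed collapsed while colliding.
        head_on = (prev >= self.min_prev_speed
                   and speed <= prev * self.speed_collapse_ratio)
        return head_on or self._solid_hit_steps >= self.max_hit_frames


tracker = SolidHitTracker()
tracker.update("none", 3.0)
print(tracker.update("left_barrier", 0.2))
```

With these assumed thresholds, a single glancing `left_barrier` frame at nearly unchanged speed does not end the episode, while four consecutive solid-hit frames or an abrupt stop does.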
## Most Recent Verified Build Status
Unity batch build for the debug-visible barrier version completed successfully.
Evidence:
- build log ended with `Exiting batchmode successfully now!`
- return code `0`
The successful build has now been synced into both `Downloads` runtime folders and both simulators have been relaunched.
Current verified runtime state:
- main folder process owns port `9091`
- main folder also owns private API port `9092`
- copy folder process owns port `9093`
- copy folder also owns private API port `9094`
- Linux socket probe reported `PORT 9091: OK`, `PORT 9092: OK`, `PORT 9093: OK`, and `PORT 9094: OK`
- latest runtime build includes double-sided barrier mesh triangles for visual/debug barrier rendering
Note: the Windows profile uses shared Unity PlayerPrefs/registry values under `HKCU:\Software\DonkeyCar\donkey_sim`. Explicit `--port` args bind the servers correctly, but the in-sim UI can still show the saved PlayerPrefs value. Before launch, set `port_h2088097884`/`portPrivateAPI_h1325370089` to `9091`/`9092`, start the main sim, then set them to `9093`/`9094` and start the copy. Also keep passing explicit `--port 9091` and `--port 9093`.
Latest user visual inspection before double-sided patch:
- `generated_road`: barriers visible on both sides except missing on left side at the very start before the first curve
- `generated_track`: barrier visible only on the right/inside side when driving clockwise; no visible left/outside barrier
Likely diagnosis: barrier mesh was generated as a single-sided vertical plane and the Standard shader culled backfaces, so some debug barrier surfaces existed but were invisible from the road/camera side.
Latest simulator-side patch:
- `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/RoadBuilder.cs`
- `CreateBarrier(...)` now emits reverse-facing triangles for every barrier quad, making debug barrier meshes visible from both sides
- failed attempt: `Unlit/Transparent` made both tracks' barriers black in the standalone player
- failed attempt: duplicating reverse-facing triangles made `generated_track` barriers black, likely due to coplanar transparent overdraw/z-fighting on the closed/scaled track
- current debug barrier mesh is back to one triangle set per quad; material uses `Standard` transparent mode with forced pale fallback color, alpha blend, culling off, and emission enabled so barriers should stay light/translucent while remaining visible from both sides
- Unity Windows batch build succeeded after this patch
- rebuilt output synced to both runtime folders and relaunched with explicit ports
## Immediate Next Steps
1. Monitor current exp22 training log/checkpoints.
2. Determine:
   - are barriers too close to the road edge globally?
   - or only wrong at specific bends / first-corner geometry?
3. Fix geometry if needed before restarting RL.
4. Only after geometry is visually verified, restart `exp22` or a successor experiment.
## Useful Commands
### Sync latest build into runtime folders
```bash
rsync -a --delete '/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Builds/DonkeySimWin/' '/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin/'
rsync -a --delete '/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Builds/DonkeySimWin/' '/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin - Copy/'
```
### Launch sims from Windows side
```powershell
$key = 'HKCU:\Software\DonkeyCar\donkey_sim'
Set-ItemProperty -Path $key -Name 'port_h2088097884' -Value 9091 -Type DWord
Set-ItemProperty -Path $key -Name 'portPrivateAPI_h1325370089' -Value 9092 -Type DWord
Start-Process -FilePath 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin\donkey_sim.exe' -ArgumentList '--port','9091' -WorkingDirectory 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin'
Start-Sleep -Seconds 4
Set-ItemProperty -Path $key -Name 'port_h2088097884' -Value 9093 -Type DWord
Set-ItemProperty -Path $key -Name 'portPrivateAPI_h1325370089' -Value 9094 -Type DWord
Start-Process -FilePath 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy\donkey_sim.exe' -ArgumentList '--port','9093' -WorkingDirectory 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy'
```
### Verify ports
```bash
python3 - <<'PY'
import socket
for p in (9091, 9093):
    s = socket.socket()
    s.settimeout(3)
    try:
        s.connect(('127.0.0.1', p))
        print(f'PORT {p}: OK')
    except Exception as e:
        print(f'PORT {p}: FAIL {e}')
    finally:
        s.close()
PY
```
## Notes for Next Session
- If the user says `continue`, do not ask broad questions. Start with the immediate next steps above.
- Prefer direct verification over more RL training.
- Do not restart long training until the user has visually confirmed the debug-visible barriers look correct.


@ -0,0 +1,205 @@
"""
Exp 20: Parallel DummyVecEnv, 450k steps, rebuilt sim (v5).
Fixes from Exp 19 (v4 -> v5):
- progress_patience: 60 -> 150 steps.
  Mountain track hills slow the car to near-throttle-min speed. At ~1 m/s
  going uphill, the nearest waypoint may not advance for 3-7 seconds. The
  previous 60-step (~3s) limit caused legitimate uphill driving to be
  terminated as "no progress". 150 steps (~7.5s at 20fps) covers the
  longest mountain hill sections without being exploitable.
New sim fixes (require a rebuilt donkey_sim.exe; rebuild done before this run):
- Car.cs OnCollisionStay: sustained low-speed barrier/tree contact now
  keeps hit != "none" so the sim terminates the episode immediately.
  Previously, hit was cleared every frame so wedged cars ran indefinitely.
- RoadBuilder invisible barriers: generated_track now has invisible wall
  meshes on both sides of the road. Car cannot escape through mesh gaps.
  Barriers are 3m tall, 0.3m outside the road edge, loop closed at start/finish.
Everything else identical to Exp 19.
Setup: TWO rebuilt sim instances required:
  Sim 1: donkey_sim.exe on port 9091 -> generated_track
  Sim 2: separate copy of donkey_sim.exe on port 9093 -> mountain_track
"""
import sys, os, time
sys.path.insert(0, '/home/paulh/projects/donkeycar-rl-autoresearch/agent')
from multitrack_runner import log, StuckTerminationWrapper
from donkeycar_sb3_runner import ThrottleClampWrapper
from reward_wrapper import SpeedRewardWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
import gymnasium as gym
import numpy as np
HOST = 'localhost'
THROTTLE_MIN = 0.2
LR = 0.000725
TOTAL_STEPS = 450_000
CHECKPOINT_EVERY = 20_000
SAVE_DIR = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5'
os.makedirs(SAVE_DIR, exist_ok=True)
EFFICIENCY_WINDOW = 200
MIN_LAP_TIME = 12.0
PROGRESS_PATIENCE = 150 # was 60 — mountain hills take up to 7s per waypoint
def make_env(track_id, port):
    def _init():
        raw = gym.make(track_id, conf={'host': HOST, 'port': port})
        env = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
        env = StuckTerminationWrapper(env, stuck_steps=40, min_displacement=0.5,
                                      max_episode_seconds=30.0)
        env = SpeedRewardWrapper(env, window_size=EFFICIENCY_WINDOW,
                                 min_lap_time=MIN_LAP_TIME,
                                 progress_patience=PROGRESS_PATIENCE)
        return env
    return _init
log('=' * 60)
log('Exp 20: Parallel DummyVecEnv — 450k steps (sim rebuild + progress fix)')
log(f' Sim 1: {HOST}:9091 → generated_track')
log(f' Sim 2: {HOST}:9093 → mountain_track')
log(f' throttle_min={THROTTLE_MIN}, lr={LR}, total={TOTAL_STEPS:,}')
log(f' Reward: v6 + exploit fix (window={EFFICIENCY_WINDOW}, min_lap={MIN_LAP_TIME}s)')
log(f' Stuck termination: 40 steps (~2s), hard cap 30s')
log(f' Progress patience: {PROGRESS_PATIENCE} steps (~7.5s at 20fps)')
log(f' Checkpoints: every {CHECKPOINT_EVERY:,} steps')
log('=' * 60)
log('Creating DummyVecEnv with two tracks...')
env = DummyVecEnv([
make_env('donkey-generated-track-v0', 9091),
make_env('donkey-mountain-track-v0', 9093),
])
env = VecTransposeImage(env)
log(f' VecEnv num_envs={env.num_envs}, obs={env.observation_space.shape}')
model = PPO('CnnPolicy', env, learning_rate=LR, verbose=1, device='cpu')
log('PPO created. Starting training...')
best_reward = float('-inf')
steps_done = 0
while steps_done < TOTAL_STEPS:
    seg_steps = min(CHECKPOINT_EVERY, TOTAL_STEPS - steps_done)
    model.learn(total_timesteps=seg_steps, reset_num_timesteps=False)
    steps_done += seg_steps
    ckpt = os.path.join(SAVE_DIR, f'checkpoint_{steps_done:07d}')
    model.save(ckpt)
    model.save(os.path.join(SAVE_DIR, 'model'))
    log(f'[{steps_done:,}/{TOTAL_STEPS:,}] Checkpoint saved: {ckpt}.zip')
    # Quick deterministic eval: one episode per env, capped at 2000 steps.
    try:
        obs = env.reset()
        ep_rewards = np.zeros(env.num_envs)
        ep_steps = np.zeros(env.num_envs)
        done_mask = np.zeros(env.num_envs, dtype=bool)
        for _ in range(2000):
            action, _ = model.predict(obs, deterministic=True)
            obs, rewards, dones, infos = env.step(action)
            for i in range(env.num_envs):
                if not done_mask[i]:
                    ep_rewards[i] += rewards[i]
                    ep_steps[i] += 1
                    if dones[i]:
                        done_mask[i] = True
            if done_mask.all():
                break
        status0 = '' if ep_steps[0] >= 2000 else f'❌@{int(ep_steps[0])}'
        status1 = '' if ep_steps[1] >= 2000 else f'❌@{int(ep_steps[1])}'
        log(f' Eval: gen_track={ep_rewards[0]:.1f}r/{int(ep_steps[0])}s {status0} '
            f'mountain={ep_rewards[1]:.1f}r/{int(ep_steps[1])}s {status1}')
        total_reward = ep_rewards.sum()
        if total_reward > best_reward:
            best_reward = total_reward
            model.save(os.path.join(SAVE_DIR, 'best_model'))
            log(f' NEW BEST: {best_reward:.1f} combined reward')
    except Exception as e:
        log(f' Eval error: {e}')
        import traceback; traceback.print_exc()
model.save(os.path.join(SAVE_DIR, 'model'))
log(f'\nTraining complete. Best combined reward: {best_reward:.1f}')
env.close()
time.sleep(5)
# --- Final eval on all 4 tracks (sequential, port 9091) ---
log('\n' + '=' * 60)
log('FINAL EVALUATION: best_model on 4 tracks (3 sets each)')
log('=' * 60)
EVAL_TRACKS = [
('donkey-generated-track-v0', 'generated_track'),
('donkey-mountain-track-v0', 'mountain_track'),
('donkey-minimonaco-track-v0', 'mini_monaco'),
('donkey-generated-roads-v0', 'generated_road'),
]
EVAL_PORT = 9091
EVAL_SETS = 3
EVAL_MAX_STEPS = 2000
best_model_path = os.path.join(SAVE_DIR, 'best_model.zip')
results_by_track = {}
for track_id, track_name in EVAL_TRACKS:
    log(f'\n--- {track_name} ---')
    steps_list = []
    for s in range(1, EVAL_SETS + 1):
        try:
            raw = gym.make(track_id, conf={'host': HOST, 'port': EVAL_PORT})
            inner = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
            inner = StuckTerminationWrapper(inner, stuck_steps=40, min_displacement=0.5)
            inner = SpeedRewardWrapper(inner)
            eval_env = VecTransposeImage(DummyVecEnv([lambda e=inner: e]))
            eval_model = PPO.load(best_model_path, env=eval_env, device='cpu')
            obs = eval_env.reset()
            total_r, total_s, done = 0.0, 0, False
            while not done and total_s < EVAL_MAX_STEPS:
                action, _ = eval_model.predict(obs, deterministic=True)
                result = eval_env.step(action)
                # Accept both the 4-tuple VecEnv API and a 5-tuple gymnasium-style result.
                if len(result) == 4:
                    obs, r, d, info = result
                    done = bool(d[0])
                else:
                    obs, r, t, tr, info = result
                    done = bool(t[0] or tr[0])
                total_r += float(r[0])
                total_s += 1
            status = '' if total_s >= EVAL_MAX_STEPS else f'❌@{total_s}'
            log(f' Set {s}: {total_r:.1f}r / {total_s}s {status}')
            steps_list.append(total_s)
            eval_env.close()
            time.sleep(3)
        except Exception as e:
            log(f' Set {s}: ERROR - {e}')
            steps_list.append(0)
            time.sleep(3)
    mean_steps = np.mean(steps_list) if steps_list else 0
    results_by_track[track_name] = steps_list
    log(f' Mean: {mean_steps:.0f} steps')
log('\n' + '=' * 60)
log('SUMMARY')
log('=' * 60)
for track_name, steps_list in results_by_track.items():
    steps_str = '/'.join(str(s) for s in steps_list)
    mean = np.mean(steps_list)
    verdict = '' if mean >= 1500 else '⚠️' if mean >= 500 else ''
    log(f' {verdict} {track_name:20s}: {steps_str} mean={mean:.0f}')
log(f'\n=== Exp 20 COMPLETE ===')


@ -0,0 +1,291 @@
"""
Exp 21: Parallel DummyVecEnv generated_road + generated_track, warm-started.
Rationale:
- generated_road specialist already exists and drives road markings well.
- generated_road and generated_track share the same road semantics.
- Background adaptation is the goal here, not mountain physics.
Design:
- Warm-start from Phase 2 champion (generated_road specialist).
- Train in parallel on TWO sim instances:
Sim 1: generated_road on port 9091
Sim 2: generated_track on port 9093
- Use the old v4 reward that worked for the flat road tracks.
- Keep the wrapper chain minimal: ThrottleClamp + V4 reward only.
"""
import sys, os, time
from collections import deque
from datetime import datetime
sys.path.insert(0, '/home/paulh/projects/donkeycar-rl-autoresearch/agent')
from donkeycar_sb3_runner import ThrottleClampWrapper
from multitrack_runner import StuckTerminationWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
from stable_baselines3.common.utils import get_schedule_fn
import gymnasium as gym
import numpy as np
HOST = 'localhost'
THROTTLE_MIN = 0.2
LR = 0.000225
TOTAL_STEPS = 150_000
CHECKPOINT_EVERY = 10_000
SAVE_DIR = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp21-generated-pair-warm-v4'
WARM_PATH = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip'
os.makedirs(SAVE_DIR, exist_ok=True)
class V4RewardWrapper(gym.Wrapper):
    """
    v4 reward from the successful flat-road experiments:
    reward = base_cte * efficiency * (1 + speed_scale * speed)
    """
    def __init__(self, env, speed_scale=0.1, window_size=60,
                 min_efficiency=0.05, max_cte=8.0):
        super().__init__(env)
        self.speed_scale = speed_scale
        self.min_efficiency = min_efficiency
        self.max_cte = max_cte
        self._pos_history = deque(maxlen=window_size + 1)

    def reset(self, **kwargs):
        self._pos_history.clear()
        return self.env.reset(**kwargs)

    def step(self, action):
        result = self.env.step(action)
        # Support both the 5-tuple gymnasium API and the legacy 4-tuple API.
        if len(result) == 5:
            obs, _sim_reward, terminated, truncated, info = result
            done = terminated or truncated
        else:
            obs, _sim_reward, done, info = result
            terminated, truncated = done, False
        reward = self._compute_reward(done, info)
        if len(result) == 5:
            return obs, reward, terminated, truncated, info
        return obs, reward, done, info

    def _compute_reward(self, done, info):
        if done:
            return -1.0
        pos = info.get('pos', None)
        if pos is not None:
            try:
                self._pos_history.append(np.array(list(pos)[:3], dtype=np.float64))
            except (TypeError, ValueError):
                pass
        try:
            cte = float(info.get('cte', 0.0) or 0.0)
        except (TypeError, ValueError):
            cte = 0.0
        # base falls linearly from 1 at the centerline to 0 at max_cte.
        base = 1.0 - min(abs(cte) / self.max_cte, 1.0)
        efficiency = self._compute_efficiency()
        eff = max(0.0, (efficiency - self.min_efficiency) / (1.0 - self.min_efficiency))
        try:
            speed = max(0.0, float(info.get('speed', 0.0) or 0.0))
        except (TypeError, ValueError):
            speed = 0.0
        return base * eff * (1.0 + self.speed_scale * speed)

    def _compute_efficiency(self):
        # Path efficiency: net displacement over total path length in the window.
        if len(self._pos_history) < 3:
            return 1.0
        positions = list(self._pos_history)
        net = np.linalg.norm(positions[-1] - positions[0])
        total = sum(
            np.linalg.norm(positions[i + 1] - positions[i])
            for i in range(len(positions) - 1)
        )
        return float(net / total) if total > 1e-6 else 1.0
def log(msg):
    print(f'[{datetime.now().strftime("%H:%M:%S")}] {msg}', flush=True)


def make_env(track_id, port):
    def _init():
        raw = gym.make(track_id, conf={'host': HOST, 'port': port})
        env = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
        env = StuckTerminationWrapper(
            env,
            stuck_steps=40,
            min_displacement=0.5,
            max_stuck_seconds=12.0,
            max_episode_seconds=30.0,
        )
        env = V4RewardWrapper(env, speed_scale=0.1, window_size=60,
                              min_efficiency=0.05, max_cte=8.0)
        return env
    return _init


def make_eval_env(track_id, port):
    inner = make_env(track_id, port)()
    return VecTransposeImage(DummyVecEnv([lambda e=inner: e]))
log('=' * 60)
log('Exp 21: generated_road + generated_track, warm-started, v4 reward')
log(f' Warm start: {WARM_PATH}')
log(f' Sim 1: {HOST}:9091 -> generated_road')
log(f' Sim 2: {HOST}:9093 -> generated_track')
log(f' throttle_min={THROTTLE_MIN}, lr={LR}, total={TOTAL_STEPS:,}')
log(' Termination: StuckTerminationWrapper enabled')
log(f' Checkpoints: every {CHECKPOINT_EVERY:,} steps')
log('=' * 60)
log('Creating DummyVecEnv with the two road tracks...')
env = DummyVecEnv([
make_env('donkey-generated-roads-v0', 9091),
make_env('donkey-generated-track-v0', 9093),
])
env = VecTransposeImage(env)
log(f' VecEnv num_envs={env.num_envs}, obs={env.observation_space.shape}')
if not os.path.exists(WARM_PATH):
    raise FileNotFoundError(WARM_PATH)
model = PPO.load(WARM_PATH, env=env, device='cpu')
# Override the learning rate saved with the warm-start model.
model.learning_rate = LR
try:
    model.lr_schedule = get_schedule_fn(LR)
except Exception:
    model.lr_schedule = None
try:
    for pg in model.policy.optimizer.param_groups:
        pg['lr'] = LR
except Exception:
    pass
log('Warm-start model attached. Starting training...')
best_total_steps = float('-inf')
best_total_reward = float('-inf')
steps_done = 0
while steps_done < TOTAL_STEPS:
    seg_steps = min(CHECKPOINT_EVERY, TOTAL_STEPS - steps_done)
    model.learn(total_timesteps=seg_steps, reset_num_timesteps=False)
    steps_done += seg_steps
    ckpt = os.path.join(SAVE_DIR, f'checkpoint_{steps_done:07d}')
    model.save(ckpt)
    model.save(os.path.join(SAVE_DIR, 'model'))
    log(f'[{steps_done:,}/{TOTAL_STEPS:,}] Checkpoint saved: {ckpt}.zip')
    # Quick deterministic eval: one episode per env, capped at 2000 steps.
    try:
        obs = env.reset()
        ep_rewards = np.zeros(env.num_envs)
        ep_steps = np.zeros(env.num_envs)
        done_mask = np.zeros(env.num_envs, dtype=bool)
        for _ in range(2000):
            action, _ = model.predict(obs, deterministic=True)
            obs, rewards, dones, infos = env.step(action)
            for i in range(env.num_envs):
                if not done_mask[i]:
                    ep_rewards[i] += rewards[i]
                    ep_steps[i] += 1
                    if dones[i]:
                        done_mask[i] = True
            if done_mask.all():
                break
        status0 = '' if ep_steps[0] >= 2000 else f'❌@{int(ep_steps[0])}'
        status1 = '' if ep_steps[1] >= 2000 else f'❌@{int(ep_steps[1])}'
        log(f' Eval: gen_road={ep_rewards[0]:.1f}r/{int(ep_steps[0])}s {status0} '
            f'gen_track={ep_rewards[1]:.1f}r/{int(ep_steps[1])}s {status1}')
        total_steps_eval = ep_steps.sum()
        total_reward = ep_rewards.sum()
        # Best model: most combined eval steps, ties broken by combined reward.
        if (total_steps_eval > best_total_steps or
                (total_steps_eval == best_total_steps and total_reward > best_total_reward)):
            best_total_steps = total_steps_eval
            best_total_reward = total_reward
            model.save(os.path.join(SAVE_DIR, 'best_model'))
            log(f' NEW BEST: combined steps={int(best_total_steps)} reward={best_total_reward:.1f}')
    except Exception as e:
        log(f' Eval error: {e}')
        import traceback; traceback.print_exc()
model.save(os.path.join(SAVE_DIR, 'model'))
log(f'\nTraining complete. Best combined steps: {int(best_total_steps)}')
env.close()
time.sleep(5)
log('\n' + '=' * 60)
log('FINAL EVALUATION: best_model on generated_road, generated_track, mini_monaco')
log('=' * 60)
EVAL_TRACKS = [
('donkey-generated-roads-v0', 'generated_road'),
('donkey-generated-track-v0', 'generated_track'),
('donkey-minimonaco-track-v0', 'mini_monaco'),
]
EVAL_PORT = 9091
EVAL_SETS = 3
EVAL_MAX_STEPS = 2000
best_model_path = os.path.join(SAVE_DIR, 'best_model.zip')
results_by_track = {}
for track_id, track_name in EVAL_TRACKS:
    log(f'\n--- {track_name} ---')
    steps_list = []
    for s in range(1, EVAL_SETS + 1):
        try:
            eval_env = make_eval_env(track_id, EVAL_PORT)
            eval_model = PPO.load(best_model_path, env=eval_env, device='cpu')
            obs = eval_env.reset()
            total_r, total_s, done = 0.0, 0, False
            while not done and total_s < EVAL_MAX_STEPS:
                action, _ = eval_model.predict(obs, deterministic=True)
                result = eval_env.step(action)
                if len(result) == 4:
                    obs, r, d, info = result
                    done = bool(d[0])
                else:
                    obs, r, t, tr, info = result
                    done = bool(t[0] or tr[0])
                total_r += float(r[0])
                total_s += 1
            status = '' if total_s >= EVAL_MAX_STEPS else f'❌@{total_s}'
            log(f' Set {s}: {total_r:.1f}r / {total_s}s {status}')
            steps_list.append(total_s)
            eval_env.close()
            time.sleep(3)
        except Exception as e:
            log(f' Set {s}: ERROR - {e}')
            steps_list.append(0)
            time.sleep(3)
    mean_steps = np.mean(steps_list) if steps_list else 0
    results_by_track[track_name] = steps_list
    log(f' Mean: {mean_steps:.0f} steps')
log('\n' + '=' * 60)
log('SUMMARY')
log('=' * 60)
for track_name, steps_list in results_by_track.items():
    steps_str = '/'.join(str(s) for s in steps_list)
    mean = np.mean(steps_list)
    verdict = '' if mean >= 1500 else '⚠️' if mean >= 500 else ''
    log(f' {verdict} {track_name:20s}: {steps_str} mean={mean:.0f}')
log('\n=== Exp 21 COMPLETE ===')


@ -0,0 +1,258 @@
"""
Exp 22: Parallel DummyVecEnv generated_road + generated_track, warm-started.
Purpose:
- Keep the generated_road champion warm-start idea.
- Use the full termination stack so wedged cars and circular exploits end fast.
- Use the v6 reward wrapper, which explicitly kills no-progress / low-efficiency
behaviour instead of merely giving it weak reward.
Setup:
- Sim 1: generated_road on port 9091
- Sim 2: generated_track on port 9093
- Warm-start from agent/models/champion/model.zip
"""
import os
import sys
import time
from datetime import datetime
sys.path.insert(0, '/home/paulh/projects/donkeycar-rl-autoresearch/agent')
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.utils import get_schedule_fn
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
from donkeycar_sb3_runner import ThrottleClampWrapper
from multitrack_runner import StuckTerminationWrapper
from reward_wrapper import SpeedRewardWrapper
HOST = 'localhost'
THROTTLE_MIN = 0.2
LR = 0.000225
TOTAL_STEPS = 150_000
CHECKPOINT_EVERY = 10_000
SAVE_DIR = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6'
WARM_PATH = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip'
os.makedirs(SAVE_DIR, exist_ok=True)
EFFICIENCY_WINDOW = 60
MIN_EFFICIENCY = 0.15
MIN_LAP_TIME = 12.0
MAX_CTE_TERMINATE = 2.5
CTE_PATIENCE = 3
PROGRESS_PATIENCE = 100
EFFICIENCY_PATIENCE = 12
LOW_SPEED_PATIENCE = 10
LOW_SPEED_THRESHOLD = 0.25
LOW_SPEED_MIN_DISPLACEMENT = 0.20
LOW_SPEED_GRACE_STEPS = 15
MAX_STUCK_SECONDS = 3.0
MAX_EPISODE_SECONDS = 18.0
def log(msg):
    print(f'[{datetime.now().strftime("%H:%M:%S")}] {msg}', flush=True)


def make_env(track_id, port):
    def _init():
        raw = gym.make(track_id, conf={'host': HOST, 'port': port})
        env = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
        env = StuckTerminationWrapper(
            env,
            stuck_steps=40,
            min_displacement=0.5,
            max_stuck_seconds=MAX_STUCK_SECONDS,
            max_episode_seconds=MAX_EPISODE_SECONDS,
        )
        env = SpeedRewardWrapper(
            env,
            window_size=EFFICIENCY_WINDOW,
            min_efficiency=MIN_EFFICIENCY,
            min_lap_time=MIN_LAP_TIME,
            max_cte_terminate=MAX_CTE_TERMINATE,
            cte_patience=CTE_PATIENCE,
            progress_patience=PROGRESS_PATIENCE,
            efficiency_patience=EFFICIENCY_PATIENCE,
            low_speed_patience=LOW_SPEED_PATIENCE,
            low_speed_threshold=LOW_SPEED_THRESHOLD,
            low_speed_min_displacement=LOW_SPEED_MIN_DISPLACEMENT,
            low_speed_grace_steps=LOW_SPEED_GRACE_STEPS,
        )
        return env
    return _init


def make_eval_env(track_id, port):
    inner = make_env(track_id, port)()
    return VecTransposeImage(DummyVecEnv([lambda e=inner: e]))
log('=' * 60)
log('Exp 22: generated_road + generated_track, warm-started, v6 reward')
log(f' Warm start: {WARM_PATH}')
log(f' Sim 1: {HOST}:9091 -> generated_road')
log(f' Sim 2: {HOST}:9093 -> generated_track')
log(f' throttle_min={THROTTLE_MIN}, lr={LR}, total={TOTAL_STEPS:,}')
log(' Reward: v6 (speed x CTE with progress/efficiency exploit termination)')
log(f' Stuck timeout: {MAX_STUCK_SECONDS:.1f}s, hard cap: {MAX_EPISODE_SECONDS:.1f}s')
log(f' Progress patience: {PROGRESS_PATIENCE} steps')
log(f' Checkpoints: every {CHECKPOINT_EVERY:,} steps')
log('=' * 60)
log('Creating DummyVecEnv with the two road tracks...')
env = DummyVecEnv([
    make_env('donkey-generated-roads-v0', 9091),
    make_env('donkey-generated-track-v0', 9093),
])
env = VecTransposeImage(env)
log(f' VecEnv num_envs={env.num_envs}, obs={env.observation_space.shape}')
if not os.path.exists(WARM_PATH):
    raise FileNotFoundError(WARM_PATH)
model = PPO.load(WARM_PATH, env=env, device='cpu')
# Override the learning rate stored in the warm-start checkpoint.
model.learning_rate = LR
try:
    model.lr_schedule = get_schedule_fn(LR)
except Exception:
    model.lr_schedule = None
try:
    # Also patch the live optimizer so the new rate applies from the first update.
    for pg in model.policy.optimizer.param_groups:
        pg['lr'] = LR
except Exception:
    pass
log('Warm-start model attached. Starting training...')
best_total_steps = float('-inf')
best_total_reward = float('-inf')
steps_done = 0
while steps_done < TOTAL_STEPS:
    seg_steps = min(CHECKPOINT_EVERY, TOTAL_STEPS - steps_done)
    model.learn(total_timesteps=seg_steps, reset_num_timesteps=False)
    steps_done += seg_steps
    ckpt = os.path.join(SAVE_DIR, f'checkpoint_{steps_done:07d}')
    model.save(ckpt)
    model.save(os.path.join(SAVE_DIR, 'model'))
    log(f'[{steps_done:,}/{TOTAL_STEPS:,}] Checkpoint saved: {ckpt}.zip')
    try:
        # One deterministic episode per env; stop counting an env once it is done.
        obs = env.reset()
        ep_rewards = np.zeros(env.num_envs)
        ep_steps = np.zeros(env.num_envs)
        done_mask = np.zeros(env.num_envs, dtype=bool)
        for _ in range(2000):
            action, _ = model.predict(obs, deterministic=True)
            obs, rewards, dones, infos = env.step(action)
            for i in range(env.num_envs):
                if not done_mask[i]:
                    ep_rewards[i] += rewards[i]
                    ep_steps[i] += 1
                    if dones[i]:
                        done_mask[i] = True
            if done_mask.all():
                break
        status0 = '' if ep_steps[0] >= 2000 else f'❌@{int(ep_steps[0])}'
        status1 = '' if ep_steps[1] >= 2000 else f'❌@{int(ep_steps[1])}'
        log(
            f' Eval: gen_road={ep_rewards[0]:.1f}r/{int(ep_steps[0])}s {status0} '
            f'gen_track={ep_rewards[1]:.1f}r/{int(ep_steps[1])}s {status1}'
        )
        total_steps_eval = ep_steps.sum()
        total_reward = ep_rewards.sum()
        # Rank checkpoints by combined steps first, combined reward as tie-breaker.
        if (
            total_steps_eval > best_total_steps
            or (total_steps_eval == best_total_steps and total_reward > best_total_reward)
        ):
            best_total_steps = total_steps_eval
            best_total_reward = total_reward
            model.save(os.path.join(SAVE_DIR, 'best_model'))
            log(
                f' NEW BEST: combined steps={int(best_total_steps)} '
                f'reward={best_total_reward:.1f}'
            )
    except Exception as e:
        log(f' Eval error: {e}')
        import traceback
        traceback.print_exc()
model.save(os.path.join(SAVE_DIR, 'model'))
log(f'\nTraining complete. Best combined steps: {int(best_total_steps)}')
env.close()
time.sleep(5)
log('\n' + '=' * 60)
log('FINAL EVALUATION: best_model on generated_road, generated_track, mini_monaco')
log('=' * 60)
EVAL_TRACKS = [
    ('donkey-generated-roads-v0', 'generated_road'),
    ('donkey-generated-track-v0', 'generated_track'),
    ('donkey-minimonaco-track-v0', 'mini_monaco'),
]
EVAL_PORT = 9091
EVAL_SETS = 3
EVAL_MAX_STEPS = 2000
best_model_path = os.path.join(SAVE_DIR, 'best_model.zip')
results_by_track = {}
for track_id, track_name in EVAL_TRACKS:
    log(f'\n--- {track_name} ---')
    steps_list = []
    for s in range(1, EVAL_SETS + 1):
        try:
            eval_env = make_eval_env(track_id, EVAL_PORT)
            eval_model = PPO.load(best_model_path, env=eval_env, device='cpu')
            obs = eval_env.reset()
            total_r, total_s, done = 0.0, 0, False
            while not done and total_s < EVAL_MAX_STEPS:
                action, _ = eval_model.predict(obs, deterministic=True)
                result = eval_env.step(action)
                # Accept both 4-tuple and 5-tuple step results.
                if len(result) == 4:
                    obs, r, d, info = result
                    done = bool(d[0])
                else:
                    obs, r, t, tr, info = result
                    done = bool(t[0] or tr[0])
                total_r += float(r[0])
                total_s += 1
            status = '' if total_s >= EVAL_MAX_STEPS else f'❌@{total_s}'
            log(f' Set {s}: {total_r:.1f}r / {total_s}s {status}')
            steps_list.append(total_s)
            eval_env.close()
            time.sleep(3)
        except Exception as e:
            log(f' Set {s}: ERROR — {e}')
            steps_list.append(0)
            time.sleep(3)
    mean_steps = np.mean(steps_list) if steps_list else 0
    results_by_track[track_name] = steps_list
    log(f' Mean: {mean_steps:.0f} steps')
log('\n' + '=' * 60)
log('SUMMARY')
log('=' * 60)
for track_name, steps_list in results_by_track.items():
    steps_str = '/'.join(str(s) for s in steps_list)
    mean = np.mean(steps_list)
    verdict = '' if mean >= 1500 else '⚠️' if mean >= 500 else ''
    log(f' {verdict} {track_name:20s}: {steps_str} mean={mean:.0f}')
log('\n=== Exp 22 COMPLETE ===')
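
The checkpoint loop in this script promotes a checkpoint to `best_model` when the combined evaluation steps increase, and uses combined reward only to break ties. A minimal sketch of that ordering as a tuple comparison is shown here. `EvalScore` and `is_new_best` are hypothetical names for illustration and are not part of the script; the first two scores mirror the 127-step evaluations in the Exp 22 log, and the third is made up.

```python
from typing import NamedTuple


class EvalScore(NamedTuple):
    """Hypothetical container; field order defines comparison priority."""
    total_steps: float
    total_reward: float


def is_new_best(candidate: EvalScore, best: EvalScore) -> bool:
    # Tuples compare lexicographically: steps first, reward as tie-breaker.
    return candidate > best


best = EvalScore(float('-inf'), float('-inf'))
history = [
    EvalScore(127, 4.1),  # first evaluation: always beats the -inf sentinel
    EvalScore(127, 4.0),  # same steps, lower reward: not promoted
    EvalScore(120, 9.9),  # made-up case: higher reward cannot offset fewer steps
]
promoted = []
for score in history:
    if is_new_best(score, best):
        best = score
        promoted.append(score)
```

Ranking on steps first means a checkpoint that survives longer always wins, even if a shorter run collected more reward.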

@ -0,0 +1,590 @@
[14:34:26] ============================================================
[14:34:26] Exp 20: Parallel DummyVecEnv — 450k steps (sim rebuild + progress fix)
[14:34:26] Sim 1: localhost:9091 → generated_track
[14:34:26] Sim 2: localhost:9093 → mountain_track
[14:34:26] throttle_min=0.2, lr=0.000725, total=450,000
[14:34:26] Reward: v6 + exploit fix (window=200, min_lap=12.0s)
[14:34:26] Stuck termination: 40 steps (~2s), hard cap 30s
[14:34:26] Progress patience: 150 steps (~7.5s at 20fps)
[14:34:26] Checkpoints: every 20,000 steps
[14:34:26] ============================================================
[14:34:26] Creating DummyVecEnv with two tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
[14:34:26] VecEnv num_envs=2, obs=(3, 120, 160)
Using cpu device
[14:34:31] PPO created. Starting training...
-----------------------------
| time/ | |
| fps | 24 |
| iterations | 1 |
| time_elapsed | 165 |
| total_timesteps | 4096 |
-----------------------------
----------------------------------------
| time/ | |
| fps | 18 |
| iterations | 2 |
| time_elapsed | 444 |
| total_timesteps | 8192 |
| train/ | |
| approx_kl | 0.14028513 |
| clip_fraction | 0.291 |
| clip_range | 0.2 |
| entropy_loss | -2.81 |
| explained_variance | -0.18 |
| learning_rate | 0.000725 |
| loss | -0.107 |
| n_updates | 10 |
| policy_gradient_loss | -0.0541 |
| std | 0.953 |
| value_loss | 0.438 |
----------------------------------------
---------------------------------------
| time/ | |
| fps | 18 |
| iterations | 3 |
| time_elapsed | 674 |
| total_timesteps | 12288 |
| train/ | |
| approx_kl | 0.1430203 |
| clip_fraction | 0.453 |
| clip_range | 0.2 |
| entropy_loss | -2.73 |
| explained_variance | 0.0647 |
| learning_rate | 0.000725 |
| loss | -0.0709 |
| n_updates | 20 |
| policy_gradient_loss | -0.0486 |
| std | 0.926 |
| value_loss | 1.94 |
---------------------------------------
----------------------------------------
| time/ | |
| fps | 18 |
| iterations | 4 |
| time_elapsed | 868 |
| total_timesteps | 16384 |
| train/ | |
| approx_kl | 0.32767397 |
| clip_fraction | 0.571 |
| clip_range | 0.2 |
| entropy_loss | -2.62 |
| explained_variance | 0.34 |
| learning_rate | 0.000725 |
| loss | -0.129 |
| n_updates | 30 |
| policy_gradient_loss | -0.0851 |
| std | 0.856 |
| value_loss | 0.175 |
----------------------------------------
----------------------------------------
| time/ | |
| fps | 19 |
| iterations | 5 |
| time_elapsed | 1053 |
| total_timesteps | 20480 |
| train/ | |
| approx_kl | 0.32903564 |
| clip_fraction | 0.611 |
| clip_range | 0.2 |
| entropy_loss | -2.46 |
| explained_variance | 0.534 |
| learning_rate | 0.000725 |
| loss | -0.0762 |
| n_updates | 40 |
| policy_gradient_loss | -0.0877 |
| std | 0.788 |
| value_loss | 0.206 |
----------------------------------------
[14:53:33] [20,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0020000.zip
[14:53:39] Eval: gen_track=7.3r/88s ❌@88 mountain=5.5r/88s ❌@88
[14:53:39] NEW BEST: 12.7 combined reward
------------------------------
| time/ | |
| fps | 44 |
| iterations | 1 |
| time_elapsed | 92 |
| total_timesteps | 24576 |
------------------------------
---------------------------------------
| time/ | |
| fps | 30 |
| iterations | 2 |
| time_elapsed | 271 |
| total_timesteps | 28672 |
| train/ | |
| approx_kl | 0.5804715 |
| clip_fraction | 0.666 |
| clip_range | 0.2 |
| entropy_loss | -2.09 |
| explained_variance | 0.766 |
| learning_rate | 0.000725 |
| loss | -0.109 |
| n_updates | 60 |
| policy_gradient_loss | -0.0851 |
| std | 0.648 |
| value_loss | 0.15 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 27 |
| iterations | 3 |
| time_elapsed | 448 |
| total_timesteps | 32768 |
| train/ | |
| approx_kl | 0.629732 |
| clip_fraction | 0.693 |
| clip_range | 0.2 |
| entropy_loss | -1.88 |
| explained_variance | 0.759 |
| learning_rate | 0.000725 |
| loss | -0.089 |
| n_updates | 70 |
| policy_gradient_loss | -0.0853 |
| std | 0.587 |
| value_loss | 0.165 |
--------------------------------------
----------------------------------------
| time/ | |
| fps | 26 |
| iterations | 4 |
| time_elapsed | 613 |
| total_timesteps | 36864 |
| train/ | |
| approx_kl | 0.70558834 |
| clip_fraction | 0.699 |
| clip_range | 0.2 |
| entropy_loss | -1.68 |
| explained_variance | 0.551 |
| learning_rate | 0.000725 |
| loss | -0.112 |
| n_updates | 80 |
| policy_gradient_loss | -0.0853 |
| std | 0.529 |
| value_loss | 0.268 |
----------------------------------------
----------------------------------------
| time/ | |
| fps | 26 |
| iterations | 5 |
| time_elapsed | 776 |
| total_timesteps | 40960 |
| train/ | |
| approx_kl | 0.67741144 |
| clip_fraction | 0.706 |
| clip_range | 0.2 |
| entropy_loss | -1.48 |
| explained_variance | 0.593 |
| learning_rate | 0.000725 |
| loss | -0.106 |
| n_updates | 90 |
| policy_gradient_loss | -0.0837 |
| std | 0.48 |
| value_loss | 0.285 |
----------------------------------------
[15:08:02] [40,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0040000.zip
[15:08:09] Eval: gen_track=19.2r/144s ❌@144 mountain=11.6r/144s ❌@144
[15:08:09] NEW BEST: 30.8 combined reward
------------------------------
| time/ | |
| fps | 60 |
| iterations | 1 |
| time_elapsed | 68 |
| total_timesteps | 45056 |
------------------------------
----------------------------------------
| time/ | |
| fps | 37 |
| iterations | 2 |
| time_elapsed | 221 |
| total_timesteps | 49152 |
| train/ | |
| approx_kl | 0.84428275 |
| clip_fraction | 0.711 |
| clip_range | 0.2 |
| entropy_loss | -1.09 |
| explained_variance | 0.654 |
| learning_rate | 0.000725 |
| loss | -0.0724 |
| n_updates | 110 |
| policy_gradient_loss | -0.0718 |
| std | 0.394 |
| value_loss | 0.386 |
----------------------------------------
----------------------------------------
| time/ | |
| fps | 33 |
| iterations | 3 |
| time_elapsed | 367 |
| total_timesteps | 53248 |
| train/ | |
| approx_kl | 0.86503875 |
| clip_fraction | 0.735 |
| clip_range | 0.2 |
| entropy_loss | -0.886 |
| explained_variance | 0.775 |
| learning_rate | 0.000725 |
| loss | -0.0749 |
| n_updates | 120 |
| policy_gradient_loss | -0.0763 |
| std | 0.355 |
| value_loss | 0.236 |
----------------------------------------
---------------------------------------
| time/ | |
| fps | 31 |
| iterations | 4 |
| time_elapsed | 516 |
| total_timesteps | 57344 |
| train/ | |
| approx_kl | 1.0894502 |
| clip_fraction | 0.72 |
| clip_range | 0.2 |
| entropy_loss | -0.678 |
| explained_variance | 0.779 |
| learning_rate | 0.000725 |
| loss | -0.0494 |
| n_updates | 130 |
| policy_gradient_loss | -0.0692 |
| std | 0.318 |
| value_loss | 0.324 |
---------------------------------------
---------------------------------------
| time/ | |
| fps | 30 |
| iterations | 5 |
| time_elapsed | 667 |
| total_timesteps | 61440 |
| train/ | |
| approx_kl | 0.9834869 |
| clip_fraction | 0.737 |
| clip_range | 0.2 |
| entropy_loss | -0.454 |
| explained_variance | 0.812 |
| learning_rate | 0.000725 |
| loss | -0.105 |
| n_updates | 140 |
| policy_gradient_loss | -0.0659 |
| std | 0.283 |
| value_loss | 0.263 |
---------------------------------------
[15:20:45] [60,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0060000.zip
[15:20:52] Eval: gen_track=16.7r/135s ❌@135 mountain=11.5r/134s ❌@134
------------------------------
| time/ | |
| fps | 69 |
| iterations | 1 |
| time_elapsed | 58 |
| total_timesteps | 65536 |
------------------------------
---------------------------------------
| time/ | |
| fps | 40 |
| iterations | 2 |
| time_elapsed | 204 |
| total_timesteps | 69632 |
| train/ | |
| approx_kl | 1.0296706 |
| clip_fraction | 0.742 |
| clip_range | 0.2 |
| entropy_loss | 0.00541 |
| explained_variance | 0.847 |
| learning_rate | 0.000725 |
| loss | -0.0589 |
| n_updates | 160 |
| policy_gradient_loss | -0.0642 |
| std | 0.225 |
| value_loss | 0.252 |
---------------------------------------
----------------------------------------
| time/ | |
| fps | 35 |
| iterations | 3 |
| time_elapsed | 345 |
| total_timesteps | 73728 |
| train/ | |
| approx_kl | 0.91380507 |
| clip_fraction | 0.735 |
| clip_range | 0.2 |
| entropy_loss | 0.247 |
| explained_variance | 0.88 |
| learning_rate | 0.000725 |
| loss | -0.0869 |
| n_updates | 170 |
| policy_gradient_loss | -0.0728 |
| std | 0.2 |
| value_loss | 0.233 |
----------------------------------------
---------------------------------------
| time/ | |
| fps | 33 |
| iterations | 4 |
| time_elapsed | 492 |
| total_timesteps | 77824 |
| train/ | |
| approx_kl | 1.1527034 |
| clip_fraction | 0.751 |
| clip_range | 0.2 |
| entropy_loss | 0.476 |
| explained_variance | 0.881 |
| learning_rate | 0.000725 |
| loss | -0.11 |
| n_updates | 180 |
| policy_gradient_loss | -0.0653 |
| std | 0.178 |
| value_loss | 0.204 |
---------------------------------------
---------------------------------------
| time/ | |
| fps | 32 |
| iterations | 5 |
| time_elapsed | 633 |
| total_timesteps | 81920 |
| train/ | |
| approx_kl | 1.6661448 |
| clip_fraction | 0.777 |
| clip_range | 0.2 |
| entropy_loss | 0.708 |
| explained_variance | 0.949 |
| learning_rate | 0.000725 |
| loss | -0.121 |
| n_updates | 190 |
| policy_gradient_loss | -0.0697 |
| std | 0.159 |
| value_loss | 0.101 |
---------------------------------------
[15:33:03] [80,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0080000.zip
[15:33:10] Eval: gen_track=22.8r/169s ❌@169 mountain=13.6r/168s ❌@168
[15:33:10] NEW BEST: 36.4 combined reward
------------------------------
| time/ | |
| fps | 84 |
| iterations | 1 |
| time_elapsed | 48 |
| total_timesteps | 86016 |
------------------------------
---------------------------------------
| time/ | |
| fps | 42 |
| iterations | 2 |
| time_elapsed | 192 |
| total_timesteps | 90112 |
| train/ | |
| approx_kl | 1.1363616 |
| clip_fraction | 0.765 |
| clip_range | 0.2 |
| entropy_loss | 1.13 |
| explained_variance | 0.741 |
| learning_rate | 0.000725 |
| loss | -0.0656 |
| n_updates | 210 |
| policy_gradient_loss | -0.0623 |
| std | 0.129 |
| value_loss | 0.325 |
---------------------------------------
---------------------------------------
| time/ | |
| fps | 36 |
| iterations | 3 |
| time_elapsed | 335 |
| total_timesteps | 94208 |
| train/ | |
| approx_kl | 1.3523921 |
| clip_fraction | 0.757 |
| clip_range | 0.2 |
| entropy_loss | 1.32 |
| explained_variance | 0.772 |
| learning_rate | 0.000725 |
| loss | -0.0286 |
| n_updates | 220 |
| policy_gradient_loss | -0.0511 |
| std | 0.116 |
| value_loss | 0.485 |
---------------------------------------
---------------------------------------
| time/ | |
| fps | 34 |
| iterations | 4 |
| time_elapsed | 480 |
| total_timesteps | 98304 |
| train/ | |
| approx_kl | 1.1116364 |
| clip_fraction | 0.751 |
| clip_range | 0.2 |
| entropy_loss | 1.51 |
| explained_variance | 0.768 |
| learning_rate | 0.000725 |
| loss | -0.0579 |
| n_updates | 230 |
| policy_gradient_loss | -0.0407 |
| std | 0.106 |
| value_loss | 0.418 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 32 |
| iterations | 5 |
| time_elapsed | 624 |
| total_timesteps | 102400 |
| train/ | |
| approx_kl | 1.033067 |
| clip_fraction | 0.748 |
| clip_range | 0.2 |
| entropy_loss | 1.71 |
| explained_variance | 0.77 |
| learning_rate | 0.000725 |
| loss | -0.0622 |
| n_updates | 240 |
| policy_gradient_loss | -0.0379 |
| std | 0.0963 |
| value_loss | 0.517 |
--------------------------------------
[15:45:37] [100,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0100000.zip
[15:45:45] Eval: gen_track=19.1r/157s ❌@157 mountain=13.6r/157s ❌@157
-------------------------------
| time/ | |
| fps | 71 |
| iterations | 1 |
| time_elapsed | 57 |
| total_timesteps | 106496 |
-------------------------------
---------------------------------------
| time/ | |
| fps | 33 |
| iterations | 2 |
| time_elapsed | 243 |
| total_timesteps | 110592 |
| train/ | |
| approx_kl | 1.3683245 |
| clip_fraction | 0.757 |
| clip_range | 0.2 |
| entropy_loss | 2.11 |
| explained_variance | 0.805 |
| learning_rate | 0.000725 |
| loss | -0.0944 |
| n_updates | 260 |
| policy_gradient_loss | -0.0381 |
| std | 0.0785 |
| value_loss | 0.404 |
---------------------------------------
---------------------------------------
| time/ | |
| fps | 26 |
| iterations | 3 |
| time_elapsed | 459 |
| total_timesteps | 114688 |
| train/ | |
| approx_kl | 1.6867702 |
| clip_fraction | 0.786 |
| clip_range | 0.2 |
| entropy_loss | 2.24 |
| explained_variance | 0.739 |
| learning_rate | 0.000725 |
| loss | 0.0131 |
| n_updates | 270 |
| policy_gradient_loss | 0.00625 |
| std | 0.0737 |
| value_loss | 0.725 |
---------------------------------------
---------------------------------------
| time/ | |
| fps | 24 |
| iterations | 4 |
| time_elapsed | 677 |
| total_timesteps | 118784 |
| train/ | |
| approx_kl | 6.1363573 |
| clip_fraction | 0.82 |
| clip_range | 0.2 |
| entropy_loss | 2.43 |
| explained_variance | 0.664 |
| learning_rate | 0.000725 |
| loss | 0.0355 |
| n_updates | 280 |
| policy_gradient_loss | -0.00149 |
| std | 0.0674 |
| value_loss | 0.697 |
---------------------------------------
---------------------------------------
| time/ | |
| fps | 22 |
| iterations | 5 |
| time_elapsed | 910 |
| total_timesteps | 122880 |
| train/ | |
| approx_kl | 4.7547264 |
| clip_fraction | 0.809 |
| clip_range | 0.2 |
| entropy_loss | 2.59 |
| explained_variance | 0.663 |
| learning_rate | 0.000725 |
| loss | 0.0146 |
| n_updates | 290 |
| policy_gradient_loss | 0.00373 |
| std | 0.0619 |
| value_loss | 0.76 |
---------------------------------------
[16:03:21] [120,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0120000.zip
[16:03:27] Eval: gen_track=5.7r/63s ❌@63 mountain=4.8r/97s ❌@97
-------------------------------
| time/ | |
| fps | 40 |
| iterations | 1 |
| time_elapsed | 101 |
| total_timesteps | 126976 |
-------------------------------
--------------------------------------
| time/ | |
| fps | 22 |
| iterations | 2 |
| time_elapsed | 356 |
| total_timesteps | 131072 |
| train/ | |
| approx_kl | 8.778878 |
| clip_fraction | 0.796 |
| clip_range | 0.2 |
| entropy_loss | 2.96 |
| explained_variance | 0.732 |
| learning_rate | 0.000725 |
| loss | 0.00687 |
| n_updates | 310 |
| policy_gradient_loss | -0.00436 |
| std | 0.0509 |
| value_loss | 0.332 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 20 |
| iterations | 3 |
| time_elapsed | 600 |
| total_timesteps | 135168 |
| train/ | |
| approx_kl | 3.3255148 |
| clip_fraction | 0.793 |
| clip_range | 0.2 |
| entropy_loss | 3.16 |
| explained_variance | 0.796 |
| learning_rate | 0.000725 |
| loss | -0.0742 |
| n_updates | 320 |
| policy_gradient_loss | -0.000784 |
| std | 0.0466 |
| value_loss | 0.237 |
---------------------------------------
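
In the Exp 20 log above, `approx_kl` climbs from about 0.14 at 8,192 timesteps to 8.78 at 131,072 timesteps while `std` shrinks from 0.953 to 0.0466, and the evaluation at 120,000 steps drops to 63 and 97 steps. The following is a small sketch for pulling such values out of a saved log to spot this drift. The helper name `extract_metric` and the 0.5 warning threshold are assumptions for illustration, and the regex assumes the `| name | value |` row layout shown in these logs.

```python
import re
from typing import List


def extract_metric(log_text: str, name: str) -> List[float]:
    """Return every value logged for `name` in `| name | value |` rows."""
    pattern = re.compile(
        r'^\|[ \t]*' + re.escape(name) + r'[ \t]*\|[ \t]*([-+0-9.eE]+)[ \t]*\|[ \t]*$',
        re.MULTILINE,
    )
    return [float(m.group(1)) for m in pattern.finditer(log_text)]


# Four rows copied from the Exp 20 log (first and a late training update).
sample = """\
| approx_kl | 0.14028513 |
| std | 0.953 |
| approx_kl | 8.778878 |
| std | 0.0509 |
"""

KL_WARN = 0.5  # hypothetical threshold, not taken from the experiment
kl_values = extract_metric(sample, 'approx_kl')
flagged = [kl for kl in kl_values if kl > KL_WARN]
print(f'approx_kl values: {kl_values}, above {KL_WARN}: {flagged}')
```

Anchoring the pattern at the start of the row keeps `loss` from also matching `value_loss` or `policy_gradient_loss`.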

@ -0,0 +1,61 @@
[20:41:36] ============================================================
[20:41:36] Exp 21: generated_road + generated_track, warm-started, v4 reward
[20:41:36] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[20:41:36] Sim 1: localhost:9091 -> generated_road
[20:41:36] Sim 2: localhost:9093 -> generated_track
[20:41:36] throttle_min=0.2, lr=0.000225, total=150,000
[20:41:36] Checkpoints: every 10,000 steps
[20:41:36] ============================================================
[20:41:36] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
[20:41:36] VecEnv num_envs=2, obs=(3, 120, 160)
[20:41:40] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 28 |
| iterations | 1 |
| time_elapsed | 146 |
| total_timesteps | 18432 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 19 |
| iterations | 2 |
| time_elapsed | 421 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.015421186 |
| clip_fraction | 0.206 |
| clip_range | 0.2 |
| entropy_loss | -2.79 |
| explained_variance | -0.236 |
| learning_rate | 0.000225 |
| loss | 23.8 |
| n_updates | 80 |
| policy_gradient_loss | 0.00689 |
| std | 0.98 |
| value_loss | 67.9 |
-----------------------------------------

@ -0,0 +1,63 @@
[20:54:28] ============================================================
[20:54:28] Exp 21: generated_road + generated_track, warm-started, v4 reward
[20:54:28] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[20:54:28] Sim 1: localhost:9091 -> generated_road
[20:54:28] Sim 2: localhost:9093 -> generated_track
[20:54:28] throttle_min=0.2, lr=0.000225, total=150,000
[20:54:28] Checkpoints: every 10,000 steps
[20:54:28] ============================================================
[20:54:28] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
loading scene generated_road
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
loading scene generated_track
[20:54:30] VecEnv num_envs=2, obs=(3, 120, 160)
[20:54:35] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 25 |
| iterations | 1 |
| time_elapsed | 162 |
| total_timesteps | 18432 |
---------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 17 |
| iterations | 2 |
| time_elapsed | 461 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.02005615 |
| clip_fraction | 0.244 |
| clip_range | 0.2 |
| entropy_loss | -2.79 |
| explained_variance | -1.26 |
| learning_rate | 0.000225 |
| loss | 21.8 |
| n_updates | 80 |
| policy_gradient_loss | 0.0144 |
| std | 0.979 |
| value_loss | 54.3 |
----------------------------------------
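
The log above advances `total_timesteps` by 4,096 per iteration (18,432, then 22,528), and the counter starts from the warm-start model's value rather than zero. The sketch below works out where a 10,000-step segment would end, assuming training collects whole rollouts until the counter reaches the start value plus the requested steps. That stopping rule is an assumption about the training loop, and `segment_end` is a hypothetical helper, not part of the script.

```python
def segment_end(start: int, requested: int, rollout: int) -> int:
    """Counter value after a segment if only whole rollouts are collected."""
    target = start + requested
    steps = start
    while steps < target:
        steps += rollout
    return steps


ROLLOUT = 4096            # per-iteration increment seen in the log
start = 18432 - ROLLOUT   # counter implied before the first logged iteration
end = segment_end(start, requested=10_000, rollout=ROLLOUT)
print(f'start={start}, end={end}, actual steps={end - start}')
```

Under that assumption each nominal 10,000-step segment runs three rollouts, so checkpoint labels such as `checkpoint_0010000` would undercount the environment steps actually taken.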

@ -0,0 +1,40 @@
[21:03:33] ============================================================
[21:03:33] Exp 21: generated_road + generated_track, warm-started, v4 reward
[21:03:33] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[21:03:33] Sim 1: localhost:9091 -> generated_road
[21:03:33] Sim 2: localhost:9093 -> generated_track
[21:03:33] throttle_min=0.2, lr=0.000225, total=150,000
[21:03:33] Termination: StuckTerminationWrapper enabled
[21:03:33] Checkpoints: every 10,000 steps
[21:03:33] ============================================================
[21:03:33] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
[21:03:33] VecEnv num_envs=2, obs=(3, 120, 160)
[21:03:37] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 24 |
| iterations | 1 |
| time_elapsed | 167 |
| total_timesteps | 18432 |
---------------------------------

@ -0,0 +1 @@
611625

@ -0,0 +1,49 @@
/home/paulh/.local/lib/python3.10/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.
warnings.warn("Unable to import Axes3D. This may be due to multiple versions of "
Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
[21:16:53] ============================================================
[21:16:53] Exp 22: generated_road + generated_track, warm-started, v6 reward
[21:16:53] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[21:16:53] Sim 1: localhost:9091 -> generated_road
[21:16:53] Sim 2: localhost:9093 -> generated_track
[21:16:53] throttle_min=0.2, lr=0.000225, total=150,000
[21:16:53] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[21:16:53] Stuck timeout: 8.0s, hard cap: 25.0s
[21:16:53] Progress patience: 100 steps
[21:16:53] Checkpoints: every 10,000 steps
[21:16:53] ============================================================
[21:16:53] Creating DummyVecEnv with the two road tracks...
INFO:gym_donkeycar.core.client:connecting to localhost:9091
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:236: UserWarning: WARN: Box low's precision lowered by casting to float32, current low.dtype=float64
gym.logger.warn(
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:306: UserWarning: WARN: Box high's precision lowered by casting to float32, current high.dtype=float64
gym.logger.warn(
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
INFO:gym_donkeycar.core.client:connecting to localhost:9093
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
[21:16:53] VecEnv num_envs=2, obs=(3, 120, 160)

@ -0,0 +1,383 @@
[21:23:45] ============================================================
[21:23:45] Exp 22: generated_road + generated_track, warm-started, v6 reward
[21:23:45] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[21:23:45] Sim 1: localhost:9091 -> generated_road
[21:23:45] Sim 2: localhost:9093 -> generated_track
[21:23:45] throttle_min=0.2, lr=0.000225, total=150,000
[21:23:45] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[21:23:45] Stuck timeout: 8.0s, hard cap: 25.0s
[21:23:45] Progress patience: 100 steps
[21:23:45] Checkpoints: every 10,000 steps
[21:23:45] ============================================================
[21:23:45] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
loading scene generated_road
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
loading scene generated_track
[21:23:47] VecEnv num_envs=2, obs=(3, 120, 160)
[21:23:50] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 29 |
| iterations | 1 |
| time_elapsed | 139 |
| total_timesteps | 18432 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 21 |
| iterations | 2 |
| time_elapsed | 378 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.024176385 |
| clip_fraction | 0.244 |
| clip_range | 0.2 |
| entropy_loss | -2.79 |
| explained_variance | -1.36 |
| learning_rate | 0.000225 |
| loss | 12.5 |
| n_updates | 80 |
| policy_gradient_loss | 0.0113 |
| std | 0.976 |
| value_loss | 41.2 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 19 |
| iterations | 3 |
| time_elapsed | 616 |
| total_timesteps | 26624 |
| train/ | |
| approx_kl | 0.021042215 |
| clip_fraction | 0.227 |
| clip_range | 0.2 |
| entropy_loss | -2.77 |
| explained_variance | 0.519 |
| learning_rate | 0.000225 |
| loss | 2.82 |
| n_updates | 90 |
| policy_gradient_loss | 0.00236 |
| std | 0.959 |
| value_loss | 9.14 |
-----------------------------------------
[21:35:50] [10,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0010000.zip
[21:35:56] Eval: gen_road=3.0r/64s ❌@64 gen_track=1.1r/63s ❌@63
[21:35:56] NEW BEST: combined steps=127 reward=4.1
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 31 |
| iterations | 1 |
| time_elapsed | 129 |
| total_timesteps | 30720 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 2 |
| time_elapsed | 357 |
| total_timesteps | 34816 |
| train/ | |
| approx_kl | 0.027895104 |
| clip_fraction | 0.222 |
| clip_range | 0.2 |
| entropy_loss | -2.67 |
| explained_variance | 0.27 |
| learning_rate | 0.000225 |
| loss | 0.0657 |
| n_updates | 110 |
| policy_gradient_loss | -0.0236 |
| std | 0.907 |
| value_loss | 0.549 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 20 |
| iterations | 3 |
| time_elapsed | 587 |
| total_timesteps | 38912 |
| train/ | |
| approx_kl | 0.038819656 |
| clip_fraction | 0.24 |
| clip_range | 0.2 |
| entropy_loss | -2.63 |
| explained_variance | 0.346 |
| learning_rate | 0.000225 |
| loss | -0.0014 |
| n_updates | 120 |
| policy_gradient_loss | -0.0293 |
| std | 0.893 |
| value_loss | 0.157 |
-----------------------------------------
[21:47:36] [20,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0020000.zip
[21:47:42] Eval: gen_road=2.9r/64s ❌@64 gen_track=1.1r/63s ❌@63
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 33 |
| iterations | 1 |
| time_elapsed | 122 |
| total_timesteps | 43008 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 23 |
| iterations | 2 |
| time_elapsed | 351 |
| total_timesteps | 47104 |
| train/ | |
| approx_kl | 0.060704876 |
| clip_fraction | 0.327 |
| clip_range | 0.2 |
| entropy_loss | -2.53 |
| explained_variance | 0.877 |
| learning_rate | 0.000225 |
| loss | -0.0427 |
| n_updates | 140 |
| policy_gradient_loss | -0.045 |
| std | 0.847 |
| value_loss | 0.0676 |
-----------------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 21 |
| iterations | 3 |
| time_elapsed | 571 |
| total_timesteps | 51200 |
| train/ | |
| approx_kl | 0.06585144 |
| clip_fraction | 0.35 |
| clip_range | 0.2 |
| entropy_loss | -2.49 |
| explained_variance | 0.883 |
| learning_rate | 0.000225 |
| loss | -0.0429 |
| n_updates | 150 |
| policy_gradient_loss | -0.0419 |
| std | 0.833 |
| value_loss | 0.0814 |
----------------------------------------
[21:58:56] [30,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0030000.zip
[21:59:02] Eval: gen_road=2.7r/63s ❌@63 gen_track=1.1r/62s ❌@62
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 33 |
| iterations | 1 |
| time_elapsed | 121 |
| total_timesteps | 55296 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 23 |
| iterations | 2 |
| time_elapsed | 343 |
| total_timesteps | 59392 |
| train/ | |
| approx_kl | 0.096836925 |
| clip_fraction | 0.422 |
| clip_range | 0.2 |
| entropy_loss | -2.42 |
| explained_variance | 0.85 |
| learning_rate | 0.000225 |
| loss | -0.0767 |
| n_updates | 170 |
| policy_gradient_loss | -0.053 |
| std | 0.8 |
| value_loss | 0.0973 |
-----------------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 3 |
| time_elapsed | 556 |
| total_timesteps | 63488 |
| train/ | |
| approx_kl | 0.16407205 |
| clip_fraction | 0.461 |
| clip_range | 0.2 |
| entropy_loss | -2.35 |
| explained_variance | 0.9 |
| learning_rate | 0.000225 |
| loss | -0.0875 |
| n_updates | 180 |
| policy_gradient_loss | -0.0633 |
| std | 0.758 |
| value_loss | 0.035 |
----------------------------------------
[22:10:03] [40,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0040000.zip
[22:10:09] Eval: gen_road=3.1r/59s ❌@59 gen_track=1.3r/58s ❌@58
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 36 |
| iterations | 1 |
| time_elapsed | 113 |
| total_timesteps | 67584 |
---------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 24 |
| iterations | 2 |
| time_elapsed | 329 |
| total_timesteps | 71680 |
| train/ | |
| approx_kl | 0.17689857 |
| clip_fraction | 0.489 |
| clip_range | 0.2 |
| entropy_loss | -2.18 |
| explained_variance | 0.917 |
| learning_rate | 0.000225 |
| loss | -0.0885 |
| n_updates | 200 |
| policy_gradient_loss | -0.0635 |
| std | 0.698 |
| value_loss | 0.054 |
----------------------------------------
---------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 3 |
| time_elapsed | 548 |
| total_timesteps | 75776 |
| train/ | |
| approx_kl | 0.1996874 |
| clip_fraction | 0.506 |
| clip_range | 0.2 |
| entropy_loss | -2.08 |
| explained_variance | 0.933 |
| learning_rate | 0.000225 |
| loss | -0.0906 |
| n_updates | 210 |
| policy_gradient_loss | -0.0629 |
| std | 0.666 |
| value_loss | 0.043 |
---------------------------------------
[22:20:58] [50,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0050000.zip
[22:21:04] Eval: gen_road=5.9r/67s ❌@67 gen_track=1.6r/66s ❌@66
[22:21:04] NEW BEST: combined steps=133 reward=7.6
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 35 |
| iterations | 1 |
| time_elapsed | 115 |
| total_timesteps | 79872 |
---------------------------------
--------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 25 |
| iterations | 2 |
| time_elapsed | 326 |
| total_timesteps | 83968 |
| train/ | |
| approx_kl | 0.254287 |
| clip_fraction | 0.543 |
| clip_range | 0.2 |
| entropy_loss | -1.94 |
| explained_variance | 0.89 |
| learning_rate | 0.000225 |
| loss | -0.102 |
| n_updates | 230 |
| policy_gradient_loss | -0.0707 |
| std | 0.62 |
| value_loss | 0.0646 |
--------------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 3 |
| time_elapsed | 551 |
| total_timesteps | 88064 |
| train/ | |
| approx_kl | 0.32521772 |
| clip_fraction | 0.604 |
| clip_range | 0.2 |
| entropy_loss | -1.85 |
| explained_variance | 0.803 |
| learning_rate | 0.000225 |
| loss | -0.0781 |
| n_updates | 240 |
| policy_gradient_loss | -0.0776 |
| std | 0.594 |
| value_loss | 0.102 |
----------------------------------------
[22:32:03] [60,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0060000.zip
[22:32:09] Eval: gen_road=7.7r/93s ❌@93 gen_track=3.7r/92s ❌@92
[22:32:09] NEW BEST: combined steps=185 reward=11.4
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 43 |
| iterations | 1 |
| time_elapsed | 94 |
| total_timesteps | 92160 |
---------------------------------
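The `Eval:` and `NEW BEST:` lines in the log above appear to be related by simple addition: at the 10,000-step checkpoint, 64 + 63 = 127 steps and 3.0 + 1.1 = 4.1 reward. The following is a minimal sketch for pulling the per-track numbers out of such a line. `EVAL_FIELD`, `parse_eval_line`, and `combined` are hypothetical helpers, not part of the repository, and reading `r` as reward and `s` as steps is an assumption based on that arithmetic. Sums of the rounded per-track rewards can differ by 0.1 from the logged combined value (for example 5.9 + 1.6 against the logged 7.6), presumably because the training script sums unrounded values.

```python
import re
from typing import Dict, Tuple

# Matches fragments such as "gen_road=3.0r/64s" or "gen_track=-0.4r/36s".
EVAL_FIELD = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)r/(\d+)s")


def parse_eval_line(line: str) -> Dict[str, Tuple[float, int]]:
    """Return {track_name: (reward, steps)} for one 'Eval:' log line."""
    return {
        name: (float(reward), int(steps))
        for name, reward, steps in EVAL_FIELD.findall(line)
    }


def combined(results: Dict[str, Tuple[float, int]]) -> Tuple[float, int]:
    """Sum per-track rewards (rounded to one decimal) and steps."""
    total_reward = sum(reward for reward, _ in results.values())
    total_steps = sum(steps for _, steps in results.values())
    return round(total_reward, 1), total_steps


line = "[21:35:56] Eval: gen_road=3.0r/64s ❌@64 gen_track=1.1r/63s ❌@63"
results = parse_eval_line(line)
print(results)
print(combined(results))
```

Comparing `combined(...)` across checkpoints gives the same ordering the log uses when it announces `NEW BEST`, as far as the lines shown here indicate.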

@@ -0,0 +1,64 @@
/home/paulh/.local/lib/python3.10/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.
warnings.warn("Unable to import Axes3D. This may be due to multiple versions of "
Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
[14:13:45] ============================================================
[14:13:45] Exp 22: generated_road + generated_track, warm-started, v6 reward
[14:13:45] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[14:13:45] Sim 1: localhost:9091 -> generated_road
[14:13:45] Sim 2: localhost:9093 -> generated_track
[14:13:45] throttle_min=0.2, lr=0.000225, total=150,000
[14:13:45] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[14:13:45] Stuck timeout: 8.0s, hard cap: 25.0s
[14:13:45] Progress patience: 100 steps
[14:13:45] Checkpoints: every 10,000 steps
[14:13:45] ============================================================
[14:13:45] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
INFO:gym_donkeycar.core.client:connecting to localhost:9091
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:236: UserWarning: WARN: Box low's precision lowered by casting to float32, current low.dtype=float64
gym.logger.warn(
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:306: UserWarning: WARN: Box high's precision lowered by casting to float32, current high.dtype=float64
gym.logger.warn(
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
INFO:gym_donkeycar.core.client:connecting to localhost:9093
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
[14:13:45] VecEnv num_envs=2, obs=(3, 120, 160)
/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:166: UserWarning: get_schedule_fn() is deprecated, please use FloatSchedule() instead
warnings.warn("get_schedule_fn() is deprecated, please use FloatSchedule() instead")
/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:212: UserWarning: constant_fn() is deprecated, please use ConstantSchedule() instead
warnings.warn("constant_fn() is deprecated, please use ConstantSchedule() instead")
[14:13:49] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 29 |
| iterations | 1 |
| time_elapsed | 139 |
| total_timesteps | 18432 |
---------------------------------

@@ -0,0 +1,64 @@
/home/paulh/.local/lib/python3.10/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.
warnings.warn("Unable to import Axes3D. This may be due to multiple versions of "
Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
[14:19:32] ============================================================
[14:19:32] Exp 22: generated_road + generated_track, warm-started, v6 reward
[14:19:32] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[14:19:32] Sim 1: localhost:9091 -> generated_road
[14:19:32] Sim 2: localhost:9093 -> generated_track
[14:19:32] throttle_min=0.2, lr=0.000225, total=150,000
[14:19:32] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[14:19:32] Stuck timeout: 8.0s, hard cap: 25.0s
[14:19:32] Progress patience: 100 steps
[14:19:32] Checkpoints: every 10,000 steps
[14:19:32] ============================================================
[14:19:32] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
INFO:gym_donkeycar.core.client:connecting to localhost:9091
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:236: UserWarning: WARN: Box low's precision lowered by casting to float32, current low.dtype=float64
gym.logger.warn(
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:306: UserWarning: WARN: Box high's precision lowered by casting to float32, current high.dtype=float64
gym.logger.warn(
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
INFO:gym_donkeycar.core.client:connecting to localhost:9093
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
[14:19:32] VecEnv num_envs=2, obs=(3, 120, 160)
/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:166: UserWarning: get_schedule_fn() is deprecated, please use FloatSchedule() instead
warnings.warn("get_schedule_fn() is deprecated, please use FloatSchedule() instead")
/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:212: UserWarning: constant_fn() is deprecated, please use ConstantSchedule() instead
warnings.warn("constant_fn() is deprecated, please use ConstantSchedule() instead")
[14:19:35] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 1 |
| time_elapsed | 181 |
| total_timesteps | 18432 |
---------------------------------

@@ -0,0 +1,121 @@
/home/paulh/.local/lib/python3.10/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.
warnings.warn("Unable to import Axes3D. This may be due to multiple versions of "
Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
[14:26:23] ============================================================
[14:26:23] Exp 22: generated_road + generated_track, warm-started, v6 reward
[14:26:23] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[14:26:23] Sim 1: localhost:9091 -> generated_road
[14:26:23] Sim 2: localhost:9093 -> generated_track
[14:26:23] throttle_min=0.2, lr=0.000225, total=150,000
[14:26:23] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[14:26:23] Stuck timeout: 3.0s, hard cap: 18.0s
[14:26:23] Progress patience: 100 steps
[14:26:23] Checkpoints: every 10,000 steps
[14:26:23] ============================================================
[14:26:23] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
INFO:gym_donkeycar.core.client:connecting to localhost:9091
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:236: UserWarning: WARN: Box low's precision lowered by casting to float32, current low.dtype=float64
gym.logger.warn(
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:306: UserWarning: WARN: Box high's precision lowered by casting to float32, current high.dtype=float64
gym.logger.warn(
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
INFO:gym_donkeycar.core.client:connecting to localhost:9093
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
[14:26:23] VecEnv num_envs=2, obs=(3, 120, 160)
/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:166: UserWarning: get_schedule_fn() is deprecated, please use FloatSchedule() instead
warnings.warn("get_schedule_fn() is deprecated, please use FloatSchedule() instead")
/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:212: UserWarning: constant_fn() is deprecated, please use ConstantSchedule() instead
warnings.warn("constant_fn() is deprecated, please use ConstantSchedule() instead")
[14:26:26] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 23 |
| iterations | 1 |
| time_elapsed | 177 |
| total_timesteps | 18432 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 18 |
| iterations | 2 |
| time_elapsed | 446 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.012866169 |
| clip_fraction | 0.26 |
| clip_range | 0.2 |
| entropy_loss | -2.79 |
| explained_variance | -1.1 |
| learning_rate | 0.000225 |
| loss | 4.57 |
| n_updates | 80 |
| policy_gradient_loss | 0.0151 |
| std | 0.981 |
| value_loss | 25.9 |
-----------------------------------------
------------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 17 |
| iterations | 3 |
| time_elapsed | 714 |
| total_timesteps | 26624 |
| train/ | |
| approx_kl | 0.0133808125 |
| clip_fraction | 0.199 |
| clip_range | 0.2 |
| entropy_loss | -2.81 |
| explained_variance | 0.199 |
| learning_rate | 0.000225 |
| loss | 0.858 |
| n_updates | 90 |
| policy_gradient_loss | 0.00454 |
| std | 0.985 |
| value_loss | 3.54 |
------------------------------------------
[14:39:55] [10,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0010000.zip
[14:40:01] Eval: gen_road=0.2r/41s ❌@41 gen_track=-0.4r/36s ❌@36
[14:40:01] NEW BEST: combined steps=77 reward=-0.3
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 1 |
| time_elapsed | 180 |
| total_timesteps | 30720 |
---------------------------------

@@ -0,0 +1,42 @@
[10:19:05] ============================================================
[10:19:05] Exp 22: generated_road + generated_track, warm-started, v6 reward
[10:19:05] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[10:19:05] Sim 1: localhost:9091 -> generated_road
[10:19:05] Sim 2: localhost:9093 -> generated_track
[10:19:05] throttle_min=0.2, lr=0.000225, total=150,000
[10:19:05] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[10:19:05] Stuck timeout: 8.0s, hard cap: 25.0s
[10:19:05] Progress patience: 100 steps
[10:19:05] Checkpoints: every 10,000 steps
[10:19:05] ============================================================
[10:19:05] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
[10:19:06] VecEnv num_envs=2, obs=(3, 120, 160)
[10:19:09] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 39 |
| iterations | 1 |
| time_elapsed | 103 |
| total_timesteps | 18432 |
---------------------------------

@@ -0,0 +1,32 @@
[21:17:40] ============================================================
[21:17:40] Exp 22: generated_road + generated_track, warm-started, v6 reward
[21:17:40] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[21:17:40] Sim 1: localhost:9091 -> generated_road
[21:17:40] Sim 2: localhost:9093 -> generated_track
[21:17:40] throttle_min=0.2, lr=0.000225, total=150,000
[21:17:40] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[21:17:40] Stuck timeout: 8.0s, hard cap: 25.0s
[21:17:40] Progress patience: 100 steps
[21:17:40] Checkpoints: every 10,000 steps
[21:17:40] ============================================================
[21:17:40] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
[21:17:40] VecEnv num_envs=2, obs=(3, 120, 160)
[21:17:43] Warm-start model attached. Starting training...

@ -98,6 +98,10 @@ class SpeedRewardWrapper(gym.Wrapper):
cte_patience: int = 20, cte_patience: int = 20,
progress_patience: int = 60, progress_patience: int = 60,
efficiency_patience: int = 20, # steps of low efficiency before termination efficiency_patience: int = 20, # steps of low efficiency before termination
low_speed_patience: int = 20,
low_speed_threshold: float = 0.2,
low_speed_min_displacement: float = 0.25,
low_speed_grace_steps: int = 20,
): ):
super().__init__(env) super().__init__(env)
self.speed_scale = speed_scale self.speed_scale = speed_scale
@ -109,12 +113,21 @@ class SpeedRewardWrapper(gym.Wrapper):
self.cte_patience = cte_patience self.cte_patience = cte_patience
self.progress_patience = progress_patience self.progress_patience = progress_patience
self.efficiency_patience = efficiency_patience self.efficiency_patience = efficiency_patience
self.low_speed_patience = low_speed_patience
self.low_speed_threshold = low_speed_threshold
self.low_speed_min_displacement = low_speed_min_displacement
self.low_speed_grace_steps = low_speed_grace_steps
self._pos_history = deque(maxlen=window_size + 1) self._pos_history = deque(maxlen=window_size + 1)
self._last_lap_count = 0 self._last_lap_count = 0
self._high_cte_steps = 0 self._high_cte_steps = 0
self._max_node_seen = -1 self._max_node_seen = -1
self._no_progress_steps = 0 self._no_progress_steps = 0
self._low_eff_steps = 0 self._low_eff_steps = 0
self._solid_hit_steps = 0
self._prev_speed = 0.0
self._episode_steps = 0
self._low_speed_steps = 0
self._low_speed_anchor = None
def reset(self, **kwargs): def reset(self, **kwargs):
result = self.env.reset(**kwargs) result = self.env.reset(**kwargs)
@ -124,6 +137,11 @@ class SpeedRewardWrapper(gym.Wrapper):
self._max_node_seen = -1 self._max_node_seen = -1
self._no_progress_steps = 0 self._no_progress_steps = 0
self._low_eff_steps = 0 self._low_eff_steps = 0
self._solid_hit_steps = 0
self._prev_speed = 0.0
self._episode_steps = 0
self._low_speed_steps = 0
self._low_speed_anchor = None
return result return result
def step(self, action): def step(self, action):
@ -168,14 +186,18 @@ class SpeedRewardWrapper(gym.Wrapper):
reward = -1.0 (on crash/termination) reward = -1.0 (on crash/termination)
""" """
# Track position for efficiency calculation # Track position for efficiency calculation
current_pos = None
try: try:
pos = info.get('pos', (0.0, 0.0, 0.0)) pos = info.get('pos', (0.0, 0.0, 0.0))
pos_x = float(pos[0]) pos_x = float(pos[0])
pos_z = float(pos[2]) pos_z = float(pos[2])
self._pos_history.append(np.array([pos_x, pos_z])) current_pos = np.array([pos_x, pos_z])
self._pos_history.append(current_pos)
except (TypeError, ValueError, IndexError): except (TypeError, ValueError, IndexError):
pass pass
self._episode_steps += 1
# Crash / episode over # Crash / episode over
if done: if done:
return -1.0, False return -1.0, False
@ -186,11 +208,82 @@ class SpeedRewardWrapper(gym.Wrapper):
except (TypeError, ValueError): except (TypeError, ValueError):
cte = 0.0 cte = 0.0
# --- Grass exploit: sustained high CTE termination --- # --- Speed / collision classification ---
try:
speed = max(0.0, float(info.get('speed', 0.0) or 0.0))
except (TypeError, ValueError):
speed = 0.0
try:
hit = str(info.get('hit', 'none') or 'none').lower()
except Exception:
hit = 'none'
solid_hit = (
hit != 'none' and (
'barrier' in hit or
'wall' in hit or
'tree' in hit
)
)
# Allow brief brushes, but terminate on:
# 1. a head-on style stop: car was moving, then collision arrives with
# a large speed drop; or
# 2. sustained obstacle contact over several telemetry frames.
if solid_hit:
head_on_impact = self._prev_speed >= 1.5 and speed <= 0.35
if head_on_impact:
self._prev_speed = speed
return -1.0, True
self._solid_hit_steps += 1
if self._solid_hit_steps >= 4:
self._prev_speed = speed
return -1.0, True
else:
self._solid_hit_steps = 0
# --- Wheels-spinning / barrier wedge termination ---
# CTE can remain deceptively acceptable when the car is pressed against
# a generated-road barrier or invisible collider. If speed stays near
# zero and position does not meaningfully change after the launch grace
# period, kill the episode quickly with a negative reward.
if (
current_pos is not None
and self._episode_steps > self.low_speed_grace_steps
and speed <= self.low_speed_threshold
):
if self._low_speed_anchor is None:
self._low_speed_anchor = current_pos
self._low_speed_steps = 1
else:
moved = float(np.linalg.norm(current_pos - self._low_speed_anchor))
if moved >= self.low_speed_min_displacement:
self._low_speed_anchor = current_pos
self._low_speed_steps = 0
else:
self._low_speed_steps += 1
if self._low_speed_steps >= self.low_speed_patience:
self._prev_speed = speed
return -1.0, True
else:
self._low_speed_steps = 0
self._low_speed_anchor = current_pos
# --- Grass / outside-road exploit: high CTE is bad immediately ---
# Do not let the policy collect positive speed reward while it is
# outside the useful road corridor. Earlier versions only terminated
# after patience frames, but still paid positive reward during those
# frames; PPO learned large fast circles outside generated_road.
if abs(cte) > self.max_cte_terminate: if abs(cte) > self.max_cte_terminate:
self._high_cte_steps += 1 self._high_cte_steps += 1
if self._high_cte_steps >= self.cte_patience: if self._high_cte_steps >= self.cte_patience:
self._prev_speed = speed
return -1.0, True # too long off-track — terminate return -1.0, True # too long off-track — terminate
self._prev_speed = speed
return -0.25, False
else: else:
self._high_cte_steps = 0 self._high_cte_steps = 0
@ -214,6 +307,7 @@ class SpeedRewardWrapper(gym.Wrapper):
else: else:
self._no_progress_steps += 1 self._no_progress_steps += 1
if self._no_progress_steps >= self.progress_patience: if self._no_progress_steps >= self.progress_patience:
self._prev_speed = speed
return -1.0, True # no forward progress — terminate return -1.0, True # no forward progress — terminate
@ -233,6 +327,7 @@ class SpeedRewardWrapper(gym.Wrapper):
lap_time = 999.0 lap_time = 999.0
if lap_time < self.min_lap_time: if lap_time < self.min_lap_time:
penalty = -10.0 * (self.min_lap_time / max(lap_time, 0.1)) penalty = -10.0 * (self.min_lap_time / max(lap_time, 0.1))
self._prev_speed = speed
return penalty, True return penalty, True
# --- Efficiency gate: detect circular driving --- # --- Efficiency gate: detect circular driving ---
@ -243,7 +338,9 @@ class SpeedRewardWrapper(gym.Wrapper):
if efficiency < self.min_efficiency: if efficiency < self.min_efficiency:
self._low_eff_steps += 1 self._low_eff_steps += 1
if self._low_eff_steps >= self.efficiency_patience: if self._low_eff_steps >= self.efficiency_patience:
self._prev_speed = speed
return -1.0, True # circle too long — terminate return -1.0, True # circle too long — terminate
self._prev_speed = speed
return 0.0, False # still accumulating — zero reward return 0.0, False # still accumulating — zero reward
else: else:
self._low_eff_steps = 0 self._low_eff_steps = 0
@ -252,13 +349,9 @@ class SpeedRewardWrapper(gym.Wrapper):
cte_quality = 1.0 - min(abs(cte) / self.max_cte, 1.0)
# --- Speed ---
try:
speed = max(0.0, float(info.get('speed', 0.0) or 0.0))
except (TypeError, ValueError):
speed = 0.0
# --- v6 reward: speed × CTE quality ---
speed_norm = min(speed / 10.0, 1.0)
self._prev_speed = speed
return cte_quality * speed_norm, False
def _compute_efficiency(self) -> float:
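The high-CTE branch touched by the hunks above behaves like a small counter-based gate. The following is a minimal standalone sketch, assuming the counter is incremented before the patience check (which is what the expected rewards in the regression test imply). `HighCteGate` and `check` are hypothetical names used only for illustration; they are not part of `SpeedRewardWrapper`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HighCteGate:
    """Hypothetical stand-in for the wrapper's high-CTE branch (illustration only)."""
    max_cte_terminate: float = 2.5
    cte_patience: int = 3
    steps: int = 0  # consecutive steps spent outside the corridor

    def check(self, cte: float) -> Optional[Tuple[float, bool]]:
        """Return (reward, terminate) while outside the corridor, else None."""
        if abs(cte) > self.max_cte_terminate:
            self.steps += 1
            if self.steps >= self.cte_patience:
                return -1.0, True   # off-track for too long: terminate
            return -0.25, False     # immediate penalty, no speed reward
        self.steps = 0              # back inside the corridor: reset counter
        return None


gate = HighCteGate()
results = [gate.check(3.0) for _ in range(3)]
print(results)
```

With these assumed defaults, the first two off-track steps are penalised and the third terminates, so no positive speed reward is paid during the patience window.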


@ -324,12 +324,45 @@ def test_sustained_high_cte_terminates_episode():
        rewards.append(r)
        terminated.append(force_term)
    # Should terminate at step 5 (cte_patience=5)
    # High CTE should be punished immediately, then terminate at step 5
    assert rewards[0] < 0, f'High CTE should be negative immediately, got {rewards[0]}'
    assert terminated[4] == True, f'Should force-terminate at step 5, got {terminated}'
    assert rewards[4] == -1.0, f'Termination reward should be -1.0, got {rewards[4]}'
    assert terminated[0] == False, 'Should not terminate at step 1'
def test_high_cte_never_gets_positive_speed_reward_before_termination():
    """
    Regression for generated_road outside-circle exploit: while CTE is outside
    the allowed corridor, the wrapper must not pay positive speed reward during
    the patience window. The policy should receive negative feedback
    immediately, then termination.
    """
    env = MockEnv(speed=5.0, cte=3.0)
    wrapper = SpeedRewardWrapper(env, max_cte_terminate=2.5, cte_patience=3)
    wrapper.reset()
    rewards = []
    terminated = []
    for i in range(3):
        info = {
            'cte': 3.0,
            'speed': 5.0,
            'pos': (float(i), 0.0, 0.0),
            'active_node': i,
            'total_nodes': 100,
            'lap_count': 0,
            'last_lap_time': 0.0,
        }
        r, ft = wrapper._compute_reward_and_done(done=False, info=info)
        rewards.append(r)
        terminated.append(ft)
    assert rewards[:2] == [-0.25, -0.25]
    assert rewards[2] == -1.0
    assert terminated == [False, False, True]
def test_high_cte_resets_when_back_on_track():
    """
    High CTE counter must reset when car returns to track.
@ -383,6 +416,70 @@ def test_no_track_progress_terminates_episode():
assert r == -1.0
def test_low_speed_no_displacement_terminates_barrier_wedge():
    """
    Regression for invisible-barrier wedge: wheels can be commanded but the car
    remains nearly motionless with acceptable CTE. This must terminate quickly
    instead of returning zero/positive reward indefinitely.
    """
    env = MockEnv(speed=0.05, cte=0.5)
    wrapper = SpeedRewardWrapper(
        env,
        low_speed_grace_steps=2,
        low_speed_patience=3,
        low_speed_threshold=0.2,
        low_speed_min_displacement=0.25,
        progress_patience=100,
    )
    wrapper.reset()
    terminated = False
    reward = None
    for _ in range(8):
        info = {
            'cte': 0.5,
            'speed': 0.05,
            'pos': (1.0, 0.0, 1.0),
            'active_node': 5,
            'total_nodes': 100,
            'lap_count': 0,
            'last_lap_time': 0.0,
        }
        reward, terminated = wrapper._compute_reward_and_done(done=False, info=info)
        if terminated:
            break
    assert terminated is True
    assert reward == -1.0
def test_low_speed_counter_resets_after_meaningful_displacement():
    """Slow starts should not terminate if the car is still changing position."""
    env = MockEnv(speed=0.05, cte=0.5)
    wrapper = SpeedRewardWrapper(
        env,
        low_speed_grace_steps=0,
        low_speed_patience=3,
        low_speed_threshold=0.2,
        low_speed_min_displacement=0.25,
        progress_patience=100,
    )
    wrapper.reset()
    for i in range(6):
        info = {
            'cte': 0.5,
            'speed': 0.05,
            'pos': (float(i) * 0.3, 0.0, 0.0),
            'active_node': i,
            'total_nodes': 100,
            'lap_count': 0,
            'last_lap_time': 0.0,
        }
        reward, terminated = wrapper._compute_reward_and_done(done=False, info=info)
        assert terminated is False
def test_track_progress_resets_counter():
    """
    Advancing to a new max active_node must reset the no-progress counter.