feat(exp22): add solid-hit/wedge/high-CTE exploit fixes and generated-pair warm experiments
- reward_wrapper: detect barrier/wall/tree solid hits; terminate on head-on impact or after 4 sustained solid-hit frames. This prevents the car from wedging against invisible barriers.
- reward_wrapper: add low-speed/wedge termination, which ends the episode when the car is pinned motionless (below the speed threshold, no displacement) after a grace period.
- reward_wrapper: high-CTE exploit fix: return -0.25 immediately when CTE > max_cte_terminate (not only after the patience window), so PPO cannot collect positive speed reward while driving the large circle outside the road.
- tests: 23 passing unit tests covering all new termination paths.
- exp20/21/22: add parallel DummyVecEnv experiments (exp20: generated_track + mountain_track; exp21/22: generated_road + generated_track, warm-started from the champion model); exp22 is the current active run.
- SESSION_HANDOFF.md: live handoff doc for continuity into the next session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 04d5a10992
commit 138c65270f

AGENT.md (+63)

@@ -13,6 +13,69 @@ You have full access to the codebase, can run commands, and can modify any file.

---

## Donkeycar RL Simulator Startup Rules

This project repeatedly runs into a Windows Unity PlayerPrefs port collision.
Treat this as a standing instruction for every new session that starts or restarts
the simulator.

- Always run two simulator instances from two separate runtime folders:
  - `C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin`
  - `C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy`
- Always set the Unity PlayerPrefs registry port before launching each instance,
  and also pass explicit ports on launch. Do not rely on the simulator default
  port or saved UI settings.
- Launch the main folder with `--port 9091`.
- Launch the copy folder with `--port 9093`.
- Preferred runtime layout:
  - main process: `9091`, private API `9092`
  - copy process: `9093`, private API `9094`
- After launch, verify sockets from WSL/Linux before running diagnostics or RL:

```bash
python3 - <<'PY'
import socket
for p in (9091, 9093):
    s = socket.socket()
    s.settimeout(2)
    try:
        s.connect(("127.0.0.1", p))
        print(f"PORT {p}: OK")
    except Exception as e:
        print(f"PORT {p}: FAIL {e}")
    finally:
        s.close()
PY
```

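The heredoc above checks only the two public ports, while the preferred layout also expects the private API ports `9092` and `9094` to be listening. A minimal sketch of a reusable probe that returns pass/fail per port instead of printing, so a launcher can refuse to start RL when any port is down; `probe_ports` is a hypothetical helper, not an existing function in this repo:

```python
import socket


def probe_ports(ports, host="127.0.0.1", timeout=2.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds.

    Hypothetical helper: same check as the heredoc probe, but the result
    is returned so a calling script can gate an RL launch on it.
    """
    results = {}
    for port in ports:
        try:
            # create_connection builds the socket, applies the timeout, and connects.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out, and unreachable all count as "down".
            results[port] = False
    return results
```

For example, `probe_ports((9091, 9092, 9093, 9094))` should map every port to `True` once both simulators are up.
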
Correct PowerShell launch sequence:

```powershell
$key = 'HKCU:\Software\DonkeyCar\donkey_sim'

Get-Process donkey_sim -ErrorAction SilentlyContinue | Stop-Process -Force
Start-Sleep -Seconds 1

Set-ItemProperty -Path $key -Name 'port_h2088097884' -Value 9091 -Type DWord
Set-ItemProperty -Path $key -Name 'portPrivateAPI_h1325370089' -Value 9092 -Type DWord
Start-Process -FilePath 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin\donkey_sim.exe' -ArgumentList '--port','9091' -WorkingDirectory 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin'

Start-Sleep -Seconds 4

Set-ItemProperty -Path $key -Name 'port_h2088097884' -Value 9093 -Type DWord
Set-ItemProperty -Path $key -Name 'portPrivateAPI_h1325370089' -Value 9094 -Type DWord
Start-Process -FilePath 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy\donkey_sim.exe' -ArgumentList '--port','9093' -WorkingDirectory 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy'
```

Why: Unity stores the simulator port in Windows PlayerPrefs/registry under the
shared `DonkeyCar/donkey_sim` product key, so both copied simulator folders can
inherit the same saved port. Command-line `--port` binds the server correctly,
but the in-sim UI can still display the saved PlayerPrefs value. Setting
PlayerPrefs before each launch makes the displayed port and the bound port
line up.

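Because the PlayerPrefs write and the `--port` argument must agree for each instance, one option is to generate the launch sequence from a single layout table so the two values cannot drift apart. This is an illustrative sketch only: `SimInstance`, `render_launch_script`, and the port-collision check are hypothetical; the registry key, value names, folders, ports, and sleep timings are copied from the PowerShell sequence above.

```python
from dataclasses import dataclass

# Copied from the hand-written PowerShell sequence.
REG_KEY = r"HKCU:\Software\DonkeyCar\donkey_sim"
PORT_VALUE = "port_h2088097884"
PRIVATE_VALUE = "portPrivateAPI_h1325370089"


@dataclass(frozen=True)
class SimInstance:
    """Hypothetical record for one runtime folder and its two ports."""
    folder: str        # Windows folder containing donkey_sim.exe
    port: int          # public port, also passed as --port
    private_port: int  # private API port written to PlayerPrefs


LAYOUT = [
    SimInstance(r"C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin", 9091, 9092),
    SimInstance(r"C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy", 9093, 9094),
]


def render_launch_script(layout, kill_existing=True, stagger_s=4):
    """Return PowerShell text that sets PlayerPrefs, then launches, per instance."""
    ports = [p for inst in layout for p in (inst.port, inst.private_port)]
    if len(ports) != len(set(ports)):
        raise ValueError(f"port collision in layout: {ports}")

    lines = [f"$key = '{REG_KEY}'"]
    if kill_existing:
        lines.append("Get-Process donkey_sim -ErrorAction SilentlyContinue | Stop-Process -Force")
        lines.append("Start-Sleep -Seconds 1")
    for i, inst in enumerate(layout):
        if i > 0:
            # Same 4-second stagger as the hand-written sequence.
            lines.append(f"Start-Sleep -Seconds {stagger_s}")
        # Registry first, so the saved PlayerPrefs value matches --port.
        lines.append(f"Set-ItemProperty -Path $key -Name '{PORT_VALUE}' -Value {inst.port} -Type DWord")
        lines.append(f"Set-ItemProperty -Path $key -Name '{PRIVATE_VALUE}' -Value {inst.private_port} -Type DWord")
        exe = inst.folder + "\\donkey_sim.exe"
        lines.append(
            f"Start-Process -FilePath '{exe}' -ArgumentList '--port','{inst.port}' "
            f"-WorkingDirectory '{inst.folder}'"
        )
    return "\n".join(lines)
```

Printing `render_launch_script(LAYOUT)` yields the same commands as the hand-written sequence, in the same order, without the blank lines.
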
---

## Core Loop

Every time you start, follow this exact sequence:

@@ -0,0 +1,249 @@

# RL Donkeycar Session Handoff

Last updated: 2026-05-05 America/Toronto

## Autonomy Instruction

Use this as the standing instruction for follow-on sessions:

`Continue the Donkeycar RL/sim work autonomously. Rebuild, sync, relaunch, run diagnostics, patch code, and restart experiments as needed. Keep going until you either have a verified fix and a running experiment, or a concrete blocker that truly requires the user. Do not stop just to ask for permission on ordinary reversible steps. Only pause for real risk of data loss, destructive actions, missing credentials/access, or major strategy tradeoffs that require a user decision.`

If the user says only `continue`, interpret it using the instruction above.

## Current Goal

Stabilize the Unity simulator geometry and collision behavior enough that:

- `generated_road` and `generated_track` both run without misplaced invisible barriers
- barrier contacts terminate episodes appropriately
- RL can restart from a trustworthy simulator build

## Important Paths

Project:

- `/home/paulh/projects/donkeycar-rl-autoresearch`

Unity source project:

- `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim`

Unity build output:

- `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Builds/DonkeySimWin`

Current runtime simulator folders in use:

- `/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin`
- `/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin - Copy`

## Current RL Experiment Files

- `agent/experiments/exp21_generated_pair_warm_v4.py`
- `agent/experiments/exp22_generated_pair_warm_v6.py`

Latest model/output folder:

- `agent/models/exp22-generated-pair-warm-v6`

Current training run:

- launched `agent/experiments/exp22_generated_pair_warm_v6.py`
- PID file: `agent/models/exp22-generated-pair-warm-v6/current.pid`
- current PID at launch time: `609054`
- log: `agent/models/exp22-generated-pair-warm-v6/run_2026-05-05_141929_strictcte.log`
- startup verified: connected to `localhost:9091` and `localhost:9093`, loaded `generated_road` and `generated_track`, attached warm-start model, reached `Starting training...`

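Whether the run recorded above is still alive can be checked from its PID file. A minimal POSIX-only sketch; `training_alive` and `latest_checkpoint` are hypothetical helpers, and the checkpoint pattern assumes exp22 keeps the `checkpoint_{steps:07d}.zip` naming used by the exp20/exp21 scripts in this commit:

```python
import os
from pathlib import Path


def training_alive(pid_file):
    """Return (pid, alive) for the PID stored in pid_file (POSIX only)."""
    pid = int(Path(pid_file).read_text().strip())
    try:
        # Signal 0 only checks existence/permission; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return pid, False
    except PermissionError:
        # The process exists but belongs to another user.
        return pid, True
    return pid, True


def latest_checkpoint(model_dir):
    """Newest checkpoint_*.zip; zero-padded step counts sort correctly by name."""
    ckpts = sorted(Path(model_dir).glob("checkpoint_*.zip"))
    return ckpts[-1] if ckpts else None
```

From the project root, `training_alive("agent/models/exp22-generated-pair-warm-v6/current.pid")` reports the launch PID. PIDs can be reused after a process exits, so a "still alive" answer is best cross-checked against recent lines in the run log.
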
Latest urgent exploit fix:

- User observed `generated_road` still exploiting the reward by driving a large circle outside the road.
- Stopped the previous run immediately.
- Patched `agent/reward_wrapper.py` so high CTE receives negative reward immediately during the patience window instead of falling through to positive speed reward.
- Patched `agent/experiments/exp22_generated_pair_warm_v6.py`:
  - `MAX_CTE_TERMINATE = 2.5`
  - `CTE_PATIENCE = 3`
- Added regression test `test_high_cte_never_gets_positive_speed_reward_before_termination`.
- Verified `python3 -m pytest -q tests/test_reward_wrapper.py`: `21 passed`.

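The fix above can be sketched as a small gate in front of the speed reward. This is not the code in `agent/reward_wrapper.py`, which is not shown here: `HighCteGate` is hypothetical, the `-0.25` penalty comes from the commit message, the defaults mirror `MAX_CTE_TERMINATE = 2.5` and `CTE_PATIENCE = 3`, and resetting the counter once the car returns inside the threshold is an assumption.

```python
from dataclasses import dataclass

HIGH_CTE_PENALTY = -0.25  # value quoted in the commit message


@dataclass
class HighCteGate:
    """Hypothetical high-CTE branch: penalise every off-road step, end after patience."""
    max_cte_terminate: float = 2.5
    patience: int = 3
    over_steps: int = 0

    def reset(self):
        self.over_steps = 0

    def step(self, cte, speed_reward):
        """Return (reward, terminate) for one step."""
        if abs(cte) > self.max_cte_terminate:
            self.over_steps += 1
            # Negative on every step of the patience window, so the policy
            # never banks positive speed reward while outside the road.
            return HIGH_CTE_PENALTY, self.over_steps >= self.patience
        # Assumption: patience restarts once the car is back within the threshold.
        self.over_steps = 0
        return speed_reward, False
```

The property the regression test's name describes corresponds here to: for any `cte` above the threshold, the returned reward is negative regardless of `speed_reward`.
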
## What Was Learned

### Training status

The latest meaningful `exp22` run was poor and should not be resumed as-is.

From `agent/models/exp22-generated-pair-warm-v6/run_2026-04-28_2132_openfix.log`:

- best `generated_track` eval reached only about `92` steps
- run was not trustworthy due to ongoing barrier-placement concerns

### Simulator behavior

- Invisible barriers are collider-only by default, so the user cannot see them in the standalone player
- Diagnostic probe showed both tracks could advance from the start before hitting `left_barrier`, so there was no obvious full-width blocker across the road start
- User screenshot suggested the car was getting trapped near the shoulder/edge, consistent with a barrier corridor that sits too close to the drivable edge
- User also reported that barrier contact sometimes blocks the car without promptly ending the episode

### Collision semantics

The user does **not** want every barrier brush to terminate the episode.

Desired behavior:

- light brush: can continue
- sustained contact: terminate
- head-on / abrupt stop: terminate quickly

## Code Changes Already Made

### Unity / simulator side

`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/RoadBuilder.cs`

Implemented structural refactor:

- explicit `closeLoop` support
- explicit road-edge generation
- barrier edges derived from the left/right road edges instead of a guessed centerline offset
- open tracks do not force wraparound
- debug polyline support via gizmos

Added runtime-visible debug barrier support:

- `showBarrierMeshes`
- `barrierDebugColor`
- barrier objects now include `MeshFilter`
- optional `MeshRenderer` added for visible translucent barriers

`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scenes/generated_road.unity`

- `closeLoop = 0`
- `doAddBarriers = 1`
- `showBarrierMeshes = 1`
- pinned road variation arrays to one entry
- `roadOffsets.Array.data[0] = 2.2`

`/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scenes/generated_track.unity`

- `showBarrierMeshes = 1`
- `roadOffsetW = 2.2`
- barriers still enabled

### Python / RL side

`/home/paulh/projects/donkeycar-rl-autoresearch/agent/reward_wrapper.py`

Latest intent:

- do **not** terminate instantly on every barrier hit
- terminate on sustained obstacle contact
- terminate on a head-on-style stop

Current patch in file:

- tracks `_solid_hit_steps`
- tracks `_prev_speed`
- classifies solid hits via `hit` containing `barrier`, `wall`, or `tree`
- terminates immediately on an abrupt speed collapse while colliding
- terminates after several consecutive solid-hit frames (4, per the commit message)

This was meant to replace the too-aggressive “any barrier hit = immediate death” logic.

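The collision semantics above can be sketched as a per-step monitor. The attribute names `_solid_hit_steps` and `_prev_speed`, the `barrier`/`wall`/`tree` keywords, and the 4-frame limit come from this handoff and the commit message; the head-on thresholds (`min_prev_speed`, `collapse_ratio`) are hypothetical placeholders, and treating `hit == "none"` as "no contact" follows the exp20 notes. It is not the actual wrapper code.

```python
SOLID_KEYWORDS = ("barrier", "wall", "tree")


def is_solid_hit(hit):
    """True if the sim's `hit` name looks like a solid obstacle ("none" = no contact)."""
    name = str(hit if hit is not None else "none").lower()
    return any(keyword in name for keyword in SOLID_KEYWORDS)


class SolidHitMonitor:
    """Brush: continue. Sustained contact or head-on stop: terminate."""

    def __init__(self, sustained_frames=4, min_prev_speed=1.0, collapse_ratio=0.3):
        self.sustained_frames = sustained_frames  # 4 per the commit message
        self.min_prev_speed = min_prev_speed      # hypothetical: "was really moving"
        self.collapse_ratio = collapse_ratio      # hypothetical: fraction of speed kept
        self.reset()

    def reset(self):
        self._solid_hit_steps = 0
        self._prev_speed = 0.0

    def update(self, hit, speed):
        """Return "head_on", "sustained_contact", or None for one sim step."""
        prev_speed, self._prev_speed = self._prev_speed, speed
        if not is_solid_hit(hit):
            self._solid_hit_steps = 0  # contact ended: a light brush is forgiven
            return None
        self._solid_hit_steps += 1
        # Head-on: moving last frame, nearly stopped now, while touching a solid.
        if prev_speed >= self.min_prev_speed and speed <= prev_speed * self.collapse_ratio:
            return "head_on"
        if self._solid_hit_steps >= self.sustained_frames:
            return "sustained_contact"
        return None
```

A brush that ends within three frames resets the counter as soon as `hit` returns to `"none"`, which matches the "light brush: can continue" rule.
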
## Most Recent Verified Build Status

Unity batch build for the debug-visible barrier version completed successfully.

Evidence:

- build log ended with `Exiting batchmode successfully now!`
- return code `0`

The successful build has now been synced into both `Downloads` runtime folders and both simulators have been relaunched.

Current verified runtime state:

- main folder process owns port `9091`
- main folder also owns private API port `9092`
- copy folder process owns port `9093`
- copy folder also owns private API port `9094`
- Linux socket probe reported `PORT 9091: OK`, `PORT 9092: OK`, `PORT 9093: OK`, and `PORT 9094: OK`
- latest runtime build makes the debug barrier meshes visible from both sides (see the simulator-side patch notes below for how)

Note: the Windows profile uses shared Unity PlayerPrefs/registry values under `HKCU:\Software\DonkeyCar\donkey_sim`. Explicit `--port` args bind the servers correctly, but the in-sim UI can still show the saved PlayerPrefs value. Before launch, set `port_h2088097884`/`portPrivateAPI_h1325370089` to `9091`/`9092`, start the main sim, then set them to `9093`/`9094` and start the copy. Also keep passing explicit `--port 9091` and `--port 9093`.

Latest user visual inspection before double-sided patch:

- `generated_road`: barriers visible on both sides except missing on the left side at the very start before the first curve
- `generated_track`: barrier visible only on the right/inside side when driving clockwise; no visible left/outside barrier

Likely diagnosis: barrier mesh was generated as a single-sided vertical plane and the Standard shader culled backfaces, so some debug barrier surfaces existed but were invisible from the road/camera side.

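A small numeric illustration of this diagnosis: with back-face culling, a triangle is drawn only when the camera is on the side its normal points to, so a single-sided vertical barrier quad is visible from one side only. The sketch uses a right-handed, counter-clockwise-is-front convention for simplicity; engines differ on which winding counts as front (Unity treats clockwise as front-facing), but the one-sided behaviour is the same. The helper names are illustrative.

```python
import numpy as np


def face_normal(a, b, c):
    """Normal by the right-hand rule (counter-clockwise a->b->c faces the viewer)."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    return np.cross(b - a, c - a)


def visible_with_culling(tri, camera_pos):
    """True if a back-face-culling renderer would draw tri as seen from camera_pos."""
    a, b, c = tri
    to_camera = np.asarray(camera_pos, dtype=float) - np.asarray(a, dtype=float)
    return float(np.dot(face_normal(a, b, c), to_camera)) > 0.0


def flip_winding(tri):
    """Reverse the vertex order, which flips the face normal."""
    a, b, c = tri
    return (a, c, b)
```

Emitting a flipped copy of every triangle makes a quad visible from both sides, but the patch notes below record that this produced black barriers on `generated_track`; turning culling off in the material gives two-sided visibility without duplicate coplanar geometry.
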
Latest simulator-side patch:

- `/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Assets/Scripts/RoadBuilder.cs`
- `CreateBarrier(...)` was first changed to emit reverse-facing triangles for every barrier quad, so debug barrier meshes would be visible from both sides (later reverted; see the last bullets)
- failed attempt: `Unlit/Transparent` made both tracks' barriers black in the standalone player
- failed attempt: duplicating reverse-facing triangles made `generated_track` barriers black, likely due to coplanar transparent overdraw/z-fighting on the closed/scaled track
- current debug barrier mesh is back to one triangle set per quad; material uses `Standard` transparent mode with forced pale fallback color, alpha blend, culling off, and emission enabled so barriers should stay light/translucent while remaining visible from both sides
- Unity Windows batch build succeeded after this patch
- rebuilt output synced to both runtime folders and relaunched with explicit ports

## Immediate Next Steps

1. Monitor current exp22 training log/checkpoints.

2. Determine:
   - are barriers too close to the road edge globally?
   - or only wrong at specific bends / first-corner geometry?

3. Fix geometry if needed before restarting RL.

4. Only after geometry is visually verified, restart `exp22` or a successor experiment.

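For step 2, one way to separate a global offset problem from a bend-specific one is to measure, for every barrier vertex, its distance to the nearest road-edge segment. A sketch under stated assumptions: the road-edge and barrier polylines would have to be exported from `RoadBuilder.cs` (no such export exists in this handoff), coordinates are ground-plane `(x, z)` pairs, and the 0.3 m default comes from the exp20 notes and may not match the current RoadBuilder settings.

```python
import numpy as np


def point_segment_distance(p, a, b):
    """Distance from point p to the closed segment a-b."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    # Project p onto the segment and clamp the projection to its endpoints.
    t = 0.0 if denom == 0.0 else float(np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0))
    return float(np.linalg.norm(p - (a + t * ab)))


def barrier_clearance(road_edge, barrier, min_clearance=0.3):
    """Per-vertex distance from the barrier polyline to the road-edge polyline.

    Returns (distances, indices of barrier vertices closer than min_clearance).
    """
    edge = np.asarray(road_edge, dtype=float)
    bar = np.asarray(barrier, dtype=float)
    dists = np.array([
        min(point_segment_distance(p, edge[i], edge[i + 1]) for i in range(len(edge) - 1))
        for p in bar
    ])
    return dists, np.flatnonzero(dists < min_clearance)
```

If nearly every distance falls below the threshold, the offset is wrong globally; if the flagged indices cluster at a few vertices, inspect those bends (for example the first corner).
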
## Useful Commands

### Sync latest build into runtime folders

```bash
rsync -a --delete '/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Builds/DonkeySimWin/' '/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin/'
rsync -a --delete '/mnt/c/Users/Paul/Documents/projects/sdsandbox/sdsim/Builds/DonkeySimWin/' '/mnt/c/Users/Paul/Downloads/DonkeySimWin/DonkeySimWin - Copy/'
```

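After syncing, one way to confirm both runtime folders actually match the build is to compare file digests. A minimal sketch; `sync_mismatches` is a hypothetical helper, hashing a full Unity build over `/mnt/c` can take a while, and any files the simulator writes into its own folder at runtime would also show up as mismatches.

```python
import hashlib
from pathlib import Path


def _sha256(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large Unity data files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def tree_digests(root):
    """Map POSIX-style relative path -> SHA-256 for every file under root."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): _sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def sync_mismatches(build_dir, runtime_dir):
    """Relative paths whose contents differ or that exist on only one side."""
    a, b = tree_digests(build_dir), tree_digests(runtime_dir)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

An empty list from `sync_mismatches(build_dir, runtime_dir)` for each runtime folder means the rsync left it identical to the build.
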
### Launch sims from Windows side

```powershell
$key = 'HKCU:\Software\DonkeyCar\donkey_sim'

Set-ItemProperty -Path $key -Name 'port_h2088097884' -Value 9091 -Type DWord
Set-ItemProperty -Path $key -Name 'portPrivateAPI_h1325370089' -Value 9092 -Type DWord
Start-Process -FilePath 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin\donkey_sim.exe' -ArgumentList '--port','9091' -WorkingDirectory 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin'

Start-Sleep -Seconds 4

Set-ItemProperty -Path $key -Name 'port_h2088097884' -Value 9093 -Type DWord
Set-ItemProperty -Path $key -Name 'portPrivateAPI_h1325370089' -Value 9094 -Type DWord
Start-Process -FilePath 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy\donkey_sim.exe' -ArgumentList '--port','9093' -WorkingDirectory 'C:\Users\Paul\Downloads\DonkeySimWin\DonkeySimWin - Copy'
```

### Verify ports

```bash
python3 - <<'PY'
import socket
for p in (9091, 9093):
    s = socket.socket()
    s.settimeout(3)
    try:
        s.connect(('127.0.0.1', p))
        print(f'PORT {p}: OK')
    except Exception as e:
        print(f'PORT {p}: FAIL {e}')
    finally:
        s.close()
PY
```

## Notes for Next Session

- If the user says `continue`, do not ask broad questions. Start with the immediate next steps above.
- Prefer direct verification over more RL training.
- Do not restart long training until the user has visually confirmed the debug-visible barriers look correct.

@@ -0,0 +1,205 @@

"""
|
||||
Exp 20: Parallel DummyVecEnv — 450k steps, rebuilt sim (v5).
|
||||
|
||||
Fixes from Exp 19 (v4 → v5):
|
||||
- progress_patience: 60 → 150 steps.
|
||||
Mountain track hills slow the car to near-throttle-min speed. At ~1 m/s
|
||||
going uphill, the nearest waypoint may not advance for 3-7 seconds. The
|
||||
previous 60-step (~3s) limit caused legitimate uphill driving to be
|
||||
terminated as "no progress". 150 steps (~7.5s at 20fps) covers the
|
||||
longest mountain hill sections without being exploitable.
|
||||
|
||||
New sim fixes (require rebuilt donkey_sim.exe — rebuild done before this run):
|
||||
- Car.cs OnCollisionStay: sustained low-speed barrier/tree contact now
|
||||
keeps hit != "none" so the sim terminates the episode immediately.
|
||||
Previously, hit was cleared every frame so wedged cars ran indefinitely.
|
||||
- RoadBuilder invisible barriers: generated_track now has invisible wall
|
||||
meshes on both sides of the road. Car cannot escape through mesh gaps.
|
||||
Barriers are 3m tall, 0.3m outside the road edge, loop closed at start/finish.
|
||||
|
||||
Everything else identical to Exp 19.
|
||||
|
||||
Setup — TWO rebuilt sim instances required:
|
||||
Sim 1: donkey_sim.exe on port 9091 → generated_track
|
||||
Sim 2: separate copy of donkey_sim.exe on port 9093 → mountain_track
|
||||
"""
|
||||
import sys, os, time
sys.path.insert(0, '/home/paulh/projects/donkeycar-rl-autoresearch/agent')

from multitrack_runner import log, StuckTerminationWrapper
from donkeycar_sb3_runner import ThrottleClampWrapper
from reward_wrapper import SpeedRewardWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
import gymnasium as gym
import numpy as np

HOST = 'localhost'
THROTTLE_MIN = 0.2
LR = 0.000725
TOTAL_STEPS = 450_000
CHECKPOINT_EVERY = 20_000
SAVE_DIR = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5'
os.makedirs(SAVE_DIR, exist_ok=True)

EFFICIENCY_WINDOW = 200
MIN_LAP_TIME = 12.0
PROGRESS_PATIENCE = 150  # was 60; mountain hills take up to 7s per waypoint


def make_env(track_id, port):
    """Return a thunk that builds one wrapped env; DummyVecEnv calls it once at construction."""
    def _init():
        raw = gym.make(track_id, conf={'host': HOST, 'port': port})
        # Wrapper order: throttle floor -> stuck/timeout termination -> reward shaping.
        env = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
        env = StuckTerminationWrapper(env, stuck_steps=40, min_displacement=0.5,
                                      max_episode_seconds=30.0)
        env = SpeedRewardWrapper(env, window_size=EFFICIENCY_WINDOW,
                                 min_lap_time=MIN_LAP_TIME,
                                 progress_patience=PROGRESS_PATIENCE)
        return env
    return _init


log('=' * 60)
log('Exp 20: Parallel DummyVecEnv — 450k steps (sim rebuild + progress fix)')
log(f' Sim 1: {HOST}:9091 → generated_track')
log(f' Sim 2: {HOST}:9093 → mountain_track')
log(f' throttle_min={THROTTLE_MIN}, lr={LR}, total={TOTAL_STEPS:,}')
log(f' Reward: v6 + exploit fix (window={EFFICIENCY_WINDOW}, min_lap={MIN_LAP_TIME}s)')
log(f' Stuck termination: 40 steps (~2s), hard cap 30s')
log(f' Progress patience: {PROGRESS_PATIENCE} steps (~7.5s at 20fps)')
log(f' Checkpoints: every {CHECKPOINT_EVERY:,} steps')
log('=' * 60)

log('Creating DummyVecEnv with two tracks...')
env = DummyVecEnv([
    make_env('donkey-generated-track-v0', 9091),
    make_env('donkey-mountain-track-v0', 9093),
])
env = VecTransposeImage(env)
log(f' VecEnv num_envs={env.num_envs}, obs={env.observation_space.shape}')

model = PPO('CnnPolicy', env, learning_rate=LR, verbose=1, device='cpu')
log('PPO created. Starting training...')

best_reward = float('-inf')
steps_done = 0

while steps_done < TOTAL_STEPS:
    seg_steps = min(CHECKPOINT_EVERY, TOTAL_STEPS - steps_done)
    model.learn(total_timesteps=seg_steps, reset_num_timesteps=False)
    steps_done += seg_steps

    ckpt = os.path.join(SAVE_DIR, f'checkpoint_{steps_done:07d}')
    model.save(ckpt)
    model.save(os.path.join(SAVE_DIR, 'model'))
    log(f'[{steps_done:,}/{TOTAL_STEPS:,}] Checkpoint saved: {ckpt}.zip')

    try:
        obs = env.reset()
        ep_rewards = np.zeros(env.num_envs)
        ep_steps = np.zeros(env.num_envs)
        done_mask = np.zeros(env.num_envs, dtype=bool)
        for _ in range(2000):
            action, _ = model.predict(obs, deterministic=True)
            obs, rewards, dones, infos = env.step(action)
            for i in range(env.num_envs):
                # Only count each env's first episode; DummyVecEnv auto-resets after done.
                if not done_mask[i]:
                    ep_rewards[i] += rewards[i]
                    ep_steps[i] += 1
                    if dones[i]:
                        done_mask[i] = True
            if done_mask.all():
                break

        status0 = '✅' if ep_steps[0] >= 2000 else f'❌@{int(ep_steps[0])}'
        status1 = '✅' if ep_steps[1] >= 2000 else f'❌@{int(ep_steps[1])}'
        log(f' Eval: gen_track={ep_rewards[0]:.1f}r/{int(ep_steps[0])}s {status0} '
            f'mountain={ep_rewards[1]:.1f}r/{int(ep_steps[1])}s {status1}')

        total_reward = ep_rewards.sum()
        if total_reward > best_reward:
            best_reward = total_reward
            model.save(os.path.join(SAVE_DIR, 'best_model'))
            log(f' NEW BEST: {best_reward:.1f} combined reward')
    except Exception as e:
        log(f' Eval error: {e}')
        import traceback; traceback.print_exc()

model.save(os.path.join(SAVE_DIR, 'model'))
log(f'\nTraining complete. Best combined reward: {best_reward:.1f}')

env.close()
time.sleep(5)

# --- Final eval on all 4 tracks (sequential, port 9091) ---
log('\n' + '=' * 60)
log('FINAL EVALUATION: best_model on 4 tracks (3 sets each)')
log('=' * 60)

EVAL_TRACKS = [
    ('donkey-generated-track-v0', 'generated_track'),
    ('donkey-mountain-track-v0', 'mountain_track'),
    ('donkey-minimonaco-track-v0', 'mini_monaco'),
    ('donkey-generated-roads-v0', 'generated_road'),
]
EVAL_PORT = 9091
EVAL_SETS = 3
EVAL_MAX_STEPS = 2000

best_model_path = os.path.join(SAVE_DIR, 'best_model.zip')
results_by_track = {}

for track_id, track_name in EVAL_TRACKS:
    log(f'\n--- {track_name} ---')
    steps_list = []

    for s in range(1, EVAL_SETS + 1):
        try:
            raw = gym.make(track_id, conf={'host': HOST, 'port': EVAL_PORT})
            inner = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
            inner = StuckTerminationWrapper(inner, stuck_steps=40, min_displacement=0.5)
            inner = SpeedRewardWrapper(inner)
            eval_env = VecTransposeImage(DummyVecEnv([lambda e=inner: e]))

            eval_model = PPO.load(best_model_path, env=eval_env, device='cpu')

            obs = eval_env.reset()
            total_r, total_s, done = 0.0, 0, False
            while not done and total_s < EVAL_MAX_STEPS:
                action, _ = eval_model.predict(obs, deterministic=True)
                result = eval_env.step(action)
                if len(result) == 4:
                    obs, r, d, info = result
                    done = bool(d[0])
                else:
                    obs, r, t, tr, info = result
                    done = bool(t[0] or tr[0])
                total_r += float(r[0])
                total_s += 1

            status = '✅' if total_s >= EVAL_MAX_STEPS else f'❌@{total_s}'
            log(f' Set {s}: {total_r:.1f}r / {total_s}s {status}')
            steps_list.append(total_s)

            eval_env.close()
            time.sleep(3)
        except Exception as e:
            log(f' Set {s}: ERROR — {e}')
            steps_list.append(0)
            time.sleep(3)

    mean_steps = np.mean(steps_list) if steps_list else 0
    results_by_track[track_name] = steps_list
    log(f' Mean: {mean_steps:.0f} steps')

log('\n' + '=' * 60)
log('SUMMARY')
log('=' * 60)
for track_name, steps_list in results_by_track.items():
    steps_str = '/'.join(str(s) for s in steps_list)
    mean = np.mean(steps_list)
    verdict = '✅' if mean >= 1500 else '⚠️' if mean >= 500 else '❌'
    log(f' {verdict} {track_name:20s}: {steps_str} mean={mean:.0f}')

log(f'\n=== Exp 20 COMPLETE ===')

@@ -0,0 +1,291 @@

"""
|
||||
Exp 21: Parallel DummyVecEnv — generated_road + generated_track, warm-started.
|
||||
|
||||
Rationale:
|
||||
- generated_road specialist already exists and drives road markings well.
|
||||
- generated_road and generated_track share the same road semantics.
|
||||
- Background adaptation is the goal here, not mountain physics.
|
||||
|
||||
Design:
|
||||
- Warm-start from Phase 2 champion (generated_road specialist).
|
||||
- Train in parallel on TWO sim instances:
|
||||
Sim 1: generated_road on port 9091
|
||||
Sim 2: generated_track on port 9093
|
||||
- Use the old v4 reward that worked for the flat road tracks.
|
||||
- Keep the wrapper chain minimal: ThrottleClamp + V4 reward only.
|
||||
"""
|
||||
import sys, os, time
from collections import deque
from datetime import datetime

sys.path.insert(0, '/home/paulh/projects/donkeycar-rl-autoresearch/agent')

from donkeycar_sb3_runner import ThrottleClampWrapper
from multitrack_runner import StuckTerminationWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage
from stable_baselines3.common.utils import get_schedule_fn
import gymnasium as gym
import numpy as np


HOST = 'localhost'
THROTTLE_MIN = 0.2
LR = 0.000225
TOTAL_STEPS = 150_000
CHECKPOINT_EVERY = 10_000
SAVE_DIR = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp21-generated-pair-warm-v4'
WARM_PATH = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip'
os.makedirs(SAVE_DIR, exist_ok=True)


class V4RewardWrapper(gym.Wrapper):
    """
    v4 reward from the successful flat-road experiments:
        reward = base_cte * eff * (1 + speed_scale * speed)
    where base_cte = 1 - min(|cte| / max_cte, 1) and eff rescales the path
    efficiency (net displacement / path length over the last window_size
    steps) so that min_efficiency maps to 0 and a straight path maps to 1.
    Any episode-ending step (terminated or truncated) returns -1.0.
    """
    def __init__(self, env, speed_scale=0.1, window_size=60,
                 min_efficiency=0.05, max_cte=8.0):
        super().__init__(env)
        self.speed_scale = speed_scale
        self.min_efficiency = min_efficiency
        self.max_cte = max_cte
        self._pos_history = deque(maxlen=window_size + 1)

    def reset(self, **kwargs):
        self._pos_history.clear()
        return self.env.reset(**kwargs)

    def step(self, action):
        # Accept both the 5-tuple (gymnasium) and legacy 4-tuple step APIs.
        result = self.env.step(action)
        if len(result) == 5:
            obs, _sim_reward, terminated, truncated, info = result
            done = terminated or truncated
        else:
            obs, _sim_reward, done, info = result
            terminated, truncated = done, False

        reward = self._compute_reward(done, info)

        if len(result) == 5:
            return obs, reward, terminated, truncated, info
        return obs, reward, done, info

    def _compute_reward(self, done, info):
        if done:
            return -1.0

        # Record the car position for the path-efficiency term.
        pos = info.get('pos', None)
        if pos is not None:
            try:
                self._pos_history.append(np.array(list(pos)[:3], dtype=np.float64))
            except (TypeError, ValueError):
                pass

        # CTE term: 1.0 on the centerline, falling linearly to 0 at max_cte.
        try:
            cte = float(info.get('cte', 0.0) or 0.0)
        except (TypeError, ValueError):
            cte = 0.0
        base = 1.0 - min(abs(cte) / self.max_cte, 1.0)

        # Efficiency rescaled so min_efficiency -> 0 and a straight path -> 1.
        efficiency = self._compute_efficiency()
        eff = max(0.0, (efficiency - self.min_efficiency) / (1.0 - self.min_efficiency))

        try:
            speed = max(0.0, float(info.get('speed', 0.0) or 0.0))
        except (TypeError, ValueError):
            speed = 0.0

        return base * eff * (1.0 + self.speed_scale * speed)

    def _compute_efficiency(self):
        """Net displacement / path length over the stored window (1.0 = straight line)."""
        if len(self._pos_history) < 3:
            return 1.0
        positions = list(self._pos_history)
        net = np.linalg.norm(positions[-1] - positions[0])
        total = sum(
            np.linalg.norm(positions[i + 1] - positions[i])
            for i in range(len(positions) - 1)
        )
        return float(net / total) if total > 1e-6 else 1.0


def log(msg):
    print(f'[{datetime.now().strftime("%H:%M:%S")}] {msg}', flush=True)


def make_env(track_id, port):
    """Return a thunk that builds one wrapped env; DummyVecEnv calls it once at construction."""
    def _init():
        raw = gym.make(track_id, conf={'host': HOST, 'port': port})
        env = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
        env = StuckTerminationWrapper(
            env,
            stuck_steps=40,
            min_displacement=0.5,
            max_stuck_seconds=12.0,
            max_episode_seconds=30.0,
        )
        env = V4RewardWrapper(env, speed_scale=0.1, window_size=60,
                              min_efficiency=0.05, max_cte=8.0)
        return env
    return _init


def make_eval_env(track_id, port):
    """Single-env VecEnv with the same wrapper chain as training, for the final eval."""
    inner = make_env(track_id, port)()
    return VecTransposeImage(DummyVecEnv([lambda e=inner: e]))


log('=' * 60)
log('Exp 21: generated_road + generated_track, warm-started, v4 reward')
log(f' Warm start: {WARM_PATH}')
log(f' Sim 1: {HOST}:9091 -> generated_road')
log(f' Sim 2: {HOST}:9093 -> generated_track')
log(f' throttle_min={THROTTLE_MIN}, lr={LR}, total={TOTAL_STEPS:,}')
log(' Termination: StuckTerminationWrapper enabled')
log(f' Checkpoints: every {CHECKPOINT_EVERY:,} steps')
log('=' * 60)

log('Creating DummyVecEnv with the two road tracks...')
env = DummyVecEnv([
    make_env('donkey-generated-roads-v0', 9091),
    make_env('donkey-generated-track-v0', 9093),
])
env = VecTransposeImage(env)
log(f' VecEnv num_envs={env.num_envs}, obs={env.observation_space.shape}')

if not os.path.exists(WARM_PATH):
    raise FileNotFoundError(WARM_PATH)

model = PPO.load(WARM_PATH, env=env, device='cpu')
# Override the champion's saved learning rate. PPO.train() recomputes the
# optimizer LR from lr_schedule at the start of every update, so the schedule
# is what takes effect; the param_groups write below is redundant but harmless.
model.learning_rate = LR
try:
    model.lr_schedule = get_schedule_fn(LR)
except Exception:
    # Fall back to a plain constant schedule; setting None here would make
    # PPO.train() fail when it recomputes the learning rate.
    model.lr_schedule = lambda _progress_remaining: LR
try:
    for pg in model.policy.optimizer.param_groups:
        pg['lr'] = LR
except Exception:
    pass

log('Warm-start model attached. Starting training...')

best_total_steps = float('-inf')
best_total_reward = float('-inf')
steps_done = 0

while steps_done < TOTAL_STEPS:
    seg_steps = min(CHECKPOINT_EVERY, TOTAL_STEPS - steps_done)
    model.learn(total_timesteps=seg_steps, reset_num_timesteps=False)
    steps_done += seg_steps

    ckpt = os.path.join(SAVE_DIR, f'checkpoint_{steps_done:07d}')
    model.save(ckpt)
    model.save(os.path.join(SAVE_DIR, 'model'))
    log(f'[{steps_done:,}/{TOTAL_STEPS:,}] Checkpoint saved: {ckpt}.zip')

    # Quick deterministic eval on the training VecEnv. DummyVecEnv auto-resets
    # an env when its episode ends, so done_mask stops counting each env
    # after its first episode.
    try:
        obs = env.reset()
        ep_rewards = np.zeros(env.num_envs)
        ep_steps = np.zeros(env.num_envs)
        done_mask = np.zeros(env.num_envs, dtype=bool)

        for _ in range(2000):
            action, _ = model.predict(obs, deterministic=True)
            obs, rewards, dones, infos = env.step(action)
            for i in range(env.num_envs):
                if not done_mask[i]:
                    ep_rewards[i] += rewards[i]
                    ep_steps[i] += 1
                    if dones[i]:
                        done_mask[i] = True
            if done_mask.all():
                break

        status0 = '✅' if ep_steps[0] >= 2000 else f'❌@{int(ep_steps[0])}'
        status1 = '✅' if ep_steps[1] >= 2000 else f'❌@{int(ep_steps[1])}'
        log(f' Eval: gen_road={ep_rewards[0]:.1f}r/{int(ep_steps[0])}s {status0} '
            f'gen_track={ep_rewards[1]:.1f}r/{int(ep_steps[1])}s {status1}')

        total_steps_eval = ep_steps.sum()
        total_reward = ep_rewards.sum()
        if (total_steps_eval > best_total_steps or
                (total_steps_eval == best_total_steps and total_reward > best_total_reward)):
            best_total_steps = total_steps_eval
            best_total_reward = total_reward
            model.save(os.path.join(SAVE_DIR, 'best_model'))
            log(f' NEW BEST: combined steps={int(best_total_steps)} reward={best_total_reward:.1f}')
    except Exception as e:
        log(f' Eval error: {e}')
        import traceback
        traceback.print_exc()

    # The eval stepped the shared training env, so the observation PPO cached
    # before it is stale. Re-sync it before the next learn() call
    # (_last_obs and _last_episode_starts are private SB3 attributes).
    model._last_obs = env.reset()
    model._last_episode_starts = np.ones((env.num_envs,), dtype=bool)

model.save(os.path.join(SAVE_DIR, 'model'))
# Format with .0f: int() raises OverflowError if every eval failed (value stays -inf).
log(f'\nTraining complete. Best combined steps: {best_total_steps:.0f}')

env.close()
time.sleep(5)

log('\n' + '=' * 60)
log('FINAL EVALUATION: best_model on generated_road, generated_track, mini_monaco')
log('=' * 60)

EVAL_TRACKS = [
    ('donkey-generated-roads-v0', 'generated_road'),
    ('donkey-generated-track-v0', 'generated_track'),
    ('donkey-minimonaco-track-v0', 'mini_monaco'),
]
EVAL_PORT = 9091
EVAL_SETS = 3
EVAL_MAX_STEPS = 2000

best_model_path = os.path.join(SAVE_DIR, 'best_model.zip')
results_by_track = {}

for track_id, track_name in EVAL_TRACKS:
    log(f'\n--- {track_name} ---')
    steps_list = []

    for s in range(1, EVAL_SETS + 1):
        eval_env = None
        try:
            eval_env = make_eval_env(track_id, EVAL_PORT)
            eval_model = PPO.load(best_model_path, env=eval_env, device='cpu')

            obs = eval_env.reset()
            total_r, total_s, done = 0.0, 0, False
            while not done and total_s < EVAL_MAX_STEPS:
                action, _ = eval_model.predict(obs, deterministic=True)
                result = eval_env.step(action)
                if len(result) == 4:
                    obs, r, d, info = result
                    done = bool(d[0])
                else:
                    obs, r, t, tr, info = result
                    done = bool(t[0] or tr[0])
                total_r += float(r[0])
                total_s += 1

            status = '✅' if total_s >= EVAL_MAX_STEPS else f'❌@{total_s}'
            log(f' Set {s}: {total_r:.1f}r / {total_s}s {status}')
            steps_list.append(total_s)
        except Exception as e:
            log(f' Set {s}: ERROR - {e}')
            steps_list.append(0)
        finally:
            # Close the sim connection even when a set fails part-way through.
            if eval_env is not None:
                try:
                    eval_env.close()
                except Exception:
                    pass
            time.sleep(3)

    mean_steps = np.mean(steps_list) if steps_list else 0
    results_by_track[track_name] = steps_list
    log(f' Mean: {mean_steps:.0f} steps')

log('\n' + '=' * 60)
log('SUMMARY')
log('=' * 60)
for track_name, steps_list in results_by_track.items():
    steps_str = '/'.join(str(s) for s in steps_list)
    mean = np.mean(steps_list)
    verdict = '✅' if mean >= 1500 else '⚠️' if mean >= 500 else '❌'
    log(f' {verdict} {track_name:20s}: {steps_str} mean={mean:.0f}')

log('\n=== Exp 21 COMPLETE ===')
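The periodic eval in the training loop above applies two rules that can be tested on their own:

- It counts only each env's first episode, because DummyVecEnv auto-resets an env as soon as that env reports done.
- It ranks checkpoints by combined survived steps, and uses combined reward only to break exact ties.

The sketch below expresses both rules as pure functions so they can be checked without a running simulator. `first_episode_stats` and `is_new_best` are hypothetical helper names; these scripts do not define them. The sketch also assumes the rollout arrives as `(T, n_envs)` arrays of rewards and done flags.

```python
import numpy as np


def first_episode_stats(rewards, dones):
    """Per-env return and length of the FIRST episode in a VecEnv rollout.

    rewards, dones: array-likes of shape (T, n_envs), one row per env.step().
    After an env reports done, its later rows belong to an auto-reset
    episode and are ignored, mirroring the done_mask loop in the script.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    n_envs = rewards.shape[1]
    ep_rewards = np.zeros(n_envs)
    ep_steps = np.zeros(n_envs, dtype=int)
    done_mask = np.zeros(n_envs, dtype=bool)
    for r_t, d_t in zip(rewards, dones):
        active = ~done_mask
        ep_rewards[active] += r_t[active]  # the terminal step is still counted
        ep_steps[active] += 1
        done_mask |= d_t
        if done_mask.all():
            break
    return ep_rewards, ep_steps


def is_new_best(steps, reward, best_steps, best_reward):
    """Rank by combined survived steps first; reward only breaks exact ties."""
    return (steps, reward) > (best_steps, best_reward)
```

The tuple comparison reproduces the script's two-clause `if` for non-NaN values. Python compares tuples element by element and only looks at reward when the step totals are equal.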
|
||||
|
|
@ -0,0 +1,258 @@
|
|||
"""
|
||||
Exp 22: Parallel DummyVecEnv, generated_road + generated_track, warm-started.

Purpose:
- Keep the generated_road champion warm-start idea.
- Use the full termination stack so wedged cars and high-CTE circling exploits end fast.
- Use the v6 reward wrapper, which explicitly terminates no-progress /
  low-efficiency behaviour instead of merely giving it a weak reward.

Setup:
- Sim 1: generated_road on port 9091
- Sim 2: generated_track on port 9093
- Warm-start from agent/models/champion/model.zip
"""
import os
import sys
import time
from datetime import datetime

sys.path.insert(0, '/home/paulh/projects/donkeycar-rl-autoresearch/agent')

import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.utils import get_schedule_fn
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage

from donkeycar_sb3_runner import ThrottleClampWrapper
from multitrack_runner import StuckTerminationWrapper
from reward_wrapper import SpeedRewardWrapper


HOST = 'localhost'
THROTTLE_MIN = 0.2
LR = 0.000225
TOTAL_STEPS = 150_000
CHECKPOINT_EVERY = 10_000
SAVE_DIR = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6'
WARM_PATH = '/home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip'
os.makedirs(SAVE_DIR, exist_ok=True)

EFFICIENCY_WINDOW = 60
MIN_EFFICIENCY = 0.15
MIN_LAP_TIME = 12.0
MAX_CTE_TERMINATE = 2.5
CTE_PATIENCE = 3
PROGRESS_PATIENCE = 100
EFFICIENCY_PATIENCE = 12
LOW_SPEED_PATIENCE = 10
LOW_SPEED_THRESHOLD = 0.25
LOW_SPEED_MIN_DISPLACEMENT = 0.20
LOW_SPEED_GRACE_STEPS = 15
MAX_STUCK_SECONDS = 3.0
MAX_EPISODE_SECONDS = 18.0


def log(msg):
    print(f'[{datetime.now().strftime("%H:%M:%S")}] {msg}', flush=True)


def make_env(track_id, port):
    def _init():
        raw = gym.make(track_id, conf={'host': HOST, 'port': port})
        env = ThrottleClampWrapper(raw, throttle_min=THROTTLE_MIN)
        env = StuckTerminationWrapper(
            env,
            stuck_steps=40,
            min_displacement=0.5,
            max_stuck_seconds=MAX_STUCK_SECONDS,
            max_episode_seconds=MAX_EPISODE_SECONDS,
        )
        env = SpeedRewardWrapper(
            env,
            window_size=EFFICIENCY_WINDOW,
            min_efficiency=MIN_EFFICIENCY,
            min_lap_time=MIN_LAP_TIME,
            max_cte_terminate=MAX_CTE_TERMINATE,
            cte_patience=CTE_PATIENCE,
            progress_patience=PROGRESS_PATIENCE,
            efficiency_patience=EFFICIENCY_PATIENCE,
            low_speed_patience=LOW_SPEED_PATIENCE,
            low_speed_threshold=LOW_SPEED_THRESHOLD,
            low_speed_min_displacement=LOW_SPEED_MIN_DISPLACEMENT,
            low_speed_grace_steps=LOW_SPEED_GRACE_STEPS,
        )
        return env
    return _init


def make_eval_env(track_id, port):
    # DummyVecEnv calls the factory immediately, so this builds one wrapped env.
    return VecTransposeImage(DummyVecEnv([make_env(track_id, port)]))


log('=' * 60)
log('Exp 22: generated_road + generated_track, warm-started, v6 reward')
log(f' Warm start: {WARM_PATH}')
log(f' Sim 1: {HOST}:9091 -> generated_road')
log(f' Sim 2: {HOST}:9093 -> generated_track')
log(f' throttle_min={THROTTLE_MIN}, lr={LR}, total={TOTAL_STEPS:,}')
log(' Reward: v6 (speed x CTE with progress/efficiency exploit termination)')
log(f' Stuck timeout: {MAX_STUCK_SECONDS:.1f}s, hard cap: {MAX_EPISODE_SECONDS:.1f}s')
log(f' Progress patience: {PROGRESS_PATIENCE} steps')
log(f' Checkpoints: every {CHECKPOINT_EVERY:,} steps')
log('=' * 60)

log('Creating DummyVecEnv with the two road tracks...')
env = DummyVecEnv([
    make_env('donkey-generated-roads-v0', 9091),
    make_env('donkey-generated-track-v0', 9093),
])
env = VecTransposeImage(env)
log(f' VecEnv num_envs={env.num_envs}, obs={env.observation_space.shape}')

if not os.path.exists(WARM_PATH):
    raise FileNotFoundError(WARM_PATH)

model = PPO.load(WARM_PATH, env=env, device='cpu')
# Override the learning rate stored in the champion checkpoint. SB3 reads the
# LR from lr_schedule on every update, so the schedule has to be replaced as
# well as the optimizer param groups (a None schedule would crash training).
model.learning_rate = LR
try:
    model.lr_schedule = get_schedule_fn(LR)
except Exception:
    model.lr_schedule = lambda _progress: LR
try:
    for pg in model.policy.optimizer.param_groups:
        pg['lr'] = LR
except Exception:
    pass

log('Warm-start model attached. Starting training...')

best_total_steps = float('-inf')
best_total_reward = float('-inf')
steps_done = 0

while steps_done < TOTAL_STEPS:
    seg_steps = min(CHECKPOINT_EVERY, TOTAL_STEPS - steps_done)
    model.learn(total_timesteps=seg_steps, reset_num_timesteps=False)
    steps_done += seg_steps

    ckpt = os.path.join(SAVE_DIR, f'checkpoint_{steps_done:07d}')
    model.save(ckpt)
    model.save(os.path.join(SAVE_DIR, 'model'))
    log(f'[{steps_done:,}/{TOTAL_STEPS:,}] Checkpoint saved: {ckpt}.zip')

    # Quick deterministic eval on the training VecEnv. DummyVecEnv auto-resets
    # an env when its episode ends, so done_mask stops counting each env
    # after its first episode.
    try:
        obs = env.reset()
        ep_rewards = np.zeros(env.num_envs)
        ep_steps = np.zeros(env.num_envs)
        done_mask = np.zeros(env.num_envs, dtype=bool)

        for _ in range(2000):
            action, _ = model.predict(obs, deterministic=True)
            obs, rewards, dones, infos = env.step(action)
            for i in range(env.num_envs):
                if not done_mask[i]:
                    ep_rewards[i] += rewards[i]
                    ep_steps[i] += 1
                    if dones[i]:
                        done_mask[i] = True
            if done_mask.all():
                break

        status0 = '✅' if ep_steps[0] >= 2000 else f'❌@{int(ep_steps[0])}'
        status1 = '✅' if ep_steps[1] >= 2000 else f'❌@{int(ep_steps[1])}'
        log(
            f' Eval: gen_road={ep_rewards[0]:.1f}r/{int(ep_steps[0])}s {status0} '
            f'gen_track={ep_rewards[1]:.1f}r/{int(ep_steps[1])}s {status1}'
        )

        total_steps_eval = ep_steps.sum()
        total_reward = ep_rewards.sum()
        if (
            total_steps_eval > best_total_steps
            or (total_steps_eval == best_total_steps and total_reward > best_total_reward)
        ):
            best_total_steps = total_steps_eval
            best_total_reward = total_reward
            model.save(os.path.join(SAVE_DIR, 'best_model'))
            log(
                f' NEW BEST: combined steps={int(best_total_steps)} '
                f'reward={best_total_reward:.1f}'
            )
    except Exception as e:
        log(f' Eval error: {e}')
        import traceback
        traceback.print_exc()

    # The eval stepped the shared training env, so the observation PPO cached
    # before it is stale. Re-sync it before the next learn() call
    # (_last_obs and _last_episode_starts are private SB3 attributes).
    model._last_obs = env.reset()
    model._last_episode_starts = np.ones((env.num_envs,), dtype=bool)

model.save(os.path.join(SAVE_DIR, 'model'))
# Format with .0f: int() raises OverflowError if every eval failed (value stays -inf).
log(f'\nTraining complete. Best combined steps: {best_total_steps:.0f}')

env.close()
time.sleep(5)

log('\n' + '=' * 60)
log('FINAL EVALUATION: best_model on generated_road, generated_track, mini_monaco')
log('=' * 60)

EVAL_TRACKS = [
    ('donkey-generated-roads-v0', 'generated_road'),
    ('donkey-generated-track-v0', 'generated_track'),
    ('donkey-minimonaco-track-v0', 'mini_monaco'),
]
EVAL_PORT = 9091
EVAL_SETS = 3
EVAL_MAX_STEPS = 2000

best_model_path = os.path.join(SAVE_DIR, 'best_model.zip')
results_by_track = {}

for track_id, track_name in EVAL_TRACKS:
    log(f'\n--- {track_name} ---')
    steps_list = []

    for s in range(1, EVAL_SETS + 1):
        eval_env = None
        try:
            eval_env = make_eval_env(track_id, EVAL_PORT)
            eval_model = PPO.load(best_model_path, env=eval_env, device='cpu')

            obs = eval_env.reset()
            total_r, total_s, done = 0.0, 0, False
            while not done and total_s < EVAL_MAX_STEPS:
                action, _ = eval_model.predict(obs, deterministic=True)
                result = eval_env.step(action)
                if len(result) == 4:
                    obs, r, d, info = result
                    done = bool(d[0])
                else:
                    obs, r, t, tr, info = result
                    done = bool(t[0] or tr[0])
                total_r += float(r[0])
                total_s += 1

            status = '✅' if total_s >= EVAL_MAX_STEPS else f'❌@{total_s}'
            log(f' Set {s}: {total_r:.1f}r / {total_s}s {status}')
            steps_list.append(total_s)
        except Exception as e:
            log(f' Set {s}: ERROR - {e}')
            steps_list.append(0)
        finally:
            # Close the sim connection even when a set fails part-way through.
            if eval_env is not None:
                try:
                    eval_env.close()
                except Exception:
                    pass
            time.sleep(3)

    mean_steps = np.mean(steps_list) if steps_list else 0
    results_by_track[track_name] = steps_list
    log(f' Mean: {mean_steps:.0f} steps')

log('\n' + '=' * 60)
log('SUMMARY')
log('=' * 60)
for track_name, steps_list in results_by_track.items():
    steps_str = '/'.join(str(s) for s in steps_list)
    mean = np.mean(steps_list)
    verdict = '✅' if mean >= 1500 else '⚠️' if mean >= 500 else '❌'
    log(f' {verdict} {track_name:20s}: {steps_str} mean={mean:.0f}')

log('\n=== Exp 22 COMPLETE ===')
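Exp 22 depends on the low-speed/wedge termination that this commit adds to reward_wrapper. The commit message describes it as ending the episode when the car is pinned motionless (below a speed threshold, with no displacement) after a grace period. reward_wrapper.py is not shown in this listing, so the block below is only a hypothetical sketch of such a check. Its defaults are copied from the exp22 constants `LOW_SPEED_THRESHOLD`, `LOW_SPEED_MIN_DISPLACEMENT`, `LOW_SPEED_PATIENCE` and `LOW_SPEED_GRACE_STEPS`. The class name, the streak-and-anchor design, and the 2-D position input are assumptions, not the SpeedRewardWrapper implementation.

```python
import math


class LowSpeedWedgeDetector:
    """Hypothetical wedge check (not the real SpeedRewardWrapper logic).

    After `grace_steps` steps of an episode, count consecutive steps where the
    speed is below `threshold` and the car is still within `min_displacement`
    of where the slow streak began. Signal termination once the streak
    reaches `patience` steps.
    """

    def __init__(self, threshold=0.25, min_displacement=0.20,
                 patience=10, grace_steps=15):
        self.threshold = threshold
        self.min_displacement = min_displacement
        self.patience = patience
        self.grace_steps = grace_steps
        self.reset()

    def reset(self):
        """Call at the start of every episode."""
        self.step_count = 0
        self.streak = 0
        self.anchor = None  # position where the current slow streak began

    def update(self, speed, pos):
        """Feed one step's speed and 2-D position; return True to terminate."""
        self.step_count += 1
        if self.step_count <= self.grace_steps:
            return False  # ignore the launch phase of the episode
        if speed >= self.threshold:
            self.streak = 0
            self.anchor = None
            return False
        if self.anchor is None:
            self.anchor = pos
        if math.dist(pos, self.anchor) >= self.min_displacement:
            # Slow but still creeping forward: restart the streak from here.
            self.streak = 0
            self.anchor = pos
            return False
        self.streak += 1
        return self.streak >= self.patience
```

The sketch measures displacement from the start of the slow streak, not from the previous step. That is what separates a wedged car from one creeping forward slowly: small per-step moves still add up past `min_displacement` and reset the streak.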
|
||||
File diff suppressed because it is too large
|
|
@ -0,0 +1,590 @@
|
|||
[14:34:26] ============================================================
|
||||
[14:34:26] Exp 20: Parallel DummyVecEnv — 450k steps (sim rebuild + progress fix)
|
||||
[14:34:26] Sim 1: localhost:9091 → generated_track
|
||||
[14:34:26] Sim 2: localhost:9093 → mountain_track
|
||||
[14:34:26] throttle_min=0.2, lr=0.000725, total=450,000
|
||||
[14:34:26] Reward: v6 + exploit fix (window=200, min_lap=12.0s)
|
||||
[14:34:26] Stuck termination: 40 steps (~2s), hard cap 30s
|
||||
[14:34:26] Progress patience: 150 steps (~7.5s at 20fps)
|
||||
[14:34:26] Checkpoints: every 20,000 steps
|
||||
[14:34:26] ============================================================
|
||||
[14:34:26] Creating DummyVecEnv with two tracks...
|
||||
starting DonkeyGym env
|
||||
Setting default: start_delay 5.0
|
||||
Setting default: max_cte 8.0
|
||||
Setting default: frame_skip 1
|
||||
Setting default: cam_resolution (120, 160, 3)
|
||||
Setting default: log_level 20
|
||||
Setting default: steer_limit 1.0
|
||||
Setting default: throttle_min 0.0
|
||||
Setting default: throttle_max 1.0
|
||||
starting DonkeyGym env
|
||||
Setting default: start_delay 5.0
|
||||
Setting default: max_cte 8.0
|
||||
Setting default: frame_skip 1
|
||||
Setting default: cam_resolution (120, 160, 3)
|
||||
Setting default: log_level 20
|
||||
Setting default: steer_limit 1.0
|
||||
Setting default: throttle_min 0.0
|
||||
Setting default: throttle_max 1.0
|
||||
[14:34:26] VecEnv num_envs=2, obs=(3, 120, 160)
|
||||
Using cpu device
|
||||
[14:34:31] PPO created. Starting training...
|
||||
-----------------------------
|
||||
| time/ | |
|
||||
| fps | 24 |
|
||||
| iterations | 1 |
|
||||
| time_elapsed | 165 |
|
||||
| total_timesteps | 4096 |
|
||||
-----------------------------
|
||||
----------------------------------------
|
||||
| time/ | |
|
||||
| fps | 18 |
|
||||
| iterations | 2 |
|
||||
| time_elapsed | 444 |
|
||||
| total_timesteps | 8192 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.14028513 |
|
||||
| clip_fraction | 0.291 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -2.81 |
|
||||
| explained_variance | -0.18 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.107 |
|
||||
| n_updates | 10 |
|
||||
| policy_gradient_loss | -0.0541 |
|
||||
| std | 0.953 |
|
||||
| value_loss | 0.438 |
|
||||
----------------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 18 |
|
||||
| iterations | 3 |
|
||||
| time_elapsed | 674 |
|
||||
| total_timesteps | 12288 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.1430203 |
|
||||
| clip_fraction | 0.453 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -2.73 |
|
||||
| explained_variance | 0.0647 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0709 |
|
||||
| n_updates | 20 |
|
||||
| policy_gradient_loss | -0.0486 |
|
||||
| std | 0.926 |
|
||||
| value_loss | 1.94 |
|
||||
---------------------------------------
|
||||
----------------------------------------
|
||||
| time/ | |
|
||||
| fps | 18 |
|
||||
| iterations | 4 |
|
||||
| time_elapsed | 868 |
|
||||
| total_timesteps | 16384 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.32767397 |
|
||||
| clip_fraction | 0.571 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -2.62 |
|
||||
| explained_variance | 0.34 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.129 |
|
||||
| n_updates | 30 |
|
||||
| policy_gradient_loss | -0.0851 |
|
||||
| std | 0.856 |
|
||||
| value_loss | 0.175 |
|
||||
----------------------------------------
|
||||
----------------------------------------
|
||||
| time/ | |
|
||||
| fps | 19 |
|
||||
| iterations | 5 |
|
||||
| time_elapsed | 1053 |
|
||||
| total_timesteps | 20480 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.32903564 |
|
||||
| clip_fraction | 0.611 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -2.46 |
|
||||
| explained_variance | 0.534 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0762 |
|
||||
| n_updates | 40 |
|
||||
| policy_gradient_loss | -0.0877 |
|
||||
| std | 0.788 |
|
||||
| value_loss | 0.206 |
|
||||
----------------------------------------
|
||||
[14:53:33] [20,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0020000.zip
|
||||
[14:53:39] Eval: gen_track=7.3r/88s ❌@88 mountain=5.5r/88s ❌@88
|
||||
[14:53:39] NEW BEST: 12.7 combined reward
|
||||
------------------------------
|
||||
| time/ | |
|
||||
| fps | 44 |
|
||||
| iterations | 1 |
|
||||
| time_elapsed | 92 |
|
||||
| total_timesteps | 24576 |
|
||||
------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 30 |
|
||||
| iterations | 2 |
|
||||
| time_elapsed | 271 |
|
||||
| total_timesteps | 28672 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.5804715 |
|
||||
| clip_fraction | 0.666 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -2.09 |
|
||||
| explained_variance | 0.766 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.109 |
|
||||
| n_updates | 60 |
|
||||
| policy_gradient_loss | -0.0851 |
|
||||
| std | 0.648 |
|
||||
| value_loss | 0.15 |
|
||||
---------------------------------------
|
||||
--------------------------------------
|
||||
| time/ | |
|
||||
| fps | 27 |
|
||||
| iterations | 3 |
|
||||
| time_elapsed | 448 |
|
||||
| total_timesteps | 32768 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.629732 |
|
||||
| clip_fraction | 0.693 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -1.88 |
|
||||
| explained_variance | 0.759 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.089 |
|
||||
| n_updates | 70 |
|
||||
| policy_gradient_loss | -0.0853 |
|
||||
| std | 0.587 |
|
||||
| value_loss | 0.165 |
|
||||
--------------------------------------
|
||||
----------------------------------------
|
||||
| time/ | |
|
||||
| fps | 26 |
|
||||
| iterations | 4 |
|
||||
| time_elapsed | 613 |
|
||||
| total_timesteps | 36864 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.70558834 |
|
||||
| clip_fraction | 0.699 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -1.68 |
|
||||
| explained_variance | 0.551 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.112 |
|
||||
| n_updates | 80 |
|
||||
| policy_gradient_loss | -0.0853 |
|
||||
| std | 0.529 |
|
||||
| value_loss | 0.268 |
|
||||
----------------------------------------
|
||||
----------------------------------------
|
||||
| time/ | |
|
||||
| fps | 26 |
|
||||
| iterations | 5 |
|
||||
| time_elapsed | 776 |
|
||||
| total_timesteps | 40960 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.67741144 |
|
||||
| clip_fraction | 0.706 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -1.48 |
|
||||
| explained_variance | 0.593 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.106 |
|
||||
| n_updates | 90 |
|
||||
| policy_gradient_loss | -0.0837 |
|
||||
| std | 0.48 |
|
||||
| value_loss | 0.285 |
|
||||
----------------------------------------
|
||||
[15:08:02] [40,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0040000.zip
|
||||
[15:08:09] Eval: gen_track=19.2r/144s ❌@144 mountain=11.6r/144s ❌@144
|
||||
[15:08:09] NEW BEST: 30.8 combined reward
|
||||
------------------------------
|
||||
| time/ | |
|
||||
| fps | 60 |
|
||||
| iterations | 1 |
|
||||
| time_elapsed | 68 |
|
||||
| total_timesteps | 45056 |
|
||||
------------------------------
|
||||
----------------------------------------
|
||||
| time/ | |
|
||||
| fps | 37 |
|
||||
| iterations | 2 |
|
||||
| time_elapsed | 221 |
|
||||
| total_timesteps | 49152 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.84428275 |
|
||||
| clip_fraction | 0.711 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -1.09 |
|
||||
| explained_variance | 0.654 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0724 |
|
||||
| n_updates | 110 |
|
||||
| policy_gradient_loss | -0.0718 |
|
||||
| std | 0.394 |
|
||||
| value_loss | 0.386 |
|
||||
----------------------------------------
|
||||
----------------------------------------
|
||||
| time/ | |
|
||||
| fps | 33 |
|
||||
| iterations | 3 |
|
||||
| time_elapsed | 367 |
|
||||
| total_timesteps | 53248 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.86503875 |
|
||||
| clip_fraction | 0.735 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -0.886 |
|
||||
| explained_variance | 0.775 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0749 |
|
||||
| n_updates | 120 |
|
||||
| policy_gradient_loss | -0.0763 |
|
||||
| std | 0.355 |
|
||||
| value_loss | 0.236 |
|
||||
----------------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 31 |
|
||||
| iterations | 4 |
|
||||
| time_elapsed | 516 |
|
||||
| total_timesteps | 57344 |
|
||||
| train/ | |
|
||||
| approx_kl | 1.0894502 |
|
||||
| clip_fraction | 0.72 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -0.678 |
|
||||
| explained_variance | 0.779 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0494 |
|
||||
| n_updates | 130 |
|
||||
| policy_gradient_loss | -0.0692 |
|
||||
| std | 0.318 |
|
||||
| value_loss | 0.324 |
|
||||
---------------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 30 |
|
||||
| iterations | 5 |
|
||||
| time_elapsed | 667 |
|
||||
| total_timesteps | 61440 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.9834869 |
|
||||
| clip_fraction | 0.737 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | -0.454 |
|
||||
| explained_variance | 0.812 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.105 |
|
||||
| n_updates | 140 |
|
||||
| policy_gradient_loss | -0.0659 |
|
||||
| std | 0.283 |
|
||||
| value_loss | 0.263 |
|
||||
---------------------------------------
|
||||
[15:20:45] [60,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0060000.zip
|
||||
[15:20:52] Eval: gen_track=16.7r/135s ❌@135 mountain=11.5r/134s ❌@134
|
||||
------------------------------
|
||||
| time/ | |
|
||||
| fps | 69 |
|
||||
| iterations | 1 |
|
||||
| time_elapsed | 58 |
|
||||
| total_timesteps | 65536 |
|
||||
------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 40 |
|
||||
| iterations | 2 |
|
||||
| time_elapsed | 204 |
|
||||
| total_timesteps | 69632 |
|
||||
| train/ | |
|
||||
| approx_kl | 1.0296706 |
|
||||
| clip_fraction | 0.742 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 0.00541 |
|
||||
| explained_variance | 0.847 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0589 |
|
||||
| n_updates | 160 |
|
||||
| policy_gradient_loss | -0.0642 |
|
||||
| std | 0.225 |
|
||||
| value_loss | 0.252 |
|
||||
---------------------------------------
|
||||
----------------------------------------
|
||||
| time/ | |
|
||||
| fps | 35 |
|
||||
| iterations | 3 |
|
||||
| time_elapsed | 345 |
|
||||
| total_timesteps | 73728 |
|
||||
| train/ | |
|
||||
| approx_kl | 0.91380507 |
|
||||
| clip_fraction | 0.735 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 0.247 |
|
||||
| explained_variance | 0.88 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0869 |
|
||||
| n_updates | 170 |
|
||||
| policy_gradient_loss | -0.0728 |
|
||||
| std | 0.2 |
|
||||
| value_loss | 0.233 |
|
||||
----------------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 33 |
|
||||
| iterations | 4 |
|
||||
| time_elapsed | 492 |
|
||||
| total_timesteps | 77824 |
|
||||
| train/ | |
|
||||
| approx_kl | 1.1527034 |
|
||||
| clip_fraction | 0.751 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 0.476 |
|
||||
| explained_variance | 0.881 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.11 |
|
||||
| n_updates | 180 |
|
||||
| policy_gradient_loss | -0.0653 |
|
||||
| std | 0.178 |
|
||||
| value_loss | 0.204 |
|
||||
---------------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 32 |
|
||||
| iterations | 5 |
|
||||
| time_elapsed | 633 |
|
||||
| total_timesteps | 81920 |
|
||||
| train/ | |
|
||||
| approx_kl | 1.6661448 |
|
||||
| clip_fraction | 0.777 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 0.708 |
|
||||
| explained_variance | 0.949 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.121 |
|
||||
| n_updates | 190 |
|
||||
| policy_gradient_loss | -0.0697 |
|
||||
| std | 0.159 |
|
||||
| value_loss | 0.101 |
|
||||
---------------------------------------
|
||||
[15:33:03] [80,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0080000.zip
|
||||
[15:33:10] Eval: gen_track=22.8r/169s ❌@169 mountain=13.6r/168s ❌@168
|
||||
[15:33:10] NEW BEST: 36.4 combined reward
|
||||
------------------------------
|
||||
| time/ | |
|
||||
| fps | 84 |
|
||||
| iterations | 1 |
|
||||
| time_elapsed | 48 |
|
||||
| total_timesteps | 86016 |
|
||||
------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 42 |
|
||||
| iterations | 2 |
|
||||
| time_elapsed | 192 |
|
||||
| total_timesteps | 90112 |
|
||||
| train/ | |
|
||||
| approx_kl | 1.1363616 |
|
||||
| clip_fraction | 0.765 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 1.13 |
|
||||
| explained_variance | 0.741 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0656 |
|
||||
| n_updates | 210 |
|
||||
| policy_gradient_loss | -0.0623 |
|
||||
| std | 0.129 |
|
||||
| value_loss | 0.325 |
|
||||
---------------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 36 |
|
||||
| iterations | 3 |
|
||||
| time_elapsed | 335 |
|
||||
| total_timesteps | 94208 |
|
||||
| train/ | |
|
||||
| approx_kl | 1.3523921 |
|
||||
| clip_fraction | 0.757 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 1.32 |
|
||||
| explained_variance | 0.772 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0286 |
|
||||
| n_updates | 220 |
|
||||
| policy_gradient_loss | -0.0511 |
|
||||
| std | 0.116 |
|
||||
| value_loss | 0.485 |
|
||||
---------------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 34 |
|
||||
| iterations | 4 |
|
||||
| time_elapsed | 480 |
|
||||
| total_timesteps | 98304 |
|
||||
| train/ | |
|
||||
| approx_kl | 1.1116364 |
|
||||
| clip_fraction | 0.751 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 1.51 |
|
||||
| explained_variance | 0.768 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0579 |
|
||||
| n_updates | 230 |
|
||||
| policy_gradient_loss | -0.0407 |
|
||||
| std | 0.106 |
|
||||
| value_loss | 0.418 |
|
||||
---------------------------------------
|
||||
--------------------------------------
|
||||
| time/ | |
|
||||
| fps | 32 |
|
||||
| iterations | 5 |
|
||||
| time_elapsed | 624 |
|
||||
| total_timesteps | 102400 |
|
||||
| train/ | |
|
||||
| approx_kl | 1.033067 |
|
||||
| clip_fraction | 0.748 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 1.71 |
|
||||
| explained_variance | 0.77 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0622 |
|
||||
| n_updates | 240 |
|
||||
| policy_gradient_loss | -0.0379 |
|
||||
| std | 0.0963 |
|
||||
| value_loss | 0.517 |
|
||||
--------------------------------------
|
||||
[15:45:37] [100,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0100000.zip
|
||||
[15:45:45] Eval: gen_track=19.1r/157s ❌@157 mountain=13.6r/157s ❌@157
|
||||
-------------------------------
|
||||
| time/ | |
|
||||
| fps | 71 |
|
||||
| iterations | 1 |
|
||||
| time_elapsed | 57 |
|
||||
| total_timesteps | 106496 |
|
||||
-------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 33 |
|
||||
| iterations | 2 |
|
||||
| time_elapsed | 243 |
|
||||
| total_timesteps | 110592 |
|
||||
| train/ | |
|
||||
| approx_kl | 1.3683245 |
|
||||
| clip_fraction | 0.757 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 2.11 |
|
||||
| explained_variance | 0.805 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | -0.0944 |
|
||||
| n_updates | 260 |
|
||||
| policy_gradient_loss | -0.0381 |
|
||||
| std | 0.0785 |
|
||||
| value_loss | 0.404 |
|
||||
---------------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 26 |
|
||||
| iterations | 3 |
|
||||
| time_elapsed | 459 |
|
||||
| total_timesteps | 114688 |
|
||||
| train/ | |
|
||||
| approx_kl | 1.6867702 |
|
||||
| clip_fraction | 0.786 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 2.24 |
|
||||
| explained_variance | 0.739 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | 0.0131 |
|
||||
| n_updates | 270 |
|
||||
| policy_gradient_loss | 0.00625 |
|
||||
| std | 0.0737 |
|
||||
| value_loss | 0.725 |
|
||||
---------------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 24 |
|
||||
| iterations | 4 |
|
||||
| time_elapsed | 677 |
|
||||
| total_timesteps | 118784 |
|
||||
| train/ | |
|
||||
| approx_kl | 6.1363573 |
|
||||
| clip_fraction | 0.82 |
|
||||
| clip_range | 0.2 |
|
||||
| entropy_loss | 2.43 |
|
||||
| explained_variance | 0.664 |
|
||||
| learning_rate | 0.000725 |
|
||||
| loss | 0.0355 |
|
||||
| n_updates | 280 |
|
||||
| policy_gradient_loss | -0.00149 |
|
||||
| std | 0.0674 |
|
||||
| value_loss | 0.697 |
|
||||
---------------------------------------
|
||||
---------------------------------------
|
||||
| time/ | |
|
||||
| fps | 22 |
|
||||
| iterations | 5 |
|
||||
| time_elapsed | 910 |
| total_timesteps | 122880 |
| train/ | |
| approx_kl | 4.7547264 |
| clip_fraction | 0.809 |
| clip_range | 0.2 |
| entropy_loss | 2.59 |
| explained_variance | 0.663 |
| learning_rate | 0.000725 |
| loss | 0.0146 |
| n_updates | 290 |
| policy_gradient_loss | 0.00373 |
| std | 0.0619 |
| value_loss | 0.76 |
---------------------------------------
[16:03:21] [120,000/450,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp20-parallel-450k-v5/checkpoint_0120000.zip
[16:03:27] Eval: gen_track=5.7r/63s ❌@63 mountain=4.8r/97s ❌@97
-------------------------------
| time/ | |
| fps | 40 |
| iterations | 1 |
| time_elapsed | 101 |
| total_timesteps | 126976 |
-------------------------------
--------------------------------------
| time/ | |
| fps | 22 |
| iterations | 2 |
| time_elapsed | 356 |
| total_timesteps | 131072 |
| train/ | |
| approx_kl | 8.778878 |
| clip_fraction | 0.796 |
| clip_range | 0.2 |
| entropy_loss | 2.96 |
| explained_variance | 0.732 |
| learning_rate | 0.000725 |
| loss | 0.00687 |
| n_updates | 310 |
| policy_gradient_loss | -0.00436 |
| std | 0.0509 |
| value_loss | 0.332 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 20 |
| iterations | 3 |
| time_elapsed | 600 |
| total_timesteps | 135168 |
| train/ | |
| approx_kl | 3.3255148 |
| clip_fraction | 0.793 |
| clip_range | 0.2 |
| entropy_loss | 3.16 |
| explained_variance | 0.796 |
| learning_rate | 0.000725 |
| loss | -0.0742 |
| n_updates | 320 |
| policy_gradient_loss | -0.000784 |
| std | 0.0466 |
| value_loss | 0.237 |
---------------------------------------
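The `time/` and `train/` counters in these PPO tables can be cross-checked with simple arithmetic. The sketch below assumes Stable-Baselines3's on-policy logging conventions: `total_timesteps` grows by `n_steps * num_envs` per iteration, `n_updates` grows by `n_epochs` per `train()` call, and `fps` is the timesteps collected since the current `learn()` call started, divided by elapsed wall-clock seconds and truncated to an int. The helper names are hypothetical, and the inferred `n_steps=2048` / `n_epochs=10` are derived from the logged numbers, not read from the training script.

```python
def infer_ppo_shape(total_timesteps, n_updates, num_envs):
    """Infer (n_steps, n_epochs) from consecutive PPO log tables.

    Hypothetical helper. total_timesteps: values from consecutive iterations
    of one learn() call; n_updates: values from consecutive tables that
    include train/ metrics. Assumes rollout size and epoch count are constant.
    """
    steps_per_iter = {b - a for a, b in zip(total_timesteps, total_timesteps[1:])}
    epochs_per_iter = {b - a for a, b in zip(n_updates, n_updates[1:])}
    if len(steps_per_iter) != 1 or len(epochs_per_iter) != 1:
        raise ValueError("rollout size or epoch count is not constant")
    (rollout,) = steps_per_iter
    (n_epochs,) = epochs_per_iter
    if rollout % num_envs:
        raise ValueError("rollout size is not divisible by num_envs")
    return rollout // num_envs, n_epochs


def sb3_style_fps(steps_since_learn_start, time_elapsed_s):
    """fps as in time/fps: timesteps this learn() call / seconds, truncated."""
    return int(steps_since_learn_start / time_elapsed_s)


# Exp 20 tables above (num_envs=2): the learn() call that resumed at 122880.
n_steps, n_epochs = infer_ppo_shape([126976, 131072, 135168], [310, 320], num_envs=2)
print(n_steps, n_epochs)
print(sb3_style_fps(4096, 101), sb3_style_fps(8192, 356), sb3_style_fps(12288, 600))
```

Recomputing from the logged `time_elapsed` can overshoot by one, because that column is truncated to whole seconds while `fps` is computed from the untruncated time: iteration 2 gives `int(8192 / 356) = 23`, whereas the table shows 22.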

@@ -0,0 +1,61 @@
[20:41:36] ============================================================
[20:41:36] Exp 21: generated_road + generated_track, warm-started, v4 reward
[20:41:36] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[20:41:36] Sim 1: localhost:9091 -> generated_road
[20:41:36] Sim 2: localhost:9093 -> generated_track
[20:41:36] throttle_min=0.2, lr=0.000225, total=150,000
[20:41:36] Checkpoints: every 10,000 steps
[20:41:36] ============================================================
[20:41:36] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
[20:41:36] VecEnv num_envs=2, obs=(3, 120, 160)
[20:41:40] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 28 |
| iterations | 1 |
| time_elapsed | 146 |
| total_timesteps | 18432 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 19 |
| iterations | 2 |
| time_elapsed | 421 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.015421186 |
| clip_fraction | 0.206 |
| clip_range | 0.2 |
| entropy_loss | -2.79 |
| explained_variance | -0.236 |
| learning_rate | 0.000225 |
| loss | 23.8 |
| n_updates | 80 |
| policy_gradient_loss | 0.00689 |
| std | 0.98 |
| value_loss | 67.9 |
-----------------------------------------

@@ -0,0 +1,63 @@
[20:54:28] ============================================================
[20:54:28] Exp 21: generated_road + generated_track, warm-started, v4 reward
[20:54:28] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[20:54:28] Sim 1: localhost:9091 -> generated_road
[20:54:28] Sim 2: localhost:9093 -> generated_track
[20:54:28] throttle_min=0.2, lr=0.000225, total=150,000
[20:54:28] Checkpoints: every 10,000 steps
[20:54:28] ============================================================
[20:54:28] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
loading scene generated_road
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
loading scene generated_track
[20:54:30] VecEnv num_envs=2, obs=(3, 120, 160)
[20:54:35] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 25 |
| iterations | 1 |
| time_elapsed | 162 |
| total_timesteps | 18432 |
---------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 17 |
| iterations | 2 |
| time_elapsed | 461 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.02005615 |
| clip_fraction | 0.244 |
| clip_range | 0.2 |
| entropy_loss | -2.79 |
| explained_variance | -1.26 |
| learning_rate | 0.000225 |
| loss | 21.8 |
| n_updates | 80 |
| policy_gradient_loss | 0.0144 |
| std | 0.979 |
| value_loss | 54.3 |
----------------------------------------

@@ -0,0 +1,40 @@
[21:03:33] ============================================================
[21:03:33] Exp 21: generated_road + generated_track, warm-started, v4 reward
[21:03:33] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[21:03:33] Sim 1: localhost:9091 -> generated_road
[21:03:33] Sim 2: localhost:9093 -> generated_track
[21:03:33] throttle_min=0.2, lr=0.000225, total=150,000
[21:03:33] Termination: StuckTerminationWrapper enabled
[21:03:33] Checkpoints: every 10,000 steps
[21:03:33] ============================================================
[21:03:33] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
[21:03:33] VecEnv num_envs=2, obs=(3, 120, 160)
[21:03:37] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 24 |
| iterations | 1 |
| time_elapsed | 167 |
| total_timesteps | 18432 |
---------------------------------

@@ -0,0 +1 @@
611625

@@ -0,0 +1,49 @@
/home/paulh/.local/lib/python3.10/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.
  warnings.warn("Unable to import Axes3D. This may be due to multiple versions of "
Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
[21:16:53] ============================================================
[21:16:53] Exp 22: generated_road + generated_track, warm-started, v6 reward
[21:16:53] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[21:16:53] Sim 1: localhost:9091 -> generated_road
[21:16:53] Sim 2: localhost:9093 -> generated_track
[21:16:53] throttle_min=0.2, lr=0.000225, total=150,000
[21:16:53] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[21:16:53] Stuck timeout: 8.0s, hard cap: 25.0s
[21:16:53] Progress patience: 100 steps
[21:16:53] Checkpoints: every 10,000 steps
[21:16:53] ============================================================
[21:16:53] Creating DummyVecEnv with the two road tracks...
INFO:gym_donkeycar.core.client:connecting to localhost:9091
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:236: UserWarning: WARN: Box low's precision lowered by casting to float32, current low.dtype=float64
  gym.logger.warn(
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:306: UserWarning: WARN: Box high's precision lowered by casting to float32, current high.dtype=float64
  gym.logger.warn(
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
INFO:gym_donkeycar.core.client:connecting to localhost:9093
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
[21:16:53] VecEnv num_envs=2, obs=(3, 120, 160)
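The Exp 22 header lists a stuck timeout (8.0 s) and a progress patience (100 steps), and the commit message adds an immediate -0.25 return once CTE exceeds `max_cte_terminate`. The sketch below shows one way such per-step checks can be combined. It is not the project's `reward_wrapper` code: the class name, the low-speed and displacement thresholds, the scalar `progress` input, and the per-step `dt` are assumptions, and the 25.0 s hard cap is omitted because the log does not say what it bounds.

```python
from dataclasses import dataclass, field
import math


@dataclass
class ExploitGuard:
    """Hypothetical per-episode termination checks; not the repo's reward_wrapper."""
    max_cte_terminate: float            # name from the commit message; value not logged
    stuck_timeout_s: float = 8.0        # "Stuck timeout: 8.0s"
    progress_patience: int = 100        # "Progress patience: 100 steps"
    high_cte_reward: float = -0.25      # immediate return on high CTE (commit message)
    min_speed: float = 0.1              # assumed low-speed threshold
    min_displacement: float = 0.05      # assumed per-step "no displacement" threshold
    _stuck_s: float = field(default=0.0, init=False)
    _best_progress: float = field(default=-math.inf, init=False)
    _steps_since_progress: int = field(default=0, init=False)

    def check(self, cte, speed, displacement, progress, dt):
        """Return (terminate, reward_override, reason) for one step."""
        # High-CTE exploit: end at once instead of waiting out a patience window.
        if abs(cte) > self.max_cte_terminate:
            return True, self.high_cte_reward, "high_cte"

        # Wedge/stuck: accumulate time only while slow AND not moving.
        if speed < self.min_speed and displacement < self.min_displacement:
            self._stuck_s += dt
        else:
            self._stuck_s = 0.0
        if self._stuck_s >= self.stuck_timeout_s:
            return True, None, "stuck"

        # Progress patience: count steps since the best progress value improved.
        if progress > self._best_progress:
            self._best_progress = progress
            self._steps_since_progress = 0
        else:
            self._steps_since_progress += 1
        if self._steps_since_progress >= self.progress_patience:
            return True, None, "no_progress"

        return False, None, ""


# Illustrative values only: max_cte_terminate=5.0 and dt=0.5 s are assumptions.
guard = ExploitGuard(max_cte_terminate=5.0)
print(guard.check(cte=6.0, speed=2.0, displacement=0.1, progress=0.0, dt=0.5))
```

Checking high CTE first means a car circling far off the road gets the fixed penalty on that step, so it cannot keep collecting speed reward while a slower patience counter runs. Per-episode state is reset by constructing a new guard on `reset()`.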

@@ -0,0 +1,383 @@
[21:23:45] ============================================================
[21:23:45] Exp 22: generated_road + generated_track, warm-started, v6 reward
[21:23:45] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[21:23:45] Sim 1: localhost:9091 -> generated_road
[21:23:45] Sim 2: localhost:9093 -> generated_track
[21:23:45] throttle_min=0.2, lr=0.000225, total=150,000
[21:23:45] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[21:23:45] Stuck timeout: 8.0s, hard cap: 25.0s
[21:23:45] Progress patience: 100 steps
[21:23:45] Checkpoints: every 10,000 steps
[21:23:45] ============================================================
[21:23:45] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
loading scene generated_road
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
loading scene generated_track
[21:23:47] VecEnv num_envs=2, obs=(3, 120, 160)
[21:23:50] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 29 |
| iterations | 1 |
| time_elapsed | 139 |
| total_timesteps | 18432 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 21 |
| iterations | 2 |
| time_elapsed | 378 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.024176385 |
| clip_fraction | 0.244 |
| clip_range | 0.2 |
| entropy_loss | -2.79 |
| explained_variance | -1.36 |
| learning_rate | 0.000225 |
| loss | 12.5 |
| n_updates | 80 |
| policy_gradient_loss | 0.0113 |
| std | 0.976 |
| value_loss | 41.2 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 19 |
| iterations | 3 |
| time_elapsed | 616 |
| total_timesteps | 26624 |
| train/ | |
| approx_kl | 0.021042215 |
| clip_fraction | 0.227 |
| clip_range | 0.2 |
| entropy_loss | -2.77 |
| explained_variance | 0.519 |
| learning_rate | 0.000225 |
| loss | 2.82 |
| n_updates | 90 |
| policy_gradient_loss | 0.00236 |
| std | 0.959 |
| value_loss | 9.14 |
-----------------------------------------
[21:35:50] [10,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0010000.zip
[21:35:56] Eval: gen_road=3.0r/64s ❌@64 gen_track=1.1r/63s ❌@63
[21:35:56] NEW BEST: combined steps=127 reward=4.1
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 31 |
| iterations | 1 |
| time_elapsed | 129 |
| total_timesteps | 30720 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 2 |
| time_elapsed | 357 |
| total_timesteps | 34816 |
| train/ | |
| approx_kl | 0.027895104 |
| clip_fraction | 0.222 |
| clip_range | 0.2 |
| entropy_loss | -2.67 |
| explained_variance | 0.27 |
| learning_rate | 0.000225 |
| loss | 0.0657 |
| n_updates | 110 |
| policy_gradient_loss | -0.0236 |
| std | 0.907 |
| value_loss | 0.549 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 20 |
| iterations | 3 |
| time_elapsed | 587 |
| total_timesteps | 38912 |
| train/ | |
| approx_kl | 0.038819656 |
| clip_fraction | 0.24 |
| clip_range | 0.2 |
| entropy_loss | -2.63 |
| explained_variance | 0.346 |
| learning_rate | 0.000225 |
| loss | -0.0014 |
| n_updates | 120 |
| policy_gradient_loss | -0.0293 |
| std | 0.893 |
| value_loss | 0.157 |
-----------------------------------------
[21:47:36] [20,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0020000.zip
[21:47:42] Eval: gen_road=2.9r/64s ❌@64 gen_track=1.1r/63s ❌@63
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 33 |
| iterations | 1 |
| time_elapsed | 122 |
| total_timesteps | 43008 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 23 |
| iterations | 2 |
| time_elapsed | 351 |
| total_timesteps | 47104 |
| train/ | |
| approx_kl | 0.060704876 |
| clip_fraction | 0.327 |
| clip_range | 0.2 |
| entropy_loss | -2.53 |
| explained_variance | 0.877 |
| learning_rate | 0.000225 |
| loss | -0.0427 |
| n_updates | 140 |
| policy_gradient_loss | -0.045 |
| std | 0.847 |
| value_loss | 0.0676 |
-----------------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 21 |
| iterations | 3 |
| time_elapsed | 571 |
| total_timesteps | 51200 |
| train/ | |
| approx_kl | 0.06585144 |
| clip_fraction | 0.35 |
| clip_range | 0.2 |
| entropy_loss | -2.49 |
| explained_variance | 0.883 |
| learning_rate | 0.000225 |
| loss | -0.0429 |
| n_updates | 150 |
| policy_gradient_loss | -0.0419 |
| std | 0.833 |
| value_loss | 0.0814 |
----------------------------------------
[21:58:56] [30,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0030000.zip
[21:59:02] Eval: gen_road=2.7r/63s ❌@63 gen_track=1.1r/62s ❌@62
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 33 |
| iterations | 1 |
| time_elapsed | 121 |
| total_timesteps | 55296 |
---------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 23 |
| iterations | 2 |
| time_elapsed | 343 |
| total_timesteps | 59392 |
| train/ | |
| approx_kl | 0.096836925 |
| clip_fraction | 0.422 |
| clip_range | 0.2 |
| entropy_loss | -2.42 |
| explained_variance | 0.85 |
| learning_rate | 0.000225 |
| loss | -0.0767 |
| n_updates | 170 |
| policy_gradient_loss | -0.053 |
| std | 0.8 |
| value_loss | 0.0973 |
-----------------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 3 |
| time_elapsed | 556 |
| total_timesteps | 63488 |
| train/ | |
| approx_kl | 0.16407205 |
| clip_fraction | 0.461 |
| clip_range | 0.2 |
| entropy_loss | -2.35 |
| explained_variance | 0.9 |
| learning_rate | 0.000225 |
| loss | -0.0875 |
| n_updates | 180 |
| policy_gradient_loss | -0.0633 |
| std | 0.758 |
| value_loss | 0.035 |
----------------------------------------
[22:10:03] [40,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0040000.zip
[22:10:09] Eval: gen_road=3.1r/59s ❌@59 gen_track=1.3r/58s ❌@58
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 36 |
| iterations | 1 |
| time_elapsed | 113 |
| total_timesteps | 67584 |
---------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 24 |
| iterations | 2 |
| time_elapsed | 329 |
| total_timesteps | 71680 |
| train/ | |
| approx_kl | 0.17689857 |
| clip_fraction | 0.489 |
| clip_range | 0.2 |
| entropy_loss | -2.18 |
| explained_variance | 0.917 |
| learning_rate | 0.000225 |
| loss | -0.0885 |
| n_updates | 200 |
| policy_gradient_loss | -0.0635 |
| std | 0.698 |
| value_loss | 0.054 |
----------------------------------------
---------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 3 |
| time_elapsed | 548 |
| total_timesteps | 75776 |
| train/ | |
| approx_kl | 0.1996874 |
| clip_fraction | 0.506 |
| clip_range | 0.2 |
| entropy_loss | -2.08 |
| explained_variance | 0.933 |
| learning_rate | 0.000225 |
| loss | -0.0906 |
| n_updates | 210 |
| policy_gradient_loss | -0.0629 |
| std | 0.666 |
| value_loss | 0.043 |
---------------------------------------
[22:20:58] [50,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0050000.zip
[22:21:04] Eval: gen_road=5.9r/67s ❌@67 gen_track=1.6r/66s ❌@66
[22:21:04] NEW BEST: combined steps=133 reward=7.6
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 35 |
| iterations | 1 |
| time_elapsed | 115 |
| total_timesteps | 79872 |
---------------------------------
--------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 25 |
| iterations | 2 |
| time_elapsed | 326 |
| total_timesteps | 83968 |
| train/ | |
| approx_kl | 0.254287 |
| clip_fraction | 0.543 |
| clip_range | 0.2 |
| entropy_loss | -1.94 |
| explained_variance | 0.89 |
| learning_rate | 0.000225 |
| loss | -0.102 |
| n_updates | 230 |
| policy_gradient_loss | -0.0707 |
| std | 0.62 |
| value_loss | 0.0646 |
--------------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 3 |
| time_elapsed | 551 |
| total_timesteps | 88064 |
| train/ | |
| approx_kl | 0.32521772 |
| clip_fraction | 0.604 |
| clip_range | 0.2 |
| entropy_loss | -1.85 |
| explained_variance | 0.803 |
| learning_rate | 0.000225 |
| loss | -0.0781 |
| n_updates | 240 |
| policy_gradient_loss | -0.0776 |
| std | 0.594 |
| value_loss | 0.102 |
----------------------------------------
[22:32:03] [60,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0060000.zip
[22:32:09] Eval: gen_road=7.7r/93s ❌@93 gen_track=3.7r/92s ❌@92
[22:32:09] NEW BEST: combined steps=185 reward=11.4
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 43 |
| iterations | 1 |
| time_elapsed | 94 |
| total_timesteps | 92160 |
---------------------------------
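Each `Eval:` line reports a reward and a step count per track (for example `gen_road=3.0r/64s`), and each `NEW BEST` line reports their sums: 64 + 63 = 127 steps and 3.0 + 1.1 = 4.1 reward at the 10,000-step checkpoint. The log does not show the rule that decides a new best. Comparing (combined steps, combined reward) lexicographically is one rule that reproduces every `NEW BEST` line in this run: the 20,000-step eval ties on steps (127) with less reward (4.0), and the 40,000-step eval has more reward (4.4) but fewer steps (117). The regex, the function names, and the selection rule below are assumptions for illustration; the parser ignores the `❌@N` markers.

```python
import re

# Matches "<track>=<reward>r/<steps>s" pairs; trailing "❌@N" markers are ignored.
EVAL_PAIR = re.compile(r"(\w+)=([\d.]+)r/(\d+)s")


def parse_eval(line):
    """Map track name -> (reward, steps) for one 'Eval:' log line."""
    return {name: (float(r), int(s)) for name, r, s in EVAL_PAIR.findall(line)}


def combined(evals):
    """Sum steps and reward across tracks, as in the NEW BEST lines."""
    steps = sum(s for _, s in evals.values())
    reward = sum(r for r, _ in evals.values())
    return steps, reward


def track_bests(lines):
    """Yield (combined, is_new_best) under an assumed steps-first, reward-tiebreak rule."""
    best = None
    for line in lines:
        cand = combined(parse_eval(line))
        is_best = best is None or cand > best  # tuple comparison is lexicographic
        if is_best:
            best = cand
        yield cand, is_best


EVALS = [
    "gen_road=3.0r/64s ❌@64 gen_track=1.1r/63s ❌@63",  # 10,000
    "gen_road=2.9r/64s ❌@64 gen_track=1.1r/63s ❌@63",  # 20,000
    "gen_road=2.7r/63s ❌@63 gen_track=1.1r/62s ❌@62",  # 30,000
    "gen_road=3.1r/59s ❌@59 gen_track=1.3r/58s ❌@58",  # 40,000
    "gen_road=5.9r/67s ❌@67 gen_track=1.6r/66s ❌@66",  # 50,000
    "gen_road=7.7r/93s ❌@93 gen_track=3.7r/92s ❌@92",  # 60,000
]
for (steps, reward), is_best in track_bests(EVALS):
    print(steps, round(reward, 1), "NEW BEST" if is_best else "")
```

The per-track rewards are printed to one decimal, so summing the printed values can differ from the logged total in the last digit: 5.9 + 1.6 = 7.5, while the 50,000-step line logs 7.6.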

@@ -0,0 +1,64 @@
/home/paulh/.local/lib/python3.10/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.
  warnings.warn("Unable to import Axes3D. This may be due to multiple versions of "
Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
[14:13:45] ============================================================
[14:13:45] Exp 22: generated_road + generated_track, warm-started, v6 reward
[14:13:45] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[14:13:45] Sim 1: localhost:9091 -> generated_road
[14:13:45] Sim 2: localhost:9093 -> generated_track
[14:13:45] throttle_min=0.2, lr=0.000225, total=150,000
[14:13:45] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[14:13:45] Stuck timeout: 8.0s, hard cap: 25.0s
[14:13:45] Progress patience: 100 steps
[14:13:45] Checkpoints: every 10,000 steps
[14:13:45] ============================================================
[14:13:45] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
INFO:gym_donkeycar.core.client:connecting to localhost:9091
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:236: UserWarning: WARN: Box low's precision lowered by casting to float32, current low.dtype=float64
  gym.logger.warn(
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:306: UserWarning: WARN: Box high's precision lowered by casting to float32, current high.dtype=float64
  gym.logger.warn(
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
INFO:gym_donkeycar.core.client:connecting to localhost:9093
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
[14:13:45] VecEnv num_envs=2, obs=(3, 120, 160)
/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:166: UserWarning: get_schedule_fn() is deprecated, please use FloatSchedule() instead
  warnings.warn("get_schedule_fn() is deprecated, please use FloatSchedule() instead")
/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:212: UserWarning: constant_fn() is deprecated, please use ConstantSchedule() instead
  warnings.warn("constant_fn() is deprecated, please use ConstantSchedule() instead")
[14:13:49] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 29 |
| iterations | 1 |
| time_elapsed | 139 |
| total_timesteps | 18432 |
---------------------------------

@@ -0,0 +1,64 @@
/home/paulh/.local/lib/python3.10/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.
  warnings.warn("Unable to import Axes3D. This may be due to multiple versions of "
Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
[14:19:32] ============================================================
[14:19:32] Exp 22: generated_road + generated_track, warm-started, v6 reward
[14:19:32] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
[14:19:32] Sim 1: localhost:9091 -> generated_road
[14:19:32] Sim 2: localhost:9093 -> generated_track
[14:19:32] throttle_min=0.2, lr=0.000225, total=150,000
[14:19:32] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
[14:19:32] Stuck timeout: 8.0s, hard cap: 25.0s
[14:19:32] Progress patience: 100 steps
[14:19:32] Checkpoints: every 10,000 steps
[14:19:32] ============================================================
[14:19:32] Creating DummyVecEnv with the two road tracks...
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
INFO:gym_donkeycar.core.client:connecting to localhost:9091
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:236: UserWarning: WARN: Box low's precision lowered by casting to float32, current low.dtype=float64
  gym.logger.warn(
/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:306: UserWarning: WARN: Box high's precision lowered by casting to float32, current high.dtype=float64
  gym.logger.warn(
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
starting DonkeyGym env
Setting default: start_delay 5.0
Setting default: max_cte 8.0
Setting default: frame_skip 1
Setting default: cam_resolution (120, 160, 3)
Setting default: log_level 20
Setting default: steer_limit 1.0
Setting default: throttle_min 0.0
Setting default: throttle_max 1.0
INFO:gym_donkeycar.core.client:connecting to localhost:9093
INFO:gym_donkeycar.envs.donkey_sim:on need car config
INFO:gym_donkeycar.envs.donkey_sim:sending car config.
INFO:gym_donkeycar.envs.donkey_sim:sim started!
[14:19:32] VecEnv num_envs=2, obs=(3, 120, 160)
/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:166: UserWarning: get_schedule_fn() is deprecated, please use FloatSchedule() instead
  warnings.warn("get_schedule_fn() is deprecated, please use FloatSchedule() instead")
/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:212: UserWarning: constant_fn() is deprecated, please use ConstantSchedule() instead
  warnings.warn("constant_fn() is deprecated, please use ConstantSchedule() instead")
[14:19:35] Warm-start model attached. Starting training...
---------------------------------
| rollout/ | |
| ep_len_mean | 118 |
| ep_rew_mean | 102 |
| time/ | |
| fps | 22 |
| iterations | 1 |
| time_elapsed | 181 |
| total_timesteps | 18432 |
---------------------------------
|
||||
|
|
@@ -0,0 +1,121 @@
+/home/paulh/.local/lib/python3.10/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.
+  warnings.warn("Unable to import Axes3D. This may be due to multiple versions of "
+Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
+Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
+Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.
+See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
+[14:26:23] ============================================================
+[14:26:23] Exp 22: generated_road + generated_track, warm-started, v6 reward
+[14:26:23] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
+[14:26:23] Sim 1: localhost:9091 -> generated_road
+[14:26:23] Sim 2: localhost:9093 -> generated_track
+[14:26:23] throttle_min=0.2, lr=0.000225, total=150,000
+[14:26:23] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
+[14:26:23] Stuck timeout: 3.0s, hard cap: 18.0s
+[14:26:23] Progress patience: 100 steps
+[14:26:23] Checkpoints: every 10,000 steps
+[14:26:23] ============================================================
+[14:26:23] Creating DummyVecEnv with the two road tracks...
+starting DonkeyGym env
+Setting default: start_delay 5.0
+Setting default: max_cte 8.0
+Setting default: frame_skip 1
+Setting default: cam_resolution (120, 160, 3)
+Setting default: log_level 20
+Setting default: steer_limit 1.0
+Setting default: throttle_min 0.0
+Setting default: throttle_max 1.0
+INFO:gym_donkeycar.core.client:connecting to localhost:9091
+/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:236: UserWarning: WARN: Box low's precision lowered by casting to float32, current low.dtype=float64
+  gym.logger.warn(
+/home/paulh/.local/lib/python3.10/site-packages/gymnasium/spaces/box.py:306: UserWarning: WARN: Box high's precision lowered by casting to float32, current high.dtype=float64
+  gym.logger.warn(
+INFO:gym_donkeycar.envs.donkey_sim:on need car config
+INFO:gym_donkeycar.envs.donkey_sim:sending car config.
+INFO:gym_donkeycar.envs.donkey_sim:sim started!
+starting DonkeyGym env
+Setting default: start_delay 5.0
+Setting default: max_cte 8.0
+Setting default: frame_skip 1
+Setting default: cam_resolution (120, 160, 3)
+Setting default: log_level 20
+Setting default: steer_limit 1.0
+Setting default: throttle_min 0.0
+Setting default: throttle_max 1.0
+INFO:gym_donkeycar.core.client:connecting to localhost:9093
+INFO:gym_donkeycar.envs.donkey_sim:on need car config
+INFO:gym_donkeycar.envs.donkey_sim:sending car config.
+INFO:gym_donkeycar.envs.donkey_sim:sim started!
+[14:26:23] VecEnv num_envs=2, obs=(3, 120, 160)
+/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:166: UserWarning: get_schedule_fn() is deprecated, please use FloatSchedule() instead
+  warnings.warn("get_schedule_fn() is deprecated, please use FloatSchedule() instead")
+/home/paulh/.local/lib/python3.10/site-packages/stable_baselines3/common/utils.py:212: UserWarning: constant_fn() is deprecated, please use ConstantSchedule() instead
+  warnings.warn("constant_fn() is deprecated, please use ConstantSchedule() instead")
+[14:26:26] Warm-start model attached. Starting training...
+---------------------------------
+| rollout/           |          |
+|    ep_len_mean     | 118      |
+|    ep_rew_mean     | 102      |
+| time/              |          |
+|    fps             | 23       |
+|    iterations      | 1        |
+|    time_elapsed    | 177      |
+|    total_timesteps | 18432    |
+---------------------------------
+-----------------------------------------
+| rollout/                |             |
+|    ep_len_mean          | 118         |
+|    ep_rew_mean          | 102         |
+| time/                   |             |
+|    fps                  | 18          |
+|    iterations           | 2           |
+|    time_elapsed         | 446         |
+|    total_timesteps      | 22528       |
+| train/                  |             |
+|    approx_kl            | 0.012866169 |
+|    clip_fraction        | 0.26        |
+|    clip_range           | 0.2         |
+|    entropy_loss         | -2.79       |
+|    explained_variance   | -1.1        |
+|    learning_rate        | 0.000225    |
+|    loss                 | 4.57        |
+|    n_updates            | 80          |
+|    policy_gradient_loss | 0.0151      |
+|    std                  | 0.981       |
+|    value_loss           | 25.9        |
+-----------------------------------------
+------------------------------------------
+| rollout/                |              |
+|    ep_len_mean          | 118          |
+|    ep_rew_mean          | 102          |
+| time/                   |              |
+|    fps                  | 17           |
+|    iterations           | 3            |
+|    time_elapsed         | 714          |
+|    total_timesteps      | 26624        |
+| train/                  |              |
+|    approx_kl            | 0.0133808125 |
+|    clip_fraction        | 0.199        |
+|    clip_range           | 0.2          |
+|    entropy_loss         | -2.81        |
+|    explained_variance   | 0.199        |
+|    learning_rate        | 0.000225     |
+|    loss                 | 0.858        |
+|    n_updates            | 90           |
+|    policy_gradient_loss | 0.00454      |
+|    std                  | 0.985        |
+|    value_loss           | 3.54         |
+------------------------------------------
+[14:39:55] [10,000/150,000] Checkpoint saved: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/exp22-generated-pair-warm-v6/checkpoint_0010000.zip
+[14:40:01] Eval: gen_road=0.2r/41s ❌@41 gen_track=-0.4r/36s ❌@36
+[14:40:01] NEW BEST: combined steps=77 reward=-0.3
+---------------------------------
+| rollout/           |          |
+|    ep_len_mean     | 118      |
+|    ep_rew_mean     | 102      |
+| time/              |          |
+|    fps             | 22       |
+|    iterations      | 1        |
+|    time_elapsed    | 180      |
+|    total_timesteps | 30720    |
+---------------------------------
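The `total_timesteps` and `n_updates` columns in the Exp 22 tables above can be cross-checked with some arithmetic. This sketch assumes standard Stable-Baselines3 PPO bookkeeping: each iteration collects `n_steps` transitions from every environment, and each `train()` call adds `n_epochs` to `n_updates`. The values of `n_steps` and `n_epochs` are inferred from the logged deltas, not read from the experiment config. The warm-start step count also assumes `learn(..., reset_num_timesteps=False)`, which the log does not show.

```python
from typing import Dict, Set

# Values copied from the Exp 22 tables logged from 14:26 onwards.
NUM_ENVS = 2                                       # "VecEnv num_envs=2"
total_timesteps = {1: 18432, 2: 22528, 3: 26624}   # iteration -> time/total_timesteps
n_updates = {2: 80, 3: 90}                         # iteration -> train/n_updates


def timestep_deltas(totals: Dict[int, int]) -> Set[int]:
    """Distinct total_timesteps increments between consecutive iterations."""
    iters = sorted(totals)
    return {totals[b] - totals[a] for a, b in zip(iters, iters[1:])}


deltas = timestep_deltas(total_timesteps)
assert len(deltas) == 1, "rollout size should be constant across iterations"
rollout_size = next(iter(deltas))

# Inferred, not logged: PPO collects n_steps transitions per env per iteration.
n_steps_per_env = rollout_size // NUM_ENVS

# Inferred, not logged: SB3's PPO adds n_epochs to n_updates on every train() call.
n_epochs = n_updates[3] - n_updates[2]

# Steps already on the warm-start model before this run's first rollout
# (assumes learn(..., reset_num_timesteps=False); the call is not in the log).
inherited_timesteps = total_timesteps[1] - rollout_size

print(f"rollout={rollout_size} n_steps/env={n_steps_per_env} "
      f"n_epochs={n_epochs} inherited={inherited_timesteps}")
```

Under the same assumptions, the 14,336 inherited steps and the 70 updates implied before this run's first `train()` call (80 minus 10) both fit seven earlier 2,048-step rollouts of 10 epochs each. The log alone does not confirm this.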
@@ -0,0 +1,42 @@
+[10:19:05] ============================================================
+[10:19:05] Exp 22: generated_road + generated_track, warm-started, v6 reward
+[10:19:05] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
+[10:19:05] Sim 1: localhost:9091 -> generated_road
+[10:19:05] Sim 2: localhost:9093 -> generated_track
+[10:19:05] throttle_min=0.2, lr=0.000225, total=150,000
+[10:19:05] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
+[10:19:05] Stuck timeout: 8.0s, hard cap: 25.0s
+[10:19:05] Progress patience: 100 steps
+[10:19:05] Checkpoints: every 10,000 steps
+[10:19:05] ============================================================
+[10:19:05] Creating DummyVecEnv with the two road tracks...
+starting DonkeyGym env
+Setting default: start_delay 5.0
+Setting default: max_cte 8.0
+Setting default: frame_skip 1
+Setting default: cam_resolution (120, 160, 3)
+Setting default: log_level 20
+Setting default: steer_limit 1.0
+Setting default: throttle_min 0.0
+Setting default: throttle_max 1.0
+starting DonkeyGym env
+Setting default: start_delay 5.0
+Setting default: max_cte 8.0
+Setting default: frame_skip 1
+Setting default: cam_resolution (120, 160, 3)
+Setting default: log_level 20
+Setting default: steer_limit 1.0
+Setting default: throttle_min 0.0
+Setting default: throttle_max 1.0
+[10:19:06] VecEnv num_envs=2, obs=(3, 120, 160)
+[10:19:09] Warm-start model attached. Starting training...
+---------------------------------
+| rollout/           |          |
+|    ep_len_mean     | 118      |
+|    ep_rew_mean     | 102      |
+| time/              |          |
+|    fps             | 39       |
+|    iterations      | 1        |
+|    time_elapsed    | 103      |
+|    total_timesteps | 18432    |
+---------------------------------
@@ -0,0 +1,32 @@
+[21:17:40] ============================================================
+[21:17:40] Exp 22: generated_road + generated_track, warm-started, v6 reward
+[21:17:40] Warm start: /home/paulh/projects/donkeycar-rl-autoresearch/agent/models/champion/model.zip
+[21:17:40] Sim 1: localhost:9091 -> generated_road
+[21:17:40] Sim 2: localhost:9093 -> generated_track
+[21:17:40] throttle_min=0.2, lr=0.000225, total=150,000
+[21:17:40] Reward: v6 (speed x CTE with progress/efficiency exploit termination)
+[21:17:40] Stuck timeout: 8.0s, hard cap: 25.0s
+[21:17:40] Progress patience: 100 steps
+[21:17:40] Checkpoints: every 10,000 steps
+[21:17:40] ============================================================
+[21:17:40] Creating DummyVecEnv with the two road tracks...
+starting DonkeyGym env
+Setting default: start_delay 5.0
+Setting default: max_cte 8.0
+Setting default: frame_skip 1
+Setting default: cam_resolution (120, 160, 3)
+Setting default: log_level 20
+Setting default: steer_limit 1.0
+Setting default: throttle_min 0.0
+Setting default: throttle_max 1.0
+starting DonkeyGym env
+Setting default: start_delay 5.0
+Setting default: max_cte 8.0
+Setting default: frame_skip 1
+Setting default: cam_resolution (120, 160, 3)
+Setting default: log_level 20
+Setting default: steer_limit 1.0
+Setting default: throttle_min 0.0
+Setting default: throttle_max 1.0
+[21:17:40] VecEnv num_envs=2, obs=(3, 120, 160)
+[21:17:43] Warm-start model attached. Starting training...
@@ -98,6 +98,10 @@ class SpeedRewardWrapper(gym.Wrapper):
         cte_patience: int = 20,
         progress_patience: int = 60,
         efficiency_patience: int = 20,  # steps of low efficiency before termination
+        low_speed_patience: int = 20,
+        low_speed_threshold: float = 0.2,
+        low_speed_min_displacement: float = 0.25,
+        low_speed_grace_steps: int = 20,
     ):
         super().__init__(env)
         self.speed_scale = speed_scale
@@ -109,12 +113,21 @@ class SpeedRewardWrapper(gym.Wrapper):
         self.cte_patience = cte_patience
         self.progress_patience = progress_patience
         self.efficiency_patience = efficiency_patience
+        self.low_speed_patience = low_speed_patience
+        self.low_speed_threshold = low_speed_threshold
+        self.low_speed_min_displacement = low_speed_min_displacement
+        self.low_speed_grace_steps = low_speed_grace_steps
         self._pos_history = deque(maxlen=window_size + 1)
         self._last_lap_count = 0
         self._high_cte_steps = 0
         self._max_node_seen = -1
         self._no_progress_steps = 0
         self._low_eff_steps = 0
+        self._solid_hit_steps = 0
+        self._prev_speed = 0.0
+        self._episode_steps = 0
+        self._low_speed_steps = 0
+        self._low_speed_anchor = None

     def reset(self, **kwargs):
         result = self.env.reset(**kwargs)
@@ -124,6 +137,11 @@ class SpeedRewardWrapper(gym.Wrapper):
         self._max_node_seen = -1
         self._no_progress_steps = 0
         self._low_eff_steps = 0
+        self._solid_hit_steps = 0
+        self._prev_speed = 0.0
+        self._episode_steps = 0
+        self._low_speed_steps = 0
+        self._low_speed_anchor = None
         return result

     def step(self, action):
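The two hunks above add the same five per-episode fields to both `__init__` and `reset()`, so the two lists have to be kept in step by hand. One possible refactor, which is not part of this commit, is to keep the counters in one dataclass and rebuild it on reset. `EpisodeCounters` and `CounterOwner` are illustrative names. The field names mirror the underscore attributes in the diff, and deque state such as `_pos_history` is left out.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EpisodeCounters:
    """Hypothetical holder for the per-episode counters the wrapper resets."""
    high_cte_steps: int = 0
    max_node_seen: int = -1
    no_progress_steps: int = 0
    low_eff_steps: int = 0
    solid_hit_steps: int = 0
    prev_speed: float = 0.0
    episode_steps: int = 0
    low_speed_steps: int = 0
    low_speed_anchor: Optional[Any] = None  # an (x, z) numpy array in the wrapper


class CounterOwner:
    """Stand-in for the wrapper: __init__ and reset() share one set of defaults."""

    def __init__(self) -> None:
        self.counters = EpisodeCounters()

    def reset(self) -> None:
        # Rebuilding the dataclass restores every default, so a field added to
        # EpisodeCounters cannot be forgotten in reset().
        self.counters = EpisodeCounters()
```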
@@ -168,14 +186,18 @@ class SpeedRewardWrapper(gym.Wrapper):
             reward = -1.0 (on crash/termination)
         """
         # Track position for efficiency calculation
+        current_pos = None
         try:
             pos = info.get('pos', (0.0, 0.0, 0.0))
             pos_x = float(pos[0])
             pos_z = float(pos[2])
-            self._pos_history.append(np.array([pos_x, pos_z]))
+            current_pos = np.array([pos_x, pos_z])
+            self._pos_history.append(current_pos)
         except (TypeError, ValueError, IndexError):
             pass

+        self._episode_steps += 1
+
         # Crash / episode over
         if done:
             return -1.0, False
@@ -186,11 +208,82 @@ class SpeedRewardWrapper(gym.Wrapper):
         except (TypeError, ValueError):
             cte = 0.0

-        # --- Grass exploit: sustained high CTE termination ---
+        # --- Speed / collision classification ---
+        try:
+            speed = max(0.0, float(info.get('speed', 0.0) or 0.0))
+        except (TypeError, ValueError):
+            speed = 0.0
+
+        try:
+            hit = str(info.get('hit', 'none') or 'none').lower()
+        except Exception:
+            hit = 'none'
+
+        solid_hit = (
+            hit != 'none' and (
+                'barrier' in hit or
+                'wall' in hit or
+                'tree' in hit
+            )
+        )
+
+        # Allow brief brushes, but terminate on:
+        # 1. a head-on style stop: car was moving, then collision arrives with
+        #    a large speed drop; or
+        # 2. sustained obstacle contact over several telemetry frames.
+        if solid_hit:
+            head_on_impact = self._prev_speed >= 1.5 and speed <= 0.35
+            if head_on_impact:
+                self._prev_speed = speed
+                return -1.0, True
+
+            self._solid_hit_steps += 1
+            if self._solid_hit_steps >= 4:
+                self._prev_speed = speed
+                return -1.0, True
+        else:
+            self._solid_hit_steps = 0
+
+        # --- Wheels-spinning / barrier wedge termination ---
+        # CTE can remain deceptively acceptable when the car is pressed against
+        # a generated-road barrier or invisible collider. If speed stays near
+        # zero and position does not meaningfully change after the launch grace
+        # period, kill the episode quickly with a negative reward.
+        if (
+            current_pos is not None
+            and self._episode_steps > self.low_speed_grace_steps
+            and speed <= self.low_speed_threshold
+        ):
+            if self._low_speed_anchor is None:
+                self._low_speed_anchor = current_pos
+                self._low_speed_steps = 1
+            else:
+                moved = float(np.linalg.norm(current_pos - self._low_speed_anchor))
+                if moved >= self.low_speed_min_displacement:
+                    self._low_speed_anchor = current_pos
+                    self._low_speed_steps = 0
+                else:
+                    self._low_speed_steps += 1
+
+            if self._low_speed_steps >= self.low_speed_patience:
+                self._prev_speed = speed
+                return -1.0, True
+        else:
+            self._low_speed_steps = 0
+            self._low_speed_anchor = current_pos
+
+        # --- Grass / outside-road exploit: high CTE is bad immediately ---
+        # Do not let the policy collect positive speed reward while it is
+        # outside the useful road corridor. Earlier versions only terminated
+        # after patience frames, but still paid positive reward during those
+        # frames; PPO learned large fast circles outside generated_road.
         if abs(cte) > self.max_cte_terminate:
             self._high_cte_steps += 1
             if self._high_cte_steps >= self.cte_patience:
+                self._prev_speed = speed
                 return -1.0, True  # too long off-track — terminate
+            self._prev_speed = speed
+            return -0.25, False
         else:
             self._high_cte_steps = 0

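The collision rule added above can be read on its own. A `hit` string counts as solid if it mentions a barrier, wall or tree. The episode ends on a head-on stop (previous speed at least 1.5, current speed at most 0.35) or on the fourth consecutive solid-hit frame. The sketch below restates that rule as a hypothetical `SolidHitMonitor` so it can be exercised without the simulator. The thresholds are copied from the diff, and speeds are in whatever units the sim reports in `info['speed']`.

```python
from typing import Optional


class SolidHitMonitor:
    """Hypothetical standalone restatement of the wrapper's solid-hit rule."""

    SOLID_KEYWORDS = ("barrier", "wall", "tree")

    def __init__(self, head_on_prev_speed: float = 1.5,
                 head_on_max_speed: float = 0.35, sustained_frames: int = 4):
        self.head_on_prev_speed = head_on_prev_speed
        self.head_on_max_speed = head_on_max_speed
        self.sustained_frames = sustained_frames
        self.prev_speed = 0.0
        self.solid_hit_steps = 0

    @classmethod
    def is_solid(cls, hit: Optional[str]) -> bool:
        # Same normalisation as the wrapper: empty/None -> 'none', compared lower-case.
        name = str(hit or "none").lower()
        return name != "none" and any(k in name for k in cls.SOLID_KEYWORDS)

    def observe(self, hit: Optional[str], speed: float) -> bool:
        """Return True if this telemetry frame should end the episode."""
        terminate = False
        if self.is_solid(hit):
            head_on = (self.prev_speed >= self.head_on_prev_speed
                       and speed <= self.head_on_max_speed)
            if head_on:
                terminate = True  # was moving, then stopped dead against an obstacle
            else:
                self.solid_hit_steps += 1  # brushing contact: count consecutive frames
                terminate = self.solid_hit_steps >= self.sustained_frames
        else:
            self.solid_hit_steps = 0  # contact ended: forget earlier brushes
        self.prev_speed = speed  # the wrapper now updates _prev_speed on every non-crash path
        return terminate
```

Keeping `prev_speed` current on every frame matters here. The commit adds `self._prev_speed = speed` before each early `return` for the same reason: otherwise a penalty or termination branch would leave a stale speed behind, and the next head-on check would compare against the wrong frame.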
@@ -214,6 +307,7 @@ class SpeedRewardWrapper(gym.Wrapper):
         else:
             self._no_progress_steps += 1
             if self._no_progress_steps >= self.progress_patience:
+                self._prev_speed = speed
                 return -1.0, True  # no forward progress — terminate


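Only the tail of the no-progress rule appears in this hunk. The test docstring says that advancing to a new maximum `active_node` resets the counter. Based on that, a standalone sketch of the whole rule might look like the block below. `ProgressTracker` is a hypothetical name. The sketch also assumes node indices only increase within an episode, so it does not handle wrap-around at a lap boundary.

```python
class ProgressTracker:
    """Hypothetical sketch of the no-forward-progress termination rule."""

    def __init__(self, patience: int = 60):
        self.patience = patience      # mirrors progress_patience's default of 60
        self.max_node_seen = -1
        self.no_progress_steps = 0

    def observe(self, active_node: int) -> bool:
        """Return True when the episode should end for lack of progress."""
        if active_node > self.max_node_seen:
            # New furthest node: progress made, restart the patience window.
            self.max_node_seen = active_node
            self.no_progress_steps = 0
            return False
        self.no_progress_steps += 1
        return self.no_progress_steps >= self.patience
```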
@@ -233,6 +327,7 @@ class SpeedRewardWrapper(gym.Wrapper):
                 lap_time = 999.0
             if lap_time < self.min_lap_time:
                 penalty = -10.0 * (self.min_lap_time / max(lap_time, 0.1))
+                self._prev_speed = speed
                 return penalty, True

         # --- Efficiency gate: detect circular driving ---
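The short-lap penalty scales with how far below `min_lap_time` the reported lap is. An implausibly fast lap therefore costs more than a marginally fast one, and the `max(lap_time, 0.1)` floor prevents division by zero. A worked sketch of that formula follows. `short_lap_penalty` is a hypothetical helper, and the wrapper's `min_lap_time` default is not shown in this diff.

```python
from typing import Optional


def short_lap_penalty(lap_time: float, min_lap_time: float) -> Optional[float]:
    """Penalty for a lap faster than min_lap_time, or None when the lap is plausible.

    Same formula as the hunk: -10.0 * (min_lap_time / max(lap_time, 0.1)).
    """
    if lap_time >= min_lap_time:
        return None
    # The 0.1 floor keeps a zero lap time from dividing by zero.
    return -10.0 * (min_lap_time / max(lap_time, 0.1))
```

For example, with an illustrative `min_lap_time=20.0`, a reported 10-second lap yields -20.0 and a 5-second lap yields -40.0.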
@@ -243,7 +338,9 @@ class SpeedRewardWrapper(gym.Wrapper):
         if efficiency < self.min_efficiency:
             self._low_eff_steps += 1
             if self._low_eff_steps >= self.efficiency_patience:
+                self._prev_speed = speed
                 return -1.0, True  # circle too long — terminate
+            self._prev_speed = speed
             return 0.0, False  # still accumulating — zero reward
         else:
             self._low_eff_steps = 0
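`_compute_efficiency()` itself is not part of this diff. A common way to score circular driving over a position window is net displacement divided by path length, which gives about 1 for a straight run and about 0 for a closed loop. That metric is assumed here, not taken from the wrapper. The sketch pairs it with the gate logic from the hunk: zero reward while efficiency stays low, then -1.0 and termination after `efficiency_patience` frames. The `window=10` and `min_efficiency=0.3` values are illustrative, not the wrapper's defaults.

```python
import math
from collections import deque
from typing import Deque, Iterable, Optional, Tuple

Point = Tuple[float, float]


def path_efficiency(points: Iterable[Point]) -> float:
    """Assumed metric: straight-line distance / distance travelled, in [0, 1]."""
    pts = list(points)
    if len(pts) < 2:
        return 1.0  # not enough history to judge yet
    travelled = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    if travelled == 0.0:
        return 1.0  # assumption: leave a stationary car to the wedge rule
    return math.dist(pts[0], pts[-1]) / travelled


class EfficiencyGate:
    """Counter logic from the hunk, driven by the assumed metric above."""

    def __init__(self, window: int = 10, min_efficiency: float = 0.3,
                 patience: int = 20):
        # maxlen mirrors deque(maxlen=window_size + 1) in the wrapper.
        self.history: Deque[Point] = deque(maxlen=window + 1)
        self.min_efficiency = min_efficiency
        self.patience = patience
        self.low_eff_steps = 0

    def observe(self, pos: Point) -> Tuple[Optional[float], bool]:
        """Return (reward_override, terminate); None means use the normal reward."""
        self.history.append(pos)
        if path_efficiency(self.history) < self.min_efficiency:
            self.low_eff_steps += 1
            if self.low_eff_steps >= self.patience:
                return -1.0, True   # circling too long: terminate
            return 0.0, False       # still accumulating: zero reward
        self.low_eff_steps = 0
        return None, False
```

With `window=4` and `patience=3`, repeatedly tracing a unit square returns zero reward from the fifth frame and terminates on the seventh.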
@@ -252,13 +349,9 @@ class SpeedRewardWrapper(gym.Wrapper):
         cte_quality = 1.0 - min(abs(cte) / self.max_cte, 1.0)

-        # --- Speed ---
-        try:
-            speed = max(0.0, float(info.get('speed', 0.0) or 0.0))
-        except (TypeError, ValueError):
-            speed = 0.0

         # --- v6 reward: speed × CTE quality ---
         speed_norm = min(speed / 10.0, 1.0)
+        self._prev_speed = speed
         return cte_quality * speed_norm, False

     def _compute_efficiency(self) -> float:
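With the duplicate speed parsing removed, the v6 per-step reward at the end of the method is the product of a CTE quality term and a normalised speed term, each clipped to [0, 1]. The sketch below isolates just that formula. The 10.0 speed normaliser is copied from the diff. The `max_cte` argument is left to the caller because the wrapper's `self.max_cte` default is not shown here.

```python
def v6_reward(speed: float, cte: float, max_cte: float) -> float:
    """v6 shaping term from the diff: cte_quality * speed_norm, each in [0, 1]."""
    cte_quality = 1.0 - min(abs(cte) / max_cte, 1.0)  # 1 on the centre line, 0 at |cte| >= max_cte
    speed_norm = min(max(speed, 0.0) / 10.0, 1.0)     # saturates at speed 10
    return cte_quality * speed_norm
```

For example, `v6_reward(5.0, 2.0, 4.0)` gives 0.5 * 0.5 = 0.25. Because the factors multiply, either a large CTE or a standstill drives the reward to zero. The high-CTE, efficiency and wedge branches earlier in the method can return before this product is ever computed.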
@@ -324,12 +324,45 @@ def test_sustained_high_cte_terminates_episode():
         rewards.append(r)
         terminated.append(force_term)

-    # Should terminate at step 5 (cte_patience=5)
+    # High CTE should be punished immediately, then terminate at step 5
+    assert rewards[0] < 0, f'High CTE should be negative immediately, got {rewards[0]}'
     assert terminated[4] == True, f'Should force-terminate at step 5, got {terminated}'
     assert rewards[4] == -1.0, f'Termination reward should be -1.0, got {rewards[4]}'
     assert terminated[0] == False, 'Should not terminate at step 1'


+def test_high_cte_never_gets_positive_speed_reward_before_termination():
+    """
+    Regression for generated_road outside-circle exploit: while CTE is outside
+    the allowed corridor, the wrapper must not pay positive speed reward during
+    the patience window. The policy should receive negative feedback
+    immediately, then termination.
+    """
+    env = MockEnv(speed=5.0, cte=3.0)
+    wrapper = SpeedRewardWrapper(env, max_cte_terminate=2.5, cte_patience=3)
+    wrapper.reset()
+
+    rewards = []
+    terminated = []
+    for i in range(3):
+        info = {
+            'cte': 3.0,
+            'speed': 5.0,
+            'pos': (float(i), 0.0, 0.0),
+            'active_node': i,
+            'total_nodes': 100,
+            'lap_count': 0,
+            'last_lap_time': 0.0,
+        }
+        r, ft = wrapper._compute_reward_and_done(done=False, info=info)
+        rewards.append(r)
+        terminated.append(ft)
+
+    assert rewards[:2] == [-0.25, -0.25]
+    assert rewards[2] == -1.0
+    assert terminated == [False, False, True]
+
+
 def test_high_cte_resets_when_back_on_track():
     """
     High CTE counter must reset when car returns to track.
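The new regression test pins down the revised high-CTE schedule: each frame beyond `max_cte_terminate` pays -0.25, and the frame that reaches `cte_patience` pays -1.0 and terminates. The sketch below restates that branch as a hypothetical `HighCteGate`. It also adds a helper that totals what a constant off-road excursion earns, with the test's values (`max_cte_terminate=2.5`, `cte_patience=3`, CTE 3.0) as defaults.

```python
from typing import Optional, Tuple


class HighCteGate:
    """Hypothetical restatement of the high-CTE branch in SpeedRewardWrapper."""

    def __init__(self, max_cte_terminate: float, cte_patience: int):
        self.max_cte_terminate = max_cte_terminate
        self.cte_patience = cte_patience
        self.high_cte_steps = 0

    def observe(self, cte: float) -> Optional[Tuple[float, bool]]:
        """(reward, terminate) while off the corridor; None means fall through."""
        if abs(cte) > self.max_cte_terminate:
            self.high_cte_steps += 1
            if self.high_cte_steps >= self.cte_patience:
                return -1.0, True    # off-track for too long: terminate
            return -0.25, False      # off the corridor: never a positive reward
        self.high_cte_steps = 0      # back on the road: reset the patience window
        return None                  # caller continues to the normal reward


def off_road_return(frames: int, max_cte_terminate: float = 2.5,
                    cte_patience: int = 3, cte: float = 3.0) -> float:
    """Total reward for holding a constant off-road CTE for up to `frames` frames."""
    gate = HighCteGate(max_cte_terminate, cte_patience)
    total = 0.0
    for _ in range(frames):
        result = gate.observe(cte)
        if result is None:
            break  # only reachable if cte is inside the corridor
        reward, done = result
        total += reward
        if done:
            break
    return total
```

With those values, `off_road_return(10)` is -0.25 - 0.25 - 1.0 = -1.5. The comment in the wrapper hunk notes that earlier versions still paid positive speed reward during those patience frames.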
@@ -383,6 +416,70 @@ def test_no_track_progress_terminates_episode():
     assert r == -1.0


+def test_low_speed_no_displacement_terminates_barrier_wedge():
+    """
+    Regression for invisible-barrier wedge: wheels can be commanded but the car
+    remains nearly motionless with acceptable CTE. This must terminate quickly
+    instead of returning zero/positive reward indefinitely.
+    """
+    env = MockEnv(speed=0.05, cte=0.5)
+    wrapper = SpeedRewardWrapper(
+        env,
+        low_speed_grace_steps=2,
+        low_speed_patience=3,
+        low_speed_threshold=0.2,
+        low_speed_min_displacement=0.25,
+        progress_patience=100,
+    )
+    wrapper.reset()
+
+    terminated = False
+    reward = None
+    for _ in range(8):
+        info = {
+            'cte': 0.5,
+            'speed': 0.05,
+            'pos': (1.0, 0.0, 1.0),
+            'active_node': 5,
+            'total_nodes': 100,
+            'lap_count': 0,
+            'last_lap_time': 0.0,
+        }
+        reward, terminated = wrapper._compute_reward_and_done(done=False, info=info)
+        if terminated:
+            break
+
+    assert terminated is True
+    assert reward == -1.0
+
+
+def test_low_speed_counter_resets_after_meaningful_displacement():
+    """Slow starts should not terminate if the car is still changing position."""
+    env = MockEnv(speed=0.05, cte=0.5)
+    wrapper = SpeedRewardWrapper(
+        env,
+        low_speed_grace_steps=0,
+        low_speed_patience=3,
+        low_speed_threshold=0.2,
+        low_speed_min_displacement=0.25,
+        progress_patience=100,
+    )
+    wrapper.reset()
+
+    for i in range(6):
+        info = {
+            'cte': 0.5,
+            'speed': 0.05,
+            'pos': (float(i) * 0.3, 0.0, 0.0),
+            'active_node': i,
+            'total_nodes': 100,
+            'lap_count': 0,
+            'last_lap_time': 0.0,
+        }
+        reward, terminated = wrapper._compute_reward_and_done(done=False, info=info)
+        assert terminated is False
+
+
 def test_track_progress_resets_counter():
     """
     Advancing to a new max active_node must reset the no-progress counter.
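The two new tests exercise the wedge detector from both sides. A car pinned in place is terminated once the low-speed counter reaches `low_speed_patience` after the grace period. A slow car that keeps moving at least `low_speed_min_displacement` between frames is never terminated. The rule is restated below as a hypothetical standalone `WedgeDetector`, using the wrapper's parameter defaults, so the expected termination step can be checked directly. Positions are the (x, z) pairs the wrapper builds from `info['pos']`.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


class WedgeDetector:
    """Hypothetical standalone restatement of the low-speed / wedge rule."""

    def __init__(self, patience: int = 20, threshold: float = 0.2,
                 min_displacement: float = 0.25, grace_steps: int = 20):
        self.patience = patience
        self.threshold = threshold
        self.min_displacement = min_displacement
        self.grace_steps = grace_steps
        self.episode_steps = 0
        self.low_speed_steps = 0
        self.anchor: Optional[Point] = None

    def observe(self, pos: Point, speed: float) -> bool:
        """Return True once the car has been slow and stationary for too long."""
        self.episode_steps += 1
        if self.episode_steps > self.grace_steps and speed <= self.threshold:
            if self.anchor is None:
                self.anchor, self.low_speed_steps = pos, 1
            elif math.dist(pos, self.anchor) >= self.min_displacement:
                # Still creeping forward: move the anchor and restart the count.
                self.anchor, self.low_speed_steps = pos, 0
            else:
                self.low_speed_steps += 1
            return self.low_speed_steps >= self.patience
        # Launch grace period or normal speed: just track where the car is.
        self.low_speed_steps = 0
        self.anchor = pos
        return False
```

With `grace_steps=2` and `patience=3`, as in the first test, a pinned car is terminated on its fifth frame. The first two frames fall inside the grace period and only re-anchor the position; frames three to five increment the counter. The wrapper also skips the check when `info['pos']` cannot be parsed, whereas the sketch assumes a position is always available.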