# Implementation Plan — DonkeyCar RL Autoresearch

> Agent: read this at the start of every iteration.

> Pick the first unchecked task in the current active wave.

> Mark done immediately after commit.

---
## Wave 1: Real Training Foundation

**Goal:** Make the inner loop actually train and save models. Produce a real champion model.

**Gate:** Champion model achieves mean_reward > 100 on the training track.

**Status:** 🟠 In progress
### Stream 1A: Core Runner Rebuild

- [ ] **1A-01** — Rebuild `donkeycar_sb3_runner.py` with real PPO training (`model.learn()`), model saving, and proper evaluation (`evaluate_policy()`); a sketch follows this list
- [ ] **1A-02** — Add `SpeedRewardWrapper` — reward = `speed * (1 - abs(cte)/max_cte)`; add `--reward-shaping` flag
- [ ] **1A-03** — Add champion model tracking — write `champion_manifest.json` when a new best is found
- [ ] **1A-04** — Fix the autoresearch controller to pass `learning_rate`, `save_dir`, and `reward_shaping` args to the runner
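
A minimal sketch of how the Stream 1A pieces could fit together, assuming a gym-API-compatible Stable-Baselines3 and the gym-donkeycar `donkey-generated-track-v0` environment. The env ID, the `conf` keys, the `speed`/`cte` fields in `info`, the `max_cte` default, and the manifest fields are illustrative assumptions, not the existing runner's interface:

```python
"""Sketch only; not the real donkeycar_sb3_runner.py. Names and conf keys are assumptions."""
import json
from pathlib import Path

import gym
import gym_donkeycar  # noqa: F401  (registers the donkey-* env IDs)
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy


class SpeedRewardWrapper(gym.Wrapper):
    """1A-02: reward = speed * (1 - abs(cte)/max_cte), read from the sim's info dict."""

    def __init__(self, env, max_cte=8.0):
        super().__init__(env)
        self.max_cte = max_cte

    def step(self, action):
        # Assumes the old gym 4-tuple step API and that the sim reports
        # `speed` and `cte` (cross-track error) in info.
        obs, _reward, done, info = self.env.step(action)
        speed = info.get("speed", 0.0)
        cte = abs(info.get("cte", 0.0))
        shaped = speed * (1.0 - min(cte / self.max_cte, 1.0))
        return obs, shaped, done, info


def run_trial(learning_rate=3e-4, total_timesteps=50_000,
              save_dir="agent/models/trial_000", reward_shaping=True):
    # The simulator is expected on port 9091 (see the Notes section).
    env = gym.make("donkey-generated-track-v0",
                   conf={"host": "localhost", "port": 9091})
    if reward_shaping:
        env = SpeedRewardWrapper(env)

    model = PPO("CnnPolicy", env, learning_rate=learning_rate, verbose=1)
    model.learn(total_timesteps=total_timesteps)   # real training (1A-01)

    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    model_path = save_dir / "model.zip"
    model.save(model_path)                         # model save (1A-01)

    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=5)
    env.close()
    return mean_reward, std_reward, model_path


def update_champion(mean_reward, model_path,
                    manifest_path="agent/models/champion_manifest.json"):
    """1A-03: record the best model seen so far (field names are assumptions)."""
    manifest = Path(manifest_path)
    best = -float("inf")
    if manifest.exists():
        best = json.loads(manifest.read_text()).get("mean_reward", best)
    if mean_reward > best:
        manifest.write_text(json.dumps(
            {"mean_reward": mean_reward, "model_path": str(model_path)}, indent=2))
```

Keeping `run_trial` free of controller logic lets the autoresearch controller (1A-04) pass `learning_rate`, `save_dir`, and `reward_shaping` straight through as arguments.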
### Stream 1B: Tests

- [ ] **1B-01** — Write `tests/test_discretize_action.py` — action encoding, decoding, round-trip; a test sketch follows this list
- [ ] **1B-02** — Write `tests/test_autoresearch_controller.py` — GP fit, UCB computation, param round-trip, champion tracking
- [ ] **1B-03** — Write `tests/test_runner_integration.py` — mocked sim, training + save + eval cycle
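
As a concrete shape for 1B-01, a self-contained round-trip test sketch. The `encode_action`/`decode_action` pair below is a stand-in for the repo's real discretization helpers (the actual test should import those instead), and the 7×5 grid over steering in [-1, 1] and throttle in [0, 1] is an assumption:

```python
# Sketch for tests/test_discretize_action.py; helper implementations are placeholders.

N_STEER, N_THROTTLE = 7, 5


def decode_action(index, n_steer=N_STEER, n_throttle=N_THROTTLE):
    # Map a flat index to a (steering, throttle) pair on a uniform grid
    # over steering in [-1, 1] and throttle in [0, 1].
    steer_idx, throttle_idx = divmod(index, n_throttle)
    steering = -1.0 + 2.0 * steer_idx / (n_steer - 1)
    throttle = throttle_idx / (n_throttle - 1)
    return steering, throttle


def encode_action(steering, throttle, n_steer=N_STEER, n_throttle=N_THROTTLE):
    # Inverse mapping: snap continuous values back to the nearest grid index.
    steer_idx = round((steering + 1.0) / 2.0 * (n_steer - 1))
    throttle_idx = round(throttle * (n_throttle - 1))
    return steer_idx * n_throttle + throttle_idx


def test_index_round_trip():
    # Every discrete index should survive decode -> encode unchanged.
    for index in range(N_STEER * N_THROTTLE):
        steering, throttle = decode_action(index)
        assert encode_action(steering, throttle) == index


def test_decoded_actions_in_bounds():
    for index in range(N_STEER * N_THROTTLE):
        steering, throttle = decode_action(index)
        assert -1.0 <= steering <= 1.0
        assert 0.0 <= throttle <= 1.0
```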
### Stream 1C: First Real Autoresearch Run

- [ ] **1C-01** — Run 50-trial autoresearch with real PPO training; verify models saved
- [ ] **1C-02** — Save regression baseline: `champion_reward_phase1.txt`
- [ ] **1C-03** — Push all results and models to Gitea
- [ ] **1C-04** — Write Wave 1 process eval

---
## Wave 2: Multi-Track Generalization

**Goal:** Champion model drives any track with mean_reward > 50.

**Gate:** Wave 1 champion achieves mean_reward > 100. Wave 1 process eval complete.

**Status:** ⏸️ Not started — blocked on Wave 1

- [ ] **2-01** — Write `evaluate_champion.py` — load champion model, evaluate on specified track
- [ ] **2-02** — Implement multi-track training curriculum (train on 2 tracks alternately)
- [ ] **2-03** — Add domain randomization wrapper (randomize road width, lighting)
- [ ] **2-04** — Implement convergence detection in autoresearch (stop when GP sigma collapses); a sketch follows this list
- [ ] **2-05** — Add automatic Gitea push every N trials
- [ ] **2-06** — Evaluate champion on unseen track; record generalization gap
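
For 2-04, one way to make "GP sigma collapses" concrete, assuming the controller keeps a scikit-learn `GaussianProcessRegressor` fitted to (normalized params, reward) pairs; the candidate grid and threshold are assumptions to tune against real trial data. The controller would refit the GP after each trial and stop the search when this returns True:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


def has_converged(gp: GaussianProcessRegressor,
                  candidate_params: np.ndarray,
                  sigma_threshold: float = 0.05) -> bool:
    """Return True when the GP posterior is confident everywhere it matters.

    candidate_params: (n_candidates, n_dims) array of parameter vectors the
    UCB acquisition would otherwise still consider.
    """
    _mean, sigma = gp.predict(candidate_params, return_std=True)
    # If even the most uncertain candidate is below the threshold, further
    # trials are unlikely to change the champion, so stop the search early.
    return float(sigma.max()) < sigma_threshold
```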
---
## Wave 3: Racing / Speed Optimization

**Goal:** Fastest possible lap times on any track.

**Gate:** Wave 2 champion generalizes to ≥1 unseen track (mean_reward > 50).

**Status:** ⏸️ Not started — blocked on Wave 2

- [ ] **3-01** — Implement lap time measurement and logging
- [ ] **3-02** — Tune the reward function for pure speed (aggressive speed weight)
- [ ] **3-03** — Fine-tune from the champion checkpoint on new tracks; a sketch follows this list
- [ ] **3-04** — Head-to-head: autoresearch champion vs. human-tuned baseline
- [ ] **3-05** — Write the research writeup / report
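
For 3-03, a sketch of warm-starting from the Wave 1 champion with Stable-Baselines3; the checkpoint path and track ID are illustrative assumptions, and `reset_num_timesteps=False` keeps the timestep counter continuous so logged learning curves carry on from the original run:

```python
import gym
import gym_donkeycar  # noqa: F401  (registers the donkey-* env IDs)
from stable_baselines3 import PPO

# Env ID, port, and paths are illustrative assumptions.
new_track_env = gym.make("donkey-warehouse-v0",
                         conf={"host": "localhost", "port": 9091})

# Resume from the champion checkpoint rather than training from scratch.
model = PPO.load("agent/models/champion/model.zip", env=new_track_env)
model.learn(total_timesteps=100_000, reset_num_timesteps=False)
model.save("agent/models/champion_finetuned_warehouse.zip")
```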
---
## Completion Signals

The agent outputs one of these at the end of each iteration:

- `<promise>PLANNED</promise>` — just created/updated the plan, ready to implement
- `<promise>DONE</promise>` — all tasks in current wave complete
- `<promise>STUCK</promise>` — needs human input (see ESCALATION REQUIRED block if present)
- `<promise>ERROR</promise>` — unrecoverable error

---
## Notes

- **Random policy data (300 trials):** The existing `autoresearch_results.jsonl` contains rewards from random-action policy runs. These are valid for n_steer/n_throttle discretization insights but NOT for learning_rate optimization. Do not mix them with Phase 1 real-training results; create a separate results file, `autoresearch_results_phase1.jsonl`.
- **Model storage:** Large CNN models (>100 MB) should be excluded from git or tracked with Git LFS. Add `agent/models/**/*.zip` to `.gitignore` if needed, and document the download location.
- **Simulator requirement:** All live training tasks (1C-*) require the DonkeyCar sim running on port 9091. Tests (1B-*) do NOT require the simulator.