donkeycar-rl-autoresearch/.harness/templates/PROCESS-EVAL-TEMPLATE.md

# Process Eval Template
> Write this file after the stream is fully merged.
> File location: `.harness/<stream>/process-eval.md`
> Be honest — this is a retrospective, not a press release.
> Future agents and sessions will read this to understand what worked.
---
# Process Eval — [STREAM NAME]
**Completed:** YYYY-MM-DD
**Agent:** [model name]
**Packets:** [XX-01, XX-02, ...]
**Tests added:** NN total
**Final test count:** NNNN
**Wall-clock duration:** [estimated]
---
## Packet Summary
| Packet | Est. Effort | Actual Effort | On Time? | Tests Added |
|--------|-------------|--------|----------|-------------|
| XX-01 | N sessions | N sessions | ✅/❌ | NN |
| XX-02 | ... | ... | ... | ... |
---
## Known-Answer Test Results
| Test | Expected | Actual | Pass? |
|------|----------|--------|-------|
| [Description] | [value] | [value] | ✅/❌ |
---
## Process Quality Dimensions
### Task Sizing
- Estimate accuracy: [XX% of packets finished within their estimate]
- Packets that overran: [list or "none"]
- Root cause of overruns: [...]
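
> One possible way to fill in the Task Sizing numbers from the Packet Summary table.
> This is a sketch, not a prescribed method: it assumes "estimate accuracy" means the share of packets whose actual effort did not exceed the estimate.
> The `Packet` class, its field names, and the sample rows are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """One row of the Packet Summary table (field names are hypothetical)."""
    packet_id: str
    estimated_sessions: float
    actual_sessions: float


def estimate_accuracy(packets):
    """Return (percent of packets at or under estimate, ids of packets that overran)."""
    if not packets:
        return 0.0, []
    overruns = [p.packet_id for p in packets if p.actual_sessions > p.estimated_sessions]
    on_time = len(packets) - len(overruns)
    return 100.0 * on_time / len(packets), overruns


# Placeholder rows for illustration only; replace with the real table values.
packets = [
    Packet("XX-01", 2, 2),
    Packet("XX-02", 1, 3),
    Packet("XX-03", 2, 1),
    Packet("XX-04", 1, 1),
]
accuracy, overran = estimate_accuracy(packets)
print(f"Estimate accuracy: {accuracy:.0f}%")
print(f"Packets that overran: {overran or 'none'}")
```

> If the stream tracks effort in hours rather than sessions, the same comparison applies; only the field names change.
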
### Test-First Discipline
- Tests committed in the same commit as implementation: [XX/NN packets]
- Patches needed after initial commit: [list or "none"]
### Acceptance Criteria Quality
- Programmatically verifiable criteria: [XX/NN]
- Criteria that required human judgment: [list or "none"]
### Known-Answer Coverage
- New calculation modules: N
- Modules with ≥1 known-answer test: N/N
- Any gaps: [list or "none"]
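
> For reference, a known-answer test pins a calculation to a value worked out by hand, independently of the code.
> A minimal sketch follows; `discounted_return` is a hypothetical stand-in for a calculation module, not a function from this repository.

```python
import math


def discounted_return(rewards, gamma):
    """Hypothetical calculation under test: sum of gamma**t * reward_t."""
    total = 0.0
    # Accumulate from the last reward backwards so each step applies one discount.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def test_discounted_return_known_answer():
    # Hand-computed expectation: 1 + 0.5 * 1 + 0.25 * 1 = 1.75
    assert math.isclose(discounted_return([1.0, 1.0, 1.0], 0.5), 1.75)


def test_discounted_return_empty():
    # An empty episode has no reward to accumulate.
    assert discounted_return([], 0.9) == 0.0


# Run directly here; a test runner such as pytest would collect these by name.
test_discounted_return_known_answer()
test_discounted_return_empty()
```

> A module counts toward "Modules with ≥1 known-answer test" once it has at least one test of this shape.
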
### Architecture Integrity
- Cross-module import violations: [N]
- New shared utilities created: [list]
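
> The violation count can be gathered by hand or with a small script.
> The sketch below uses the standard-library `ast` module to list absolute imports that match a forbidden prefix.
> The module names (`training`, `simulator`, `evaluation`) and the rule itself are assumptions for illustration; substitute the boundaries defined for the stream.

```python
import ast


def find_import_violations(source, forbidden_prefixes):
    """Return sorted absolute imports in `source` that match a forbidden prefix."""
    violations = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in forbidden_prefixes):
                violations.add(name)
    return sorted(violations)


# Hypothetical source of a module that should not reach into training or simulator code.
sample = """
import os
import training.replay_buffer
from simulator.physics import step
from evaluation import metrics
"""
found = find_import_violations(sample, ["training", "simulator"])
print(f"Cross-module import violations: {len(found)}")
for name in found:
    print(f"  {name}")
```

> In practice the function would be run over each file in a module's directory, with the forbidden list chosen per module.
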
### Regression Protection
- Regression baseline saved: [yes/no — path if yes]
---
## What Went Well
- [Honest list]
## What Was Hard
- [Honest list — useful for planning the next stream]
## What To Do Differently
- [Actionable changes for next time]
## Rejected Approaches Captured
- [Approach] — rejected because [...] — captured in [spec / ADR / validation / harness docs]
- [Approach] — rejected because [...] — captured in [...]
## Model Attribution
- Model: [model name]
- Strengths observed: [...]
- Weaknesses observed: [...]