
# Process Eval Template

Write this file after the stream is fully merged. File location: `.harness/<stream>/process-eval.md`. Be honest: this is a retrospective, not a press release. Future agents and sessions will read it to understand what worked.


# Process Eval: [STREAM NAME]

- **Completed:** YYYY-MM-DD
- **Agent:** [model name]
- **Packets:** [XX-01, XX-02, ...]
- **Tests added:** NN total
- **Final test count:** NNNN
- **Wall-clock duration:** [estimated]


## Packet Summary

| Packet | Est. Effort | Actual     | On Time? | Tests Added |
|--------|-------------|------------|----------|-------------|
| XX-01  | N sessions  | N sessions | [yes/no] | NN          |
| XX-02  | ...         | ...        | ...      | ...         |

## Known-Answer Test Results

| Test          | Expected | Actual  | Pass?    |
|---------------|----------|---------|----------|
| [Description] | [value]  | [value] | [yes/no] |
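For readers new to the convention: a known-answer test checks a module's output against a value computed by hand, independently of the implementation. A minimal sketch in pytest style, assuming a hypothetical PWM-to-steering helper (the function, names, and expected values are illustrative only, not from the codebase):

```python
import math


def pwm_to_steering(pwm: int, center: int = 1500, half_range: int = 500) -> float:
    """Map a servo PWM value to a steering angle in [-1.0, 1.0]."""
    return max(-1.0, min(1.0, (pwm - center) / half_range))


def test_pwm_to_steering_known_answers():
    # Expected values were computed by hand, independently of the code.
    assert pwm_to_steering(1500) == 0.0   # center pulse -> neutral steering
    assert pwm_to_steering(2000) == 1.0   # max pulse -> full right
    assert pwm_to_steering(1000) == -1.0  # min pulse -> full left
    assert math.isclose(pwm_to_steering(1750), 0.5)
```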

## Process Quality Dimensions

### Task Sizing

- Estimate accuracy: [XX%] (one way to compute this is sketched below)
- Packets that overran: [list or "none"]
- Root cause of overruns: [...]
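A minimal sketch of one way to compute the accuracy figure, assuming "on time" means actual effort did not exceed the estimate (an assumption; substitute whatever definition the stream actually used):

```python
# Illustrative packet data; pull the real numbers from the table above.
packets = [
    {"id": "XX-01", "estimated_sessions": 2, "actual_sessions": 2},
    {"id": "XX-02", "estimated_sessions": 1, "actual_sessions": 3},
]

on_time = sum(p["actual_sessions"] <= p["estimated_sessions"] for p in packets)
print(f"Estimate accuracy: {100 * on_time / len(packets):.0f}%")  # -> 50%
```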

### Test-First Discipline

- Tests committed in the same commit as the implementation: [XX/NN packets] (see the git audit sketch below)
- Patches needed after initial commit: [list or "none"]
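A rough sketch for auditing this from git history. `tests/`, `src/`, and the `main..HEAD` range are assumptions; adjust them to the real layout and branch names.

```python
# Count commits on the stream branch that touch tests and implementation together.
import subprocess


def changed_files(sha: str) -> list[str]:
    out = subprocess.run(
        ["git", "show", "--name-only", "--format=", sha],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()


shas = subprocess.run(
    ["git", "rev-list", "main..HEAD"],
    capture_output=True, text=True, check=True,
).stdout.split()

paired = 0
for sha in shas:
    files = changed_files(sha)
    if any(f.startswith("tests/") for f in files) and any(
        f.startswith("src/") for f in files
    ):
        paired += 1

print(f"{paired}/{len(shas)} commits include both tests and implementation")
```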

### Acceptance Criteria Quality

- Programmatically verifiable criteria: [XX/NN] (the example below illustrates the distinction)
- Criteria that required human judgment: [list or "none"]
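To illustrate the distinction (threshold and data are invented for the example): a criterion is programmatically verifiable when a script can assert it; "the car looks smooth on video" is not, and belongs in the human-judgment list.

```python
import statistics


def test_steering_output_is_smooth():
    # Stand-in for a recorded episode; a real test would load logged data.
    steering_log = [0.0, 0.1, 0.05, -0.02, 0.03]
    # Verifiable criterion: steering jitter stays under a numeric threshold.
    assert statistics.pstdev(steering_log) < 0.2
```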

### Known-Answer Coverage

- New calculation modules: N
- Modules with ≥1 known-answer test: N/N
- Any gaps: [list or "none"] (see the gap-scan sketch below)
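A sketch of a gap scan, assuming calculation modules live under `src/calc/` and known-answer tests are discoverable by a naming convention; both the paths and the `known_answer` marker are assumptions to adjust to the real suite.

```python
# Coarse file-level heuristic: a module counts as covered if some test file
# mentions it alongside "known_answer".
from pathlib import Path

calc_modules = {p.stem for p in Path("src/calc").glob("*.py") if p.stem != "__init__"}
covered = set()
for test_file in Path("tests").rglob("test_*.py"):
    text = test_file.read_text()
    if "known_answer" in text:
        covered |= {m for m in calc_modules if m in text}

print("Gaps:", sorted(calc_modules - covered) or "none")
```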

### Architecture Integrity

- Cross-module import violations: [N] (countable with a check like the one below)
- New shared utilities created: [list]
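One way to count violations mechanically, assuming a layering rule such as "modules in `src/parts` must not import from `training`". The package names are placeholders; encode the project's real rules.

```python
# Walk one package's sources and flag imports from a forbidden package.
import ast
from pathlib import Path

FORBIDDEN = {"training"}  # top-level packages src/parts may not import

violations = []
for path in Path("src/parts").rglob("*.py"):
    for node in ast.walk(ast.parse(path.read_text())):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        violations += [f"{path}: {n}" for n in names if n.split(".")[0] in FORBIDDEN]

print(f"{len(violations)} cross-module import violation(s)")
```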

### Regression Protection

- Regression baseline saved: [yes/no; path if yes] (see the comparison sketch below)
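A minimal sketch of how a saved baseline can be used, assuming metrics are serialized as a flat JSON dict; the path, metric names, and tolerance are placeholders.

```python
# Compare current metrics against the saved baseline.
import json
from pathlib import Path

BASELINE = Path(".harness/regression/baseline.json")


def drift_from_baseline(metrics: dict, tolerance: float = 1e-6) -> list[str]:
    baseline = json.loads(BASELINE.read_text())
    drifted = []
    for key, expected in baseline.items():
        actual = metrics.get(key)
        if actual is None or abs(actual - expected) > tolerance:
            drifted.append(f"{key}: expected {expected}, got {actual}")
    return drifted


# Saving the baseline in the first place is one line:
# BASELINE.write_text(json.dumps(metrics, indent=2))
```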

## What Went Well

- [Honest list]

## What Was Hard

- [Honest list; useful for planning the next stream]

## What To Do Differently

- [Actionable changes for next time]

## Rejected Approaches Captured

- [Approach]: rejected because [...]; captured in [spec / ADR / validation / harness docs]
- [Approach]: rejected because [...]; captured in [...]

## Model Attribution

- Model: [model name]
- Strengths observed: [...]
- Weaknesses observed: [...]