adobe-to-docusign-migrator/docs/agent-harness
Paul Huliganga 51f532f452 feat: idempotent upload + FastAPI web UI with full test coverage
Phase 1 — Idempotent upload:
- upload_docusign_template.py now upserts: PUT if template with same name
  exists (most recently modified), POST otherwise
- --force-create flag to bypass upsert

Phase 2-6 — FastAPI web UI:
- web/app.py: FastAPI app with /health, static file serving
- web/routers/auth.py: Adobe Sign + DocuSign OAuth start/callback/disconnect
- web/routers/templates.py: template listing + migration status badges
  (not_migrated / migrated / needs_update)
- web/routers/migrate.py: POST /api/migrate pipeline + GET /api/migrate/history
- web/static/: vanilla HTML/CSS/JS side-by-side template browser UI

Phase 7 — Tests (29/29 passing):
- test_upload_upsert.py: 4 upsert unit tests
- test_api_health/auth/templates/migrate.py: full API coverage
- test_e2e.py: 7-step full pipeline end-to-end test
- test_regression.py: compose output vs snapshots for 3 real templates
- conftest.py: --update-snapshots CLI option

Docs: IMPLEMENTATION-PLAN.md, updated EXECUTION-BOARD.md + architecture.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:47:27 -04:00
AGENT-INSTRUCTIONS.md Initial project scaffold (Cleo) 2026-04-14 19:21:17 -04:00
EXECUTION-BOARD-TEMPLATE.md Initial project scaffold (Cleo) 2026-04-14 19:21:17 -04:00
EXECUTION-BOARD.md feat: idempotent upload + FastAPI web UI with full test coverage 2026-04-17 14:47:27 -04:00
README.md Initial project scaffold (Cleo) 2026-04-14 19:21:17 -04:00
SPEC-CREATION-GUIDE.md Initial project scaffold (Cleo) 2026-04-14 19:21:17 -04:00

README.md

Agent Harness Templates

A complete system for running autonomous AI coding agents on complex projects.

Files

Core Templates (copy into your project)

File Purpose
AGENT.md The agent's "system prompt", which the agent reads every iteration. Defines the core loop, the mandatory pre-commit checklist (tests + TypeScript), the commit attribution format, the Tests-Added rule, and known anti-patterns
PROJECT-SPEC.md Template for defining your problem. Sections for: overview, tech stack, requirements with acceptance criteria, data model, API design, constraints, phasing, anti-patterns
DECISIONS.md Architecture Decision Record (ADR) template for documenting non-obvious technical choices. Prevents agent drift by creating continuity across fresh contexts
EXECUTION-BOARD-TEMPLATE.md New. Pre-implementation planning artifact for a stream. Defines ALL packets, known-answer tests, and acceptance criteria BEFORE any code is written. The core of the plan-then-implement discipline.
VALIDATION-TEMPLATE.md New. Per-packet evidence file written after each packet completes. Records test counts, known-answer results, and acceptance criteria tick-off.
PROCESS-EVAL-TEMPLATE.md New. Stream retrospective written after merge. Honest assessment of task sizing, test-first compliance, and model quality.
TASK-SPEC-TEMPLATE.md Reusable pre-delegation contract for non-trivial tasks. Defines objective, acceptance criteria, constraints, boundaries, verification, and proof artifact before work starts.
ralph-loop.sh The Ralph Wiggum bash loop — spawns fresh agent instances, checks for completion signals, restarts until done. Supports Claude, Codex, Aider, Gemini, and custom agents
model-report.ts Parses the Agent: trailers in git log output to generate a per-model quality table (commits, tests added, TypeScript errors). To install, copy it to scripts/model-report.ts and add "model-report": "ts-node scripts/model-report.ts" to the scripts section of package.json
scaffold-project.sh Helper script to scaffold a new simple or large project with core harness files, starter docs, and optional .harness/ structure.
PROJECT-KICKOFF.md Project-local kickoff checklist template to confirm spec, tooling, evals, and runtime choices are ready before implementation begins.
GAP-AUDIT-2026-04-04.md Point-in-time audit of the harness. Documents current strengths, gaps, priorities, and the consolidation work package.
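
To make the model-report.ts row more concrete, here is a minimal Python sketch of the same idea: tallying commits per model from commit-message trailers. Only the Agent: trailer is named in the table above. The Tests-Added: trailer name, its integer value, the function name, and the sample commit messages are assumptions for illustration, not the format model-report.ts actually parses.

```python
from collections import defaultdict


def summarize_by_agent(commit_messages):
    """Tally commits and tests added per agent from commit trailers.

    Assumes trailers of the form 'Agent: <name>' and 'Tests-Added: <int>';
    the real harness format may differ.
    """
    totals = defaultdict(lambda: {"commits": 0, "tests_added": 0})
    for message in commit_messages:
        agent = None
        tests_added = 0
        for line in message.splitlines():
            key, sep, value = line.partition(":")
            if not sep:
                continue  # not a 'Key: value' line
            key, value = key.strip().lower(), value.strip()
            if key == "agent":
                agent = value
            elif key == "tests-added" and value.isdigit():
                tests_added = int(value)
        if agent is None:
            continue  # commits without attribution are left out of the report
        totals[agent]["commits"] += 1
        totals[agent]["tests_added"] += tests_added
    return dict(totals)


# Hypothetical commit messages, one string per commit.
SAMPLE = [
    "feat: add parser\n\nAgent: model-a\nTests-Added: 3",
    "fix: handle empty input\n\nAgent: model-a\nTests-Added: 1",
    "docs: update readme\n\nAgent: model-b\nTests-Added: 0",
    "chore: manual tweak",
]
report = summarize_by_agent(SAMPLE)
```

Commits without an Agent: trailer are skipped here rather than grouped under an "unknown" bucket; either choice is defensible, and the real script may behave differently.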

Process Guides (read before you start)

File Purpose
SPEC-CREATION-GUIDE.md Start here. How to create a great spec through a structured interview. Covers the interview protocol, domain knowledge extraction, and the spec quality checklist
TUTORIAL.md Best way to learn. Complete 30-minute walkthrough building a markdown link checker CLI tool from zero. Concrete, copy-pasteable example of the entire workflow
GETTING-STARTED.md Practical startup/scaffolding guide for real projects: create project root, copy templates, choose harness mode, scaffold .harness/, and start the first loop cleanly.
CURRENT-STATE.md One-page executive summary of the harness: what is mature, what improved recently, and what should be improved next.
WAVE-BASED-MANAGEMENT.md New. How to structure larger projects into waves, streams, and packets. The plan-then-implement discipline, execution boards, known-answer tests, and wave gates. Essential for projects with 10+ tasks.
PLAN-MANAGEMENT.md How IMPLEMENTATION_PLAN.md, the living document that agents update, works. Covers task decomposition patterns, intervention strategies, and progress tracking
REVIEW-AND-QA.md How to evaluate agent output. When to review, what to look for, how to course-correct. Review checklist template including model attribution and TypeScript hygiene checks
EVAL-INFRASTRUCTURE.md Consolidated guide to the harness eval stack: implementation correctness, domain correctness, regression protection, and process quality.
POST-RUN-VALIDATION.md How the harness decides work is really done after execution. Especially important for script-orchestrated runtimes that must not trust agent self-reporting blindly.
SUPERVISION.md Optional operations layer for unattended Ralph runs. Covers supervisor/watchdog patterns, state files, and audit trails for long-running script-orchestrated sessions.
WORKFLOW-SEAMS.md Map of the handoffs between spec, plan, execution boards, validation evidence, review, process evals, and runtime orchestration.
WORKFLOW-DIAGRAM.md Visual map of the harness showing project phases, where each document matters most, and how the artifacts connect across the lifecycle.
COST-OPTIMIZATION.md Getting more work per dollar. Request-based vs token-based billing, optimal strategies per provider, model selection guide, the hybrid strategy, anti-patterns
OPENCLAW-INTEGRATION.md Running the harness in OpenClaw with sessions_spawn, cron jobs, and shell scripts. Model selection, monitoring, cost optimization
TROUBLESHOOTING.md When things go wrong. The five failure modes (stuck loop, drift, overengineering, test theater, context overflow) and how to fix each
PARALLEL-AGENTS.md Running multiple agents simultaneously on independent tasks. When to parallelize, how to split work, how to merge results, conflict resolution, OpenClaw patterns

Examples & Reference

File Purpose
EXAMPLES.md Worked example: Fintrove-style finance app spec + comparison of three approaches (Ezward, Ralph Wiggum, Nate Jones)
CHANGELOG.md Version history and evolution of the agent harness project itself

Quick Start

Runtime Models

The harness supports two runtime models, and it helps to keep them separate:

1. Agent-Orchestrated Runtime

This is the OpenClaw/manual-orchestration model.

  • a supervising agent decides what to run next
  • that agent can inspect execution boards, validation evidence, git history, and prior results
  • that agent can spawn sub-agents, review outcomes, and adapt the workflow dynamically

Use this when:

  • you want a smart orchestrator in the loop
  • you want sub-agent fan-out
  • you want richer judgment between iterations

Primary guide:

  • OPENCLAW-INTEGRATION.md

2. Script-Orchestrated Runtime

This is the ralph-loop.sh model.

  • the shell script is the orchestrator
  • the script must interpret completion/stuck/error signals itself
  • any judgment the supervising agent would normally provide must be encoded into runtime checks

Use this when:

  • you want a portable terminal-native loop
  • you want tmux/background shell operation
  • you want minimal dependencies beyond the CLI agent itself

Important implication:

  • if the script is the orchestrator, reliability has to come from explicit checks, not from assuming the agent will always judge correctly
  • for long unattended runs, add a separate supervisor/watchdog layer rather than assuming tmux alone is sufficient
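
As an illustration of what "explicit checks" can mean in a script-orchestrated loop, the sketch below classifies each agent run from observable facts only: the exit code, the captured output, and whether the git HEAD moved. It is written in Python for readability and is not a port of ralph-loop.sh. The completion marker string, the idle threshold, and all function names are assumptions.

```python
DONE_MARKER = "ALL_TASKS_COMPLETE"  # hypothetical completion signal
MAX_IDLE_ITERATIONS = 3             # hypothetical "stuck" threshold


def classify_iteration(exit_code, output, head_before, head_after):
    """Classify one agent run without relying on the agent's own judgment."""
    if exit_code != 0:
        return "error"
    if DONE_MARKER in output:
        return "done"
    if head_before == head_after:
        return "no_progress"  # clean exit, but nothing was committed
    return "progress"


def run_loop(run_agent, get_head, max_iterations=20):
    """Restart the agent until it is done, fails, or stops making progress.

    run_agent() returns (exit_code, output); get_head() returns the current
    commit id. Both are injected so the loop logic can be tested in isolation.
    """
    idle = 0
    for iteration in range(1, max_iterations + 1):
        before = get_head()
        exit_code, output = run_agent()
        status = classify_iteration(exit_code, output, before, get_head())
        if status in ("done", "error"):
            return (status, iteration)
        idle = idle + 1 if status == "no_progress" else 0
        if idle >= MAX_IDLE_ITERATIONS:
            return ("stuck", iteration)
    return ("iteration_limit", max_iterations)
```

In a real run, get_head could wrap `git rev-parse HEAD` and run_agent could wrap the CLI agent invocation. A "done" result is still only a claim by the agent; POST-RUN-VALIDATION.md covers how to verify it.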

New to the Harness? (Start Here)

  1. Read CURRENT-STATE.md — understand what the harness is good at right now
  2. Read WORKFLOW-DIAGRAM.md — get the phase map before diving into details
  3. Read GETTING-STARTED.md — scaffold a real project cleanly
  4. Use scaffold-project.sh or new-harness-project if you want the fastest reliable setup
  5. Read TUTORIAL.md — 30-minute hands-on walkthrough building a real CLI tool
  6. Read SPEC-CREATION-GUIDE.md — learn the interview protocol
  7. Read TASK-SPEC-TEMPLATE.md — learn the packet-sized contract for non-trivial delegation
  8. Use PROJECT-KICKOFF.md in your new project as the readiness checklist
  9. Try it — build your own project using the workflow

Ready to Build? (Simple project, <10 tasks)

  1. Read COST-OPTIMIZATION.md — understand your billing model before you start burning budget
  2. Interview — work with your agent to create the spec (or do it solo)
  3. Fill out PROJECT-SPEC.md with your problem definition
  4. Read EVAL-INFRASTRUCTURE.md if the project has calculations, regulated logic, or other high-cost-to-be-wrong behavior
  5. Read POST-RUN-VALIDATION.md if the runtime will need to validate task completion mechanically
  6. Read SUPERVISION.md if the script-orchestrated runtime will run unattended for hours
  7. Copy PROJECT-SPEC.md, AGENT.md, and DECISIONS.md into your project root
  8. Choose a runtime:
    • ./ralph-loop.sh for the script-orchestrated model
    • OpenClaw sessions/sub-agents for the agent-orchestrated model
  9. Review at phase boundaries using REVIEW-AND-QA.md checklist
  10. Troubleshoot failures using TROUBLESHOOTING.md

For unattended script-orchestrated runs, consider adding an optional supervisor/watchdog wrapper around ralph-loop.sh so process death, stale waits, and silent stalls can be detected independently of the tmux pane. The optional guide and starter templates live in SUPERVISION.md, supervise-ralph-loop.template.sh, and audit-ralph-loop.template.sh.
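
A watchdog of this kind can be quite small. The following sketch, assuming a POSIX system, checks two things independently of the tmux pane: whether the loop process still exists, and whether its log file has been written to recently. The log path, the 15-minute stall threshold, and the function names are hypothetical; the actual templates named above may work differently.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check_loop_health(pid, log_path, stall_seconds=900, now=None):
    """Report 'dead', 'no_log', 'stalled', or 'healthy' for a running loop."""
    now = time.time() if now is None else now
    if not pid_alive(pid):
        return "dead"  # process death
    try:
        age = now - os.path.getmtime(log_path)
    except FileNotFoundError:
        return "no_log"
    # A live process whose log has gone quiet is a silent stall.
    return "stalled" if age > stall_seconds else "healthy"
```

A supervisor could call check_loop_health on a timer (for example from cron) and append each result to an audit file, so that a stall is visible even when nobody is watching the terminal.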

Building Something Larger? (10+ tasks, multiple features)

  1. Read WAVE-BASED-MANAGEMENT.md — the plan-then-implement discipline
  2. Read WORKFLOW-SEAMS.md — understand how the harness artifacts hand off to each other
  3. Read EVAL-INFRASTRUCTURE.md — define the eval stack before implementation starts
  4. Read POST-RUN-VALIDATION.md — define how the runtime will decide packet completion is real
  5. Create your IMPLEMENTATION_PLAN.md with all tasks grouped into waves
  6. Create .harness/EXECUTION_MASTER.md — your wave/stream dashboard
  7. For each stream: copy EXECUTION-BOARD-TEMPLATE.md, fill ALL packets before coding any
  8. For non-trivial delegated packets: create a task spec from TASK-SPEC-TEMPLATE.md
  9. After each packet: copy VALIDATION-TEMPLATE.md and fill it in
  10. After each stream: copy PROCESS-EVAL-TEMPLATE.md and write the retrospective
  11. At each wave boundary: run the wave gate checklist before starting the next wave
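
The relationship between waves, streams, and packets in the steps above can be sketched as a small data model with a gate check. This is an assumption-heavy illustration: the real wave gate checklist lives in WAVE-BASED-MANAGEMENT.md and is not reproduced here, and the class names, fields, and blocker messages are hypothetical. The sketch models only steps 7, 9, and 10.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    name: str
    validated: bool = False  # True once its validation file is filled in


@dataclass
class Stream:
    name: str
    packets: list = field(default_factory=list)
    retrospective_written: bool = False  # process eval done after merge


@dataclass
class Wave:
    name: str
    streams: list = field(default_factory=list)


def wave_gate_blockers(wave):
    """Return the reasons the next wave should not start yet (empty = pass)."""
    blockers = []
    for stream in wave.streams:
        if not stream.packets:
            blockers.append(f"{stream.name}: execution board has no packets")
        for packet in stream.packets:
            if not packet.validated:
                blockers.append(
                    f"{stream.name}/{packet.name}: no validation evidence"
                )
        if not stream.retrospective_written:
            blockers.append(f"{stream.name}: process eval missing")
    return blockers


# Hypothetical wave with one stream and two packets.
wave = Wave("wave-1", [Stream("auth", [Packet("P1", True), Packet("P2")])])
blockers = wave_gate_blockers(wave)
```

Returning a list of reasons instead of a bare boolean keeps the gate reviewable: a human or supervising agent can see exactly which artifact is missing.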

If you use ralph-loop.sh for a larger project, pass the active board explicitly:

./ralph-loop.sh --board .harness/<stream>/execution-board.md

The Core Insight

All successful agent approaches share the same loop:

Orient (read spec + plan) → Pick ONE task → Build → Test → Commit → Exit → Restart fresh

The spec defines WHAT. The plan tracks WHERE we are. Fresh context each iteration prevents drift. The human reviews and course-corrects.
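
The "Pick ONE task" step depends on the plan being machine-readable enough for a fresh agent (or a script) to find where work left off. A minimal sketch, assuming the plan tracks tasks as markdown checkboxes, which this README does not confirm; the function name and the sample plan are hypothetical:

```python
import re

# Matches lines such as "- [ ] Do something" or "* [x] Done thing".
TASK_LINE = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def pick_next_task(plan_text):
    """Return the first unchecked task in the plan, or None if all are done."""
    for line in plan_text.splitlines():
        match = TASK_LINE.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None


# Hypothetical plan contents.
PLAN = """\
# Implementation Plan
- [x] Set up project skeleton
- [ ] Parse links from markdown files
- [ ] Report broken links
"""
next_task = pick_next_task(PLAN)
```

Because the function always returns the first open item, two fresh contexts reading the same plan pick the same task, which is what lets git history plus the plan file stand in for memory.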

In the script-orchestrated runtime, some of that review must be encoded into the loop itself. In the agent-orchestrated runtime, a supervising agent can supply more of that judgment dynamically. Task specs improve the preconditions for delegation. Post-run validation improves the postconditions.

See each file for detailed instructions.

When to Use Which Guide

┌─────────────────────────────────────────────────┐
│           "Which guide do I need?"               │
├─────────────────────────────────────────────────┤
│                                                  │
│  Just starting a real project?                   │
│    → GETTING-STARTED.md                         │
│                                                  │
│  Want it scaffolded for you?                     │
│    → scaffold-project.sh / new-harness-project │
│                                                  │
│  Need a kickoff checklist inside the project?    │
│    → PROJECT-KICKOFF.md                         │
│                                                  │
│  Want the one-page status view?                  │
│    → CURRENT-STATE.md                           │
│                                                  │
│  Want the visual phase map?                      │
│    → WORKFLOW-DIAGRAM.md                        │
│                                                  │
│  Want hands-on learning?                         │
│    → TUTORIAL.md (30-min walkthrough)           │
│                                                  │
│  Creating a spec?                                │
│    → SPEC-CREATION-GUIDE.md (interview)         │
│                                                  │
│  Delegating a non-trivial task?                  │
│    → TASK-SPEC-TEMPLATE.md                      │
│                                                  │
│  Agent is stuck?                                 │
│    → TROUBLESHOOTING.md (failure modes)         │
│                                                  │
│  Reviewing agent output?                         │
│    → REVIEW-AND-QA.md (what to check)           │
│                                                  │
│  Need a full eval strategy?                      │
│    → EVAL-INFRASTRUCTURE.md                     │
│                                                  │
│  Need runtime completion checks?                 │
│    → POST-RUN-VALIDATION.md                     │
│                                                  │
│  Need unattended-run supervision?                │
│    → SUPERVISION.md                             │
│                                                  │
│  Confused about how docs hand off?               │
│    → WORKFLOW-SEAMS.md                          │
│                                                  │
│  Worried about cost?                             │
│    → COST-OPTIMIZATION.md (billing models)      │
│                                                  │
│  Multiple independent features?                  │
│    → PARALLEL-AGENTS.md (coordination)          │
│                                                  │
│  Using OpenClaw?                                 │
│    → OPENCLAW-INTEGRATION.md (sessions_spawn)   │
│                                                  │
│  Agent keeps changing past decisions?            │
│    → DECISIONS.md (ADR template)                │
│                                                  │
│  Want to see it in action?                       │
│    → EXAMPLES.md (real project example)         │
│                                                  │
└─────────────────────────────────────────────────┘

Philosophy

Fresh Context > Long Context

Each iteration starts with a fresh agent. No accumulated confusion, no stale reasoning. The git history and plan file provide continuity.

One Task > Many Tasks

Agents that try to do everything in one session produce spaghetti. Agents that focus on ONE task produce clean commits.

Spec Quality > Agent Quality

A great spec with a mediocre agent beats a vague spec with a great agent. The spec is your leverage point.

Review > Repair

It's easier to review and guide than to debug and fix. Catch drift early through periodic reviews.

Explicit > Implicit

Agents can't read your mind. Write down constraints, anti-patterns, and decisions. What's obvious to you is invisible to the agent.

Contributing

This harness is a living system. If you:

  • Discover new failure modes
  • Develop better patterns
  • Find gaps in the guides
  • Create examples for other project types

Document them and contribute back. The harness improves as we learn what works.

Version

Current version: 2.0.0 (see CHANGELOG.md for history)

License

Public domain. Use it, modify it, share it. No attribution required.


The harness doesn't write code. It creates conditions where agents can write code reliably.