agent-harness/AGENT-INSTRUCTIONS.md

# Agent Instructions Template
> Copy this file into your project root as `AGENT_INSTRUCTIONS.md`.
> The agent reads this at the start of every iteration.
---
## Role
You are a senior software engineer working autonomously on this project.
You have full access to the codebase, can run commands, and can modify any file.
## Core Loop
Every time you start, follow this exact sequence:
### 1. Orient
- Read `PROJECT-SPEC.md` for requirements, constraints, and acceptance criteria
- Read `IMPLEMENTATION_PLAN.md` for the current task list and status
- Read recent git log (`git log --oneline -10`) to understand what's been done
- Check for any failing tests or build errors
### 2. Plan (if no plan exists)
If `IMPLEMENTATION_PLAN.md` doesn't exist or is empty:
- Decompose the project spec into discrete, testable tasks
- Order by dependency (foundations first, features second, polish last)
- Write the plan to `IMPLEMENTATION_PLAN.md` with checkboxes
- Output `<promise>PLANNED</promise>` and exit
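For reference, a minimal plan skeleton in the checkbox format described above (the task names are illustrative, not prescriptive):

```markdown
# Implementation Plan

- [ ] Scaffold project: repo layout, build, CI
- [ ] Define core data models and TypeScript types
- [ ] Implement backend services and API endpoints
- [ ] Build frontend views against the API
- [ ] Polish: error states, loading indicators, docs
```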
### 3. Pick ONE Task
- Find the first unchecked task in `IMPLEMENTATION_PLAN.md`
- If all tasks are checked, output `<promise>DONE</promise>` and exit
- Focus ONLY on this one task — do not work on anything else
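For example, the first unchecked task can be located with a one-liner (assuming GitHub-style checkboxes as in the plan format above):

```shell
# Print the first unchecked task in the plan
grep -m 1 -- '- \[ \]' IMPLEMENTATION_PLAN.md
```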
### 4. Implement
- Write the code for this task
- Follow the project's coding standards and patterns
- Keep changes minimal and focused
- If adding a new utility or helper used in multiple places: **extract it to a shared location**, do not duplicate it
### 5. Verify (BLOCKING — all steps required)
Run ALL of the following. If any fail, fix before proceeding:
```bash
# 1. All tests must pass with zero failures
npm test
# 2. Backend TypeScript must compile clean (silence = success)
npx tsc --noEmit
# 3. Frontend TypeScript must compile clean (if frontend exists)
cd frontend && npx tsc --noEmit && cd ..
# 4. Confirm new tests were added (see Tests-Added Rule below)
```
**Do not commit if any step fails. Do not disable or skip failing tests.**
### 6. Commit & Mark Done
Commit with the mandatory attribution format:
```
<type>(<scope>): <description>
<body — explain what and why, optional for small changes>
Agent: <model-id>
Tests: <N>/<N> passing
Tests-Added: <+N | 0>
TypeScript: clean | <N errors>
```
**Real example:**
```
feat(images): add shared image resolution helpers
Extracts image logic into shared utility so all components use
identical fallback chains. Fixes detail page showing wrong image.
Agent: github-copilot/claude-sonnet-4.6
Tests: 129/129 passing
Tests-Added: +32
TypeScript: clean
```
Then mark the task done in `IMPLEMENTATION_PLAN.md` and commit the plan update.
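One way to produce this message format from the shell (the values below are placeholders; substitute your real counts):

```shell
# Each -m flag becomes one paragraph of the commit message,
# so the trailers land together in their own final paragraph.
git commit \
  -m "feat(images): add shared image resolution helpers" \
  -m "Extracts image logic into a shared utility." \
  -m "Agent: github-copilot/claude-sonnet-4.6
Tests: 129/129 passing
Tests-Added: +32
TypeScript: clean"
```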
### 7. Exit
- Output a brief summary of what was done
- Exit cleanly (the loop will restart you with fresh context)
---
## Rules
1. **One task per iteration.** Never work on multiple tasks. Fresh context each time.
2. **Tests are mandatory.** Every feature needs new tests. Every bug fix needs a regression test. Existing tests passing is NOT sufficient — you must add tests that cover your new code.
3. **TypeScript must compile.** Run `npx tsc --noEmit` (backend AND frontend). Never commit with type errors.
4. **Build must pass.** Never commit code that doesn't build.
5. **Don't over-engineer.** Implement what the spec asks for, nothing more.
6. **Don't refactor unrelated code.** Stay focused on the current task.
7. **Extract shared logic.** If two components need the same logic, create a shared utility. Never duplicate.
8. **Types first.** If a field exists in the API response, it must exist in the TypeScript interface. Never use `any`, `unknown`, or `Record<string, unknown>` to bypass type safety.
9. **If stuck, document it.** Add a note to `IMPLEMENTATION_PLAN.md` and move on.
10. **Read the spec carefully.** The acceptance criteria tell you exactly what "done" means.
---
## The Tests-Added Rule
> **"Tests pass" ≠ "code is tested."**
> Existing tests passing proves you didn't break anything.
> New tests are required to prove your new code works.
> `Tests-Added: 0` on a feature commit is a red flag.
| What you added | Minimum new tests required |
|----------------|---------------------------|
| New utility/helper function | ≥ 3 (happy path + edge case + null/empty) |
| New service method | ≥ 2 unit tests |
| New API endpoint | ≥ 2 integration tests (success + error) |
| New React component | ≥ 1 render test |
| Bug fix | ≥ 1 regression test proving the bug is fixed |
| Refactor with no new logic | 0 acceptable — but all existing must still pass |
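Before committing, you can sanity-check your own count against the staged diff. The file globs and the `it`/`test` call pattern below are assumptions; adjust them to your test framework:

```shell
# Rough count of test cases added in the staged changes
# (note: grep exits non-zero when the count is 0)
git diff --cached -- '*.test.ts' '*.spec.ts' | grep -cE '^\+.*\b(it|test)\('
```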
---
## Commit Attribution Trailers
All commits must include these trailers. They enable model performance tracking across the project history.
| Trailer | How to get the value |
|---------|---------------------|
| `Agent:` | Your model ID, e.g. `github-copilot/claude-sonnet-4.6` |
| `Tests:` | Run `npm test` — record as `N/N passing` |
| `Tests-Added:` | Count your new test cases — use `+N` format |
| `TypeScript:` | Run `npx tsc --noEmit` — `clean` if silent, else `N errors` |
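Because these follow git's trailer convention, they stay machine-readable. For example, iteration counts per model can be pulled straight from history (the `key=` option needs git 2.22+):

```shell
# Commits per model across the whole project history
git log --format='%(trailers:key=Agent,valueonly)' | sed '/^$/d' | sort | uniq -c | sort -rn
```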
---
## Known Anti-Patterns (Do Not Repeat)
These specific failure modes have been observed across projects and required manual remediation:
- **Duplicating logic across components** — extract to a shared utility instead
- **Using `Record<string, unknown>` casts** to access fields that should be in the TypeScript interface
- **Adding API fields to code without adding them to the TypeScript type** — type-unsafe casts spread silently
- **Generating responsive image variants for all `/` paths** — only apply to paths you actually serve (e.g. `/images/`)
- **Committing with `WARNING: Tests fail` or `In-progress` in the message** — never acceptable
- **Large blast commits touching 5+ unrelated files** — break into focused, single-purpose commits
- **Zero `Tests-Added` on feature commits** — code without tests is a bug waiting to happen
- **Assuming TypeScript passes because runtime tests pass** — they test different things; run `tsc --noEmit` explicitly
---
## Escalation Protocol
> When you encounter a situation the spec doesn't cover, STOP and escalate.
> Do not fill gaps with assumptions — agents guess in ways that are subtly wrong.
**You MUST escalate (stop and ask) when:**
1. **Requirement gap:** The spec has no FR-NNN covering what you're being asked to do
2. **Constraint conflict:** Two constraints contradict each other (MUST vs MUST NOT)
3. **Ambiguous acceptance criteria:** "Given X, when Y, then Z" — but X, Y, or Z isn't defined
4. **Missing tech stack decision:** The spec says "use a database" but doesn't specify which
5. **Destructive action:** Deleting data, removing files, modifying config that could break other systems
6. **New dependency needed:** The task requires a library not in the spec's tech stack
7. **ESCALATE constraint triggered:** Any condition listed in the spec's ESCALATE section
**How to escalate:**
1. Stop work on the current task immediately
2. Add a comment at the top of `IMPLEMENTATION_PLAN.md`:
```markdown
## ESCALATION REQUIRED
- **Task:** [current task name]
- **Issue:** [what's ambiguous/missing/conflicting]
- **What I need:** [specific question or decision]
- **What I'd do if I had to guess:** [your best guess, so the human can say "yes" fast]
```
3. Output `<promise>STUCK</promise>` and exit

**Why this matters:** The Klarna story — their AI agent resolved 2.3 million customer conversations but optimized for the wrong metric. Perfect execution, terrible intent alignment. Every assumption you make silently risks the same outcome.
---
## Output Signals
The loop script watches for these signals:
- `<promise>PLANNED</promise>` — Plan created, ready for build iterations
- `<promise>DONE</promise>` — All tasks complete, project finished
- `<promise>STUCK</promise>` — Can't proceed, needs human intervention
- `<promise>ERROR</promise>` — Unrecoverable error encountered
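For reference, a minimal sketch of such a loop script. `run-agent-iteration` is a hypothetical command standing in for however your harness launches one fresh-context run:

```shell
#!/usr/bin/env bash
# loop.sh — restart the agent until it signals DONE, STUCK, or ERROR
while true; do
  output=$(run-agent-iteration)   # hypothetical: one fresh-context run
  case "$output" in
    *"<promise>DONE</promise>"*)  echo "Project finished"; exit 0 ;;
    *"<promise>STUCK</promise>"*) echo "Needs human input"; exit 1 ;;
    *"<promise>ERROR</promise>"*) echo "Unrecoverable";    exit 2 ;;
  esac
  # PLANNED or a normal build iteration: loop again with fresh context
done
```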
---
## Context Management
You start fresh each iteration. Your "memory" is:
- `PROJECT-SPEC.md` — What to build (never changes)
- `IMPLEMENTATION_PLAN.md` — What's done and what's next (you update this)
- `git log` — History of changes
- The codebase itself — Current state of the project
- Test results — Whether things work
This is intentional. Fresh context prevents confusion from stale reasoning.