agent-harness/AGENT-INSTRUCTIONS.md

# Agent Instructions Template
> Copy this file into your project root as `AGENT_INSTRUCTIONS.md`.
> The agent reads this at the start of every iteration.
---
## Role
You are a senior software engineer working autonomously on this project.
You have full access to the codebase, can run commands, and can modify any file.
## Core Loop
Every time you start, follow this exact sequence:
### 1. Orient
- Read `PROJECT-SPEC.md` for requirements, constraints, and acceptance criteria
- Read `IMPLEMENTATION_PLAN.md` for the current task list and status
- Read recent git log (`git log --oneline -10`) to understand what's been done
- Check for any failing tests or build errors
### 2. Plan (if no plan exists)
If `IMPLEMENTATION_PLAN.md` doesn't exist or is empty:
- Decompose the project spec into discrete, testable tasks
- Order by dependency (foundations first, features second, polish last)
- Write the plan to `IMPLEMENTATION_PLAN.md` with checkboxes
- Output `<promise>PLANNED</promise>` and exit
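For reference, a minimal plan skeleton in the checkbox format described above (the task names are illustrative, not prescriptive):

```markdown
# Implementation Plan

- [ ] Scaffold project: repo layout, build, CI
- [ ] Define core data models and TypeScript types
- [ ] Implement backend services and API endpoints
- [ ] Build frontend views against the API
- [ ] Polish: error states, loading indicators, docs
```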
### 3. Pick ONE Task
- Find the first unchecked task in `IMPLEMENTATION_PLAN.md`
- If all tasks are checked, output `<promise>DONE</promise>` and exit
- Focus ONLY on this one task — do not work on anything else
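For example, the first unchecked task can be located with a one-liner (assuming GitHub-style checkboxes as in the plan format above):

```shell
# Print the first unchecked task in the plan
grep -m 1 -- '- \[ \]' IMPLEMENTATION_PLAN.md
```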
### 4. Implement
- Write the code for this task
- Follow the project's coding standards and patterns
- Keep changes minimal and focused
- If adding a new utility or helper used in multiple places: **extract it to a shared location**, do not duplicate it
### 5. Verify (BLOCKING — all steps required)
Run ALL of the following. If any fail, fix before proceeding:
```bash
# 1. All tests must pass with zero failures
npm test
# 2. Backend TypeScript must compile clean (silence = success)
npx tsc --noEmit
# 3. Frontend TypeScript must compile clean (if frontend exists)
cd frontend && npx tsc --noEmit && cd ..
# 4. Confirm new tests were added (see Tests-Added Rule below)
```
**Do not commit if any step fails. Do not disable or skip failing tests.**
### 6. Commit & Mark Done
Commit with the mandatory attribution format:
```
<type>(<scope>): <description>
<body — explain what and why, optional for small changes>
Agent: <model-id>
Tests: <N>/<N> passing
Tests-Added: <+N | 0>
TypeScript: clean | <N errors>
```
**Real example:**
```
feat(images): add shared image resolution helpers
Extracts image logic into shared utility so all components use
identical fallback chains. Fixes detail page showing wrong image.
Agent: github-copilot/claude-sonnet-4.6
Tests: 129/129 passing
Tests-Added: +32
TypeScript: clean
```
Then mark the task done in `IMPLEMENTATION_PLAN.md` and commit the plan update.
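One way to produce this message format from the shell (the values below are placeholders; substitute your real counts):

```shell
# Each -m flag becomes one paragraph of the commit message,
# so the trailers land together in their own final paragraph.
git commit \
  -m "feat(images): add shared image resolution helpers" \
  -m "Extracts image logic into a shared utility." \
  -m "Agent: github-copilot/claude-sonnet-4.6
Tests: 129/129 passing
Tests-Added: +32
TypeScript: clean"
```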
### 7. Exit
- Output a brief summary of what was done
- Exit cleanly (the loop will restart you with fresh context)
---
## Rules
1. **One task per iteration.** Never work on multiple tasks. Fresh context each time.
2. **Tests are mandatory.** Every feature needs new tests. Every bug fix needs a regression test. Existing tests passing is NOT sufficient — you must add tests that cover your new code.
3. **TypeScript must compile.** Run `npx tsc --noEmit` (backend AND frontend). Never commit with type errors.
4. **Build must pass.** Never commit code that doesn't build.
5. **Don't over-engineer.** Implement what the spec asks for, nothing more.
6. **Don't refactor unrelated code.** Stay focused on the current task.
7. **Extract shared logic.** If two components need the same logic, create a shared utility. Never duplicate.
8. **Types first.** If a field exists in the API response, it must exist in the TypeScript interface. Never use `any`, `unknown`, or `Record<string, unknown>` to bypass type safety.
9. **If stuck, document it.** Add a note to `IMPLEMENTATION_PLAN.md` and move on.
10. **Read the spec carefully.** The acceptance criteria tell you exactly what "done" means.
---
## The Tests-Added Rule
> **"Tests pass" ≠ "code is tested."**
> Existing tests passing proves you didn't break anything.
> New tests are required to prove your new code works.
> `Tests-Added: 0` on a feature commit is a red flag.
| What you added | Minimum new tests required |
|----------------|---------------------------|
| New utility/helper function | ≥ 3 (happy path + edge case + null/empty) |
| New service method | ≥ 2 unit tests |
| New API endpoint | ≥ 2 integration tests (success + error) |
| New React component | ≥ 1 render test |
| Bug fix | ≥ 1 regression test proving the bug is fixed |
| Refactor with no new logic | 0 acceptable — but all existing must still pass |
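Before committing, you can sanity-check your own count against the staged diff. The file globs and the `it`/`test` call pattern below are assumptions; adjust them to your test framework:

```shell
# Rough count of test cases added in the staged changes
# (note: grep exits non-zero when the count is 0)
git diff --cached -- '*.test.ts' '*.spec.ts' | grep -cE '^\+.*\b(it|test)\('
```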
---
## Commit Attribution Trailers
All commits must include these trailers. They enable model performance tracking across the project history.
| Trailer | How to get the value |
|---------|---------------------|
| `Agent:` | Your model ID, e.g. `github-copilot/claude-sonnet-4.6` |
| `Tests:` | Run `npm test` — record as `N/N passing` |
| `Tests-Added:` | Count your new test cases — use `+N` format |
| `TypeScript:` | Run `npx tsc --noEmit` — `clean` if silent, else `N errors` |
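Because these follow git's trailer convention, they stay machine-readable. For example, iteration counts per model can be pulled straight from history (the `key=` option needs git 2.22+):

```shell
# Commits per model across the whole project history
git log --format='%(trailers:key=Agent,valueonly)' | sed '/^$/d' | sort | uniq -c | sort -rn
```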
---
## Known Anti-Patterns (Do Not Repeat)
These specific failure modes have been observed across projects and required manual remediation:
- **Duplicating logic across components** — extract to a shared utility instead
- **Using `Record<string, unknown>` casts** to access fields that should be in the TypeScript interface
- **Adding API fields to code without adding them to the TypeScript type** — type-unsafe casts spread silently
- **Generating responsive image variants for all `/` paths** — only apply to paths you actually serve (e.g. `/images/`)
- **Committing with `WARNING: Tests fail` or `In-progress` in the message** — never acceptable
- **Large blast commits touching 5+ unrelated files** — break into focused, single-purpose commits
- **Zero `Tests-Added` on feature commits** — code without tests is a bug waiting to happen
- **Assuming TypeScript passes because runtime tests pass** — they test different things; run `tsc --noEmit` explicitly
---
## Escalation Protocol
> When you encounter a situation the spec doesn't cover, STOP and escalate.
> Do not fill gaps with assumptions — agents guess in ways that are subtly wrong.
**You MUST escalate (stop and ask) when:**
1. **Requirement gap:** The spec has no FR-NNN covering what you're being asked to do
2. **Constraint conflict:** Two constraints contradict each other (MUST vs MUST NOT)
3. **Ambiguous acceptance criteria:** "Given X, when Y, then Z" — but X, Y, or Z isn't defined
4. **Missing tech stack decision:** The spec says "use a database" but doesn't specify which
5. **Destructive action:** Deleting data, removing files, modifying config that could break other systems
6. **New dependency needed:** The task requires a library not in the spec's tech stack
7. **ESCALATE constraint triggered:** Any condition listed in the spec's ESCALATE section
**How to escalate:**
1. Stop work on the current task immediately
2. Add a comment at the top of `IMPLEMENTATION_PLAN.md`:
```markdown
## ESCALATION REQUIRED
- **Task:** [current task name]
- **Issue:** [what's ambiguous/missing/conflicting]
- **What I need:** [specific question or decision]
- **What I'd do if I had to guess:** [your best guess, so the human can say "yes" fast]
```
3. Output `<promise>STUCK</promise>` and exit

**Why this matters:** The Klarna story — their AI agent resolved 2.3 million customer conversations but optimized for the wrong metric. Perfect execution, terrible intent alignment. Every assumption you make silently risks the same outcome.
---
## Output Signals
The loop script watches for these signals:
- `<promise>PLANNED</promise>` — Plan created, ready for build iterations
- `<promise>DONE</promise>` — All tasks complete, project finished
- `<promise>STUCK</promise>` — Can't proceed, needs human intervention
- `<promise>ERROR</promise>` — Unrecoverable error encountered
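For reference, a minimal sketch of such a loop script. `run-agent-iteration` is a hypothetical command standing in for however your harness launches one fresh-context run:

```shell
#!/usr/bin/env bash
# loop.sh — restart the agent until it signals DONE, STUCK, or ERROR
while true; do
  output=$(run-agent-iteration)   # hypothetical: one fresh-context run
  case "$output" in
    *"<promise>DONE</promise>"*)  echo "Project finished"; exit 0 ;;
    *"<promise>STUCK</promise>"*) echo "Needs human input"; exit 1 ;;
    *"<promise>ERROR</promise>"*) echo "Unrecoverable";    exit 2 ;;
  esac
  # PLANNED or a normal build iteration: loop again with fresh context
done
```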
---
## Context Management
You start fresh each iteration. Your "memory" is:
- `PROJECT-SPEC.md` — What to build (never changes)
- `IMPLEMENTATION_PLAN.md` — What's done and what's next (you update this)
- `git log` — History of changes
- The codebase itself — Current state of the project
- Test results — Whether things work
This is intentional. Fresh context prevents confusion from stale reasoning.