Tutorial: Build a CLI Tool in 30 Minutes with the Harness

This is a complete walkthrough from zero to working software. Example project: A markdown link checker that finds broken links in .md files. Follow along exactly — copy-paste commands, see what works.


What You'll Build

mdlinkcheck — A CLI tool that:

  • Scans markdown files for links
  • Tests each HTTP/HTTPS link (returns 200?)
  • Identifies broken links (404, timeout, etc.)
  • Outputs a report with line numbers

Tech stack: TypeScript, Node.js, Commander.js for CLI parsing

Time investment:

  • Interview & spec: 10 minutes
  • Agent planning: 2 minutes
  • Agent building: 15-20 minutes (7 iterations, plus 1 optional polish iteration)
  • Your review: 5 minutes

Prerequisites

  • Node.js 18+ installed
  • Claude Code CLI (npm install -g @anthropic-ai/claude-code) or OpenClaw access
  • Agent harness templates (this repo)
  • 30 minutes of your time

Phase 0: Setup (2 minutes)

1. Create the project directory

mkdir -p ~/projects/mdlinkcheck
cd ~/projects/mdlinkcheck
git init

2. Copy templates

cp /path/to/agent-harness/AGENT-INSTRUCTIONS.md ./AGENT.md
cp /path/to/agent-harness/PROJECT-SPEC.md ./PROJECT-SPEC.md
cp /path/to/agent-harness/DECISIONS.md ./DECISIONS.md
cp /path/to/agent-harness/ralph-loop.sh ./ralph-loop.sh
chmod +x ralph-loop.sh

3. Initial commit

git add .
git commit -m "Initial setup: agent harness templates"

You now have an empty project with the harness scaffolding.


Phase 1: The Interview (10 minutes)

Open a chat with your agent (Claude, Cleo, etc.) and run through the interview protocol from SPEC-CREATION-GUIDE.md.

Your opening message:

I want to build a CLI tool using the agent harness. Let's create 
the spec together using the interview protocol.

Project: A markdown link checker that finds broken links.

Expected Interview Flow

Agent: "What are we building, in one sentence?"

You:

A CLI tool that scans markdown files for HTTP/HTTPS links and 
reports which ones are broken (404, timeout, etc.).

Agent: "Who is this for?"

You:

Developers who maintain documentation. They run it before 
committing to catch broken links.

Agent: "Why now? What's the trigger?"

You:

I maintain several docs repos and keep finding broken links after 
they're published. I want to catch them earlier.

Agent: "What does 'done' look like?"

You:

I can run `mdlinkcheck docs/**/*.md` and get a report like:
docs/api.md:15 - BROKEN - https://example.com/missing (404)
docs/guide.md:42 - BROKEN - https://oldsite.com (timeout)

Found 2 broken links out of 47 total.

Exit code 0 if all links work, 1 if any are broken (for CI).

Agent: "What have you tried before? What didn't work?"

You:

Tried markdown-link-check (npm package) but it's slow and has 
too many dependencies. I want something simple and fast.

Agent: "Walk me through how you'd use this feature."

You:

1. Install: npm install -g mdlinkcheck
2. Run: mdlinkcheck README.md
3. See output: list of broken links or "All links OK!"
4. Check exit code: 0 = success, 1 = broken links found

Agent: "What could go wrong?"

You:

- No markdown files found (should show helpful message)
- Network is down (should timeout gracefully, not hang)
- Huge file with 1000+ links (should show progress)
- Link requires auth (should report as "unreachable" not "broken")

Agent: "What's the input? What's the output?"

You:

Input: File paths or glob patterns (e.g., docs/**/*.md)
Output: 
  - For each broken link: filename:lineNumber - STATUS - URL (error)
  - Summary: "Found X broken links out of Y total"
  - Exit code: 0 (all good) or 1 (broken links)

Agent: "How would you test this?"

You:

Create a test.md with known good and bad links:
- https://google.com (should work)
- https://thisurldoesnotexist123456.com (should fail)

Run the tool, verify it reports the broken one.

Agent: "Is this a must-have or nice-to-have?"

You:

Must-haves:
- Parse markdown, extract links
- Test HTTP/HTTPS links
- Report broken ones with filename and line number
- Proper exit codes for CI

Nice-to-haves (defer to v2):
- Check relative links (./other-doc.md)
- Parallel checking for speed
- Retry on timeout
- Config file for ignored URLs

Phase 2: Spec Assembly (Agent does this)

You:

Great! Now fill out PROJECT-SPEC.md based on our interview. 
Use the template in the repo. Be specific about:
- Tech stack: TypeScript, Node.js, Commander.js
- Build command: npm run build (compile TS to dist/)
- Test command: npm test (use Node.js test runner)
- Data model: not needed (stateless tool)
- One requirement per feature with acceptance criteria

Agent generates spec (2-3 minutes):

The agent reads PROJECT-SPEC.md template and fills it out:

# Project Specification: mdlinkcheck

## 1. Project Overview

### What are we building?
A command-line tool that scans markdown files for HTTP/HTTPS links 
and reports broken ones (404, timeout, connection errors).

### Why does it matter?
Developers maintaining documentation need to catch broken links 
before publishing. Existing tools are slow or over-complicated.

### Success criteria
- [ ] Scan .md files and extract all HTTP/HTTPS links
- [ ] Test each link (HEAD request, follow redirects)
- [ ] Report broken links with filename:line number
- [ ] Exit code 0 (all OK) or 1 (broken links found)
- [ ] Works on CLI: `mdlinkcheck docs/**/*.md`

## 2. Technical Foundation

### Tech stack
- Language: TypeScript
- Runtime: Node.js 18+
- CLI framework: Commander.js
- HTTP client: Native fetch (Node 18+)
- Build: tsc (TypeScript compiler)
- Test: Node.js test runner (node:test)

### Project structure
mdlinkcheck/
├── src/
│   ├── cli.ts          # Commander.js entry point
│   ├── parser.ts       # Extract links from markdown
│   ├── checker.ts      # Test links via HTTP
│   └── reporter.ts     # Format output
├── tests/
│   ├── parser.test.ts
│   ├── checker.test.ts
│   └── fixtures/       # Test markdown files
├── dist/               # Compiled JS (gitignored)
├── package.json
├── tsconfig.json
└── PROJECT-SPEC.md

### Build & test commands
npm install
npm run build    # tsc
npm test         # tsc, then node --test on the compiled tests (Node 18 cannot run .ts files directly)
npm run lint     # tsc --noEmit (type check only)

## 3. Requirements

### FR-001: Extract Links from Markdown
**Description:** Parse markdown files and extract all HTTP/HTTPS URLs.
**Acceptance criteria:**
- [ ] Find links in standard format: [text](https://url)
- [ ] Find links in reference format: [text][ref] ... [ref]: https://url
- [ ] Find bare URLs: https://example.com
- [ ] Return array of { url, line } objects
- [ ] Skip relative links (./file.md, /path/file.md)

### FR-002: Test HTTP/HTTPS Links
**Description:** Send HTTP request to each URL and determine status.
**Acceptance criteria:**
- [ ] Use HEAD request (faster than GET)
- [ ] Follow redirects (up to 3)
- [ ] Timeout after 10 seconds
- [ ] Return status: OK | BROKEN | TIMEOUT | UNREACHABLE
- [ ] Handle network errors gracefully (don't crash)

### FR-003: CLI Interface
**Description:** Command-line tool that accepts file paths or globs.
**Acceptance criteria:**
- [ ] Command: `mdlinkcheck <files...>`
- [ ] Expand glob patterns (docs/**/*.md)
- [ ] Show usage help with --help
- [ ] Show version with --version
- [ ] Exit code 0 if all links OK, 1 if any broken

### FR-004: Reporting
**Description:** Output broken links in readable format.
**Acceptance criteria:**
- [ ] Format: `filename:line - STATUS - URL (error message)`
- [ ] Summary: "Found X broken links out of Y total"
- [ ] Show "All links OK!" if none broken
- [ ] Color output: red for broken, green for OK (optional)

## 4. Anti-patterns
- Don't use regex for markdown parsing — use a markdown parser
- Don't use GET requests — HEAD is faster and sufficient
- Don't hang on timeout — set explicit timeout
- Don't crash on network errors — catch and report

## 5. Phasing

### Phase 1: Core (must-have)
- [ ] Project setup (package.json, tsconfig, build)
- [ ] Link extraction from markdown
- [ ] HTTP link checking
- [ ] CLI interface
- [ ] Basic reporting

### Phase 2: Polish (nice-to-have, defer)
- [ ] Parallel checking
- [ ] Progress bar
- [ ] Config file for ignored URLs
- [ ] Check relative links

You review the spec:

Looks good! One clarification: use the `marked` library for parsing
markdown (don't write a regex-based parser). Update FR-001.

Agent updates spec, commits:

git add PROJECT-SPEC.md
git commit -m "Add project specification for mdlinkcheck"

Phase 3: Planning (Agent does this)

You:

Read PROJECT-SPEC.md. Create IMPLEMENTATION_PLAN.md by decomposing 
the project into discrete tasks ordered by dependency. Output 
<promise>PLANNED</promise> when done.

Agent thinks and writes plan:

# Implementation Plan: mdlinkcheck

## Phase 1: Foundation
- [ ] Project scaffolding (package.json, tsconfig, src/, tests/)
- [ ] CLI entry point with Commander.js (--help, --version)
- [ ] Markdown link extraction (using marked lexer)
- [ ] HTTP link checker (fetch with timeout)
- [ ] Reporter (format output)
- [ ] Wire together: CLI → parser → checker → reporter
- [ ] Integration test with fixtures

## Phase 2: Polish (Deferred)
- [ ] Parallel checking
- [ ] Progress indicator

Agent outputs:

<promise>PLANNED</promise>

Plan created with 7 tasks. Review IMPLEMENTATION_PLAN.md.

You review:

Good! Let's start building.

Phase 4: Build Iterations (Agent does this, you watch)

Method A: Using ralph-loop.sh

./ralph-loop.sh --agent claude --max 10

Method B: Using OpenClaw sessions_spawn

You: "Run the first agent harness iteration for mdlinkcheck"
Cleo: *spawns sub-agent* → reads AGENT.md → picks first task → builds

Method C: Manual (for learning)

You: "Read AGENT.md in /path/to/mdlinkcheck. Follow the core loop:
      orient, pick the first unchecked task in IMPLEMENTATION_PLAN.md,
      implement it, verify (npm run build && npm test), commit, exit."
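
The "pick the first unchecked task" step can be illustrated with a few lines of code. This is only a sketch of that rule applied to the checkbox format used in IMPLEMENTATION_PLAN.md; `nextUncheckedTask` is a hypothetical helper, not something the harness is known to ship.

```typescript
// Hypothetical helper: returns the text of the first "- [ ]" item,
// or null when every task is already checked off.
function nextUncheckedTask(plan: string): string | null {
  for (const line of plan.split('\n')) {
    const match = /^\s*- \[ \] (.+)$/.exec(line);
    if (match) return match[1].trim();
  }
  return null;
}

const plan = [
  '## Phase 1: Foundation',
  '- [x] Project scaffolding (package.json, tsconfig, src/, tests/)',
  '- [ ] CLI entry point with Commander.js (--help, --version)',
  '- [ ] Markdown link extraction (using marked lexer)',
].join('\n');

console.log(nextUncheckedTask(plan));
// prints "CLI entry point with Commander.js (--help, --version)"
```

Because the plan file is the only state carried between iterations, each fresh agent session can recover "where are we?" from the checkboxes alone.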

Iteration 1: Project scaffolding

Agent output:

Reading PROJECT-SPEC.md and IMPLEMENTATION_PLAN.md...
First task: Project scaffolding

Creating package.json...
Creating tsconfig.json...
Creating src/, tests/, tests/fixtures/...
Installing dependencies: commander, marked, @types/node...
Running build: ✅ compiles successfully
Running tests: ✅ (no tests yet, but runner works)

Committing...
[main a1b2c3d] chore: project scaffolding with TypeScript and Commander
Task complete. Exiting for fresh context.

Git log:

a1b2c3d chore: project scaffolding with TypeScript and Commander

IMPLEMENTATION_PLAN.md updated:

- [x] Project scaffolding (package.json, tsconfig, src/, tests/)
- [ ] CLI entry point with Commander.js (--help, --version)
...

Iteration 2: CLI entry point

Agent output:

Reading IMPLEMENTATION_PLAN.md...
Next task: CLI entry point with Commander.js

Creating src/cli.ts...
Wiring up Commander: program.name('mdlinkcheck').description('...')
Adding --help and --version...
Adding bin entry to package.json...
Testing: npm link (local install)
Running: mdlinkcheck --help ✅
Running: mdlinkcheck --version ✅

Committing...
[main b2c3d4e] feat: CLI entry point with Commander.js

You can now run:

npm link
mdlinkcheck --help
# mdlinkcheck - Find broken links in markdown files
# Usage: mdlinkcheck <files...>

Iteration 3: Markdown link extraction

Agent output:

Next task: Markdown link extraction

Creating src/parser.ts...
Using marked.lexer to parse markdown into tokens...
Extracting links from link and image tokens...
Tracking line numbers...
Writing tests with fixtures/test.md...

Test results:
✅ Extracts standard markdown links
✅ Extracts reference-style links
✅ Skips relative links
✅ Returns line numbers

Committing...
[main c3d4e5f] feat: markdown link extraction with line numbers

Files created:

  • src/parser.ts
  • tests/parser.test.ts
  • tests/fixtures/test.md (sample markdown with various link formats)

Iteration 4: HTTP link checker

Agent output:

Next task: HTTP link checker

Creating src/checker.ts...
Using fetch with HEAD method...
Following redirects (up to 3)...
10-second timeout via AbortController...
Error handling for network failures...

Test results:
✅ Returns OK for 200 status
✅ Returns BROKEN for 404
✅ Returns TIMEOUT for hung requests
✅ Returns UNREACHABLE for network errors

Committing...
[main d4e5f6g] feat: HTTP link checker with timeout and error handling

Iteration 5: Reporter

Agent output:

Next task: Reporter

Creating src/reporter.ts...
Format: filename:line - STATUS - URL (error)...
Summary line with counts...
Color output using ANSI codes (red for broken, green for OK)...

Test results:
✅ Formats broken link correctly
✅ Shows summary
✅ Handles no broken links case

Committing...
[main e5f6g7h] feat: reporter with colored output and summary
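
To make the report format concrete, here is a minimal sketch of what a reporter along these lines might look like. The `LinkResult` shape and the `formatReport` name are assumptions for illustration (the tutorial does not show src/reporter.ts), color codes are omitted, and counting every non-OK status as "broken" is also an assumption.

```typescript
type LinkStatus = 'OK' | 'BROKEN' | 'TIMEOUT' | 'UNREACHABLE';

// Hypothetical result shape; `detail` holds the error text, e.g. "404" or "timeout".
interface LinkResult {
  file: string;
  line: number;
  url: string;
  status: LinkStatus;
  detail?: string;
}

// Builds the report text. Assumption: every non-OK result counts as broken.
function formatReport(results: LinkResult[]): string {
  const broken = results.filter((r) => r.status !== 'OK');
  if (broken.length === 0) {
    return `All links OK!\nFound 0 broken links out of ${results.length} total.`;
  }
  const lines = broken.map(
    (r) => `${r.file}:${r.line} - ${r.status} - ${r.url}${r.detail ? ` (${r.detail})` : ''}`,
  );
  const noun = broken.length === 1 ? 'link' : 'links';
  lines.push('', `Found ${broken.length} broken ${noun} out of ${results.length} total.`);
  return lines.join('\n');
}

console.log(
  formatReport([
    { file: 'docs/api.md', line: 15, url: 'https://example.com/missing', status: 'BROKEN', detail: '404' },
    { file: 'docs/guide.md', line: 3, url: 'https://example.com', status: 'OK' },
  ]),
);
```

Keeping the formatter a pure function of the results makes it easy to unit test without touching the network.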

Iteration 6: Wire together

Agent output:

Next task: Wire together CLI → parser → checker → reporter

Updating src/cli.ts...
For each file argument:
  1. Read file
  2. Extract links (parser)
  3. Check each link (checker)
  4. Collect broken links
Report results (reporter)
Set exit code (0 or 1)

Build: ✅
Test: ✅

Manual test with real URLs:
Created tests/fixtures/real-links.md:
- https://google.com
- https://thisurldoesnotexist123456.com

Running: mdlinkcheck tests/fixtures/real-links.md
Output:
  tests/fixtures/real-links.md:2 - UNREACHABLE - https://thisurldoesnotexist123456.com (network error)
  Found 1 broken link out of 2 total.

Exit code: 1 ✅

Committing...
[main f6g7h8i] feat: wire CLI components together, end-to-end working
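
The wiring step described above can be sketched as a single function. Everything here is an assumption made for illustration: the `Deps` interface, the `run` name, and the choice to inject the parser and checker as functions so the pipeline can be exercised without disk or network access.

```typescript
type LinkStatus = 'OK' | 'BROKEN' | 'TIMEOUT' | 'UNREACHABLE';
interface LinkInfo { url: string; line: number }
interface LinkResult extends LinkInfo { file: string; status: LinkStatus }

// Hypothetical dependency bundle; a real CLI would pass fs.readFile, the parser, and the checker.
interface Deps {
  readFile(path: string): Promise<string>;
  extractLinks(markdown: string): LinkInfo[];
  checkLink(url: string): Promise<LinkStatus>;
}

// Returns per-link results plus the exit code (0 = all OK, 1 = any non-OK link).
async function run(files: string[], deps: Deps): Promise<{ results: LinkResult[]; exitCode: 0 | 1 }> {
  const results: LinkResult[] = [];
  for (const file of files) {
    const markdown = await deps.readFile(file);
    for (const link of deps.extractLinks(markdown)) {
      results.push({ file, ...link, status: await deps.checkLink(link.url) });
    }
  }
  const exitCode = results.some((r) => r.status !== 'OK') ? 1 : 0;
  return { results, exitCode };
}

// Usage with fakes instead of real files and HTTP requests.
const fakeDeps: Deps = {
  readFile: async () => '[a](https://ok.example) [b](https://bad.example)',
  extractLinks: () => [
    { url: 'https://ok.example', line: 1 },
    { url: 'https://bad.example', line: 1 },
  ],
  checkLink: async (url) => (url.includes('bad') ? 'BROKEN' : 'OK'),
};

run(['README.md'], fakeDeps).then(({ exitCode }) => console.log('exit code:', exitCode));
// prints "exit code: 1"
```

In the real entry point, assigning `process.exitCode` rather than calling `process.exit()` lets any buffered output finish writing before the process ends.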

Iteration 7: Integration test

Agent output:

Next task: Integration test with fixtures

Creating tests/integration.test.ts...
Test fixture with known good and bad links...
Run CLI programmatically...
Assert output contains expected broken links...
Assert exit code is 1...

Test results:
✅ Detects broken links in fixture
✅ Reports correct filename and line number
✅ Exit code is 1 when links are broken
✅ Exit code is 0 when all links are OK

All tasks complete!
<promise>DONE</promise>

Committing...
[main g7h8i9j] test: add integration tests with fixtures
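
Fixtures that point at real sites make the test suite depend on the network. One alternative, sketched here under the assumption that the tests may open a local port, is a throwaway HTTP server with one good and one missing route. `startFixtureServer` is a hypothetical helper, and `closeAllConnections` requires Node 18.2 or later.

```typescript
import { createServer } from 'node:http';
import type { AddressInfo } from 'node:net';

// Hypothetical helper: /ok answers 200, every other path answers 404.
async function startFixtureServer(): Promise<{ baseUrl: string; close: () => Promise<void> }> {
  const server = createServer((req, res) => {
    res.statusCode = req.url === '/ok' ? 200 : 404;
    res.end();
  });
  // Port 0 asks the OS for any free port.
  await new Promise<void>((resolve) => server.listen(0, '127.0.0.1', () => resolve()));
  const { port } = server.address() as AddressInfo;
  return {
    baseUrl: `http://127.0.0.1:${port}`,
    close: () =>
      new Promise<void>((resolve) => {
        server.close(() => resolve());
        server.closeAllConnections(); // drop keep-alive sockets so close() can finish
      }),
  };
}

// Sends HEAD requests to both routes and returns the two status codes.
async function demo(): Promise<number[]> {
  const fixture = await startFixtureServer();
  try {
    const ok = await fetch(`${fixture.baseUrl}/ok`, { method: 'HEAD' });
    const missing = await fetch(`${fixture.baseUrl}/missing`, { method: 'HEAD' });
    return [ok.status, missing.status];
  } finally {
    await fixture.close();
  }
}

demo().then((statuses) => console.log(statuses));
```

A fixture markdown file can then reference `${baseUrl}/ok` and `${baseUrl}/missing`, so the expected OK and BROKEN results do not depend on any external site being up.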

Phase 5: Review (You do this)

1. Check the git log

git log --oneline

Output:

g7h8i9j test: add integration tests with fixtures
f6g7h8i feat: wire CLI components together, end-to-end working
e5f6g7h feat: reporter with colored output and summary
d4e5f6g feat: HTTP link checker with timeout and error handling
c3d4e5f feat: markdown link extraction with line numbers
b2c3d4e feat: CLI entry point with Commander.js
a1b2c3d chore: project scaffolding with TypeScript and Commander

7 iterations, 7 commits — clean history

2. Run the build and tests

npm run build
npm test

Output:

> build
tsc

> test
✅ parser extracts links correctly (4 tests)
✅ checker tests URLs correctly (4 tests)
✅ reporter formats output (3 tests)
✅ integration end-to-end (2 tests)

13 tests passed

All tests pass

3. Try it for real

npm link
mdlinkcheck README.md

Output:

All links OK! ✅
Found 0 broken links out of 5 total.

Try with a broken link:

echo "[broken](https://thisurldoesnotexist123456.com)" > test.md
mdlinkcheck test.md

Output:

test.md:1 - UNREACHABLE - https://thisurldoesnotexist123456.com (network error)
Found 1 broken link out of 1 total.

Works as specified

4. Review the code quality

Check src/parser.ts:

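import { marked } from 'marked';

export interface LinkInfo {
  url: string;
  line: number;
}

// Count newline characters in a string.
const countNewlines = (s: string) => s.split('\n').length - 1;

export function extractLinks(markdown: string): LinkInfo[] {
  const links: LinkInfo[] = [];
  const source = markdown.replace(/\r\n?/g, '\n'); // normalize line endings before lexing
  let cursor = 0; // offset in source where the search for the next block starts

  // marked tokens carry no line numbers, so locate each token's raw text in the source.
  for (const block of marked.lexer(source)) {
    const found = source.indexOf(block.raw, cursor);
    const blockStart = found === -1 ? cursor : found;
    let searchFrom = 0;

    // walkTokens also visits list items and table cells, not just `token.tokens`.
    marked.walkTokens([block], (token: any) => {
      if (token.type !== 'link' && token.type !== 'image') return;
      if (!token.href.startsWith('http://') && !token.href.startsWith('https://')) return;
      const at = block.raw.indexOf(token.raw, searchFrom);
      if (at !== -1) searchFrom = at + 1;
      links.push({
        url: token.href,
        // Falls back to the block's first line if the link text cannot be located.
        line: 1 + countNewlines(source.slice(0, blockStart + Math.max(at, 0)))
      });
    });
    cursor = blockStart + block.raw.length;
  }
  return links;
}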

Clean, readable, uses marked instead of regex

Check src/checker.ts:

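export type LinkStatus = 'OK' | 'BROKEN' | 'TIMEOUT' | 'UNREACHABLE';

export async function checkLink(url: string): Promise<LinkStatus> {
  // Abort the request if it has not completed within 10 seconds.
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 10000);

  try {
    const response = await fetch(url, {
      method: 'HEAD',
      signal: controller.signal,
      redirect: 'follow' // fetch follows up to 20 redirects; the spec's limit of 3 is not enforced here
    });
    return response.ok ? 'OK' : 'BROKEN';
  } catch (error) {
    // Under strict settings `error` is `unknown`, so narrow it before reading `name`.
    if (error instanceof Error && error.name === 'AbortError') return 'TIMEOUT';
    return 'UNREACHABLE';
  } finally {
    clearTimeout(timeout); // runs on both the success and the error path
  }
}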

Proper timeout handling, follows redirects, good error handling
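
One gap worth noting during review: the interview asked that links requiring auth be reported as "unreachable" rather than "broken", while `response.ok ? 'OK' : 'BROKEN'` treats every non-2xx response as BROKEN. A possible way to close that gap is a small status mapping. This is a sketch, not the tool's actual code: `classifyStatus` is a hypothetical name, and treating 401 and 403 as the "needs auth" signals is an assumption.

```typescript
type LinkStatus = 'OK' | 'BROKEN' | 'TIMEOUT' | 'UNREACHABLE';

// Assumption: 401 and 403 mean "needs auth"; which codes count is a policy choice.
const AUTH_STATUSES = new Set([401, 403]);

// Maps an HTTP status code (after redirects were followed) to a link status.
function classifyStatus(httpStatus: number): LinkStatus {
  if (httpStatus >= 200 && httpStatus < 300) return 'OK';
  if (AUTH_STATUSES.has(httpStatus)) return 'UNREACHABLE';
  return 'BROKEN';
}

for (const code of [200, 401, 404, 500]) {
  console.log(code, classifyStatus(code));
}
// prints:
// 200 OK
// 401 UNREACHABLE
// 404 BROKEN
// 500 BROKEN
```

With a helper like this, the checker's success path could become `return classifyStatus(response.status)`, leaving the timeout and network-error branches unchanged.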

5. Review against spec

| Requirement                         | Status                      |
|-------------------------------------|-----------------------------|
| FR-001: Extract links from markdown | Implemented with marked     |
| FR-002: Test HTTP/HTTPS links       | HEAD requests with timeout  |
| FR-003: CLI interface               | Commander.js with --help    |
| FR-004: Reporting                   | Colored output with summary |
| Exit codes                          | 0 for success, 1 for broken |
| Tests                               | 13 tests, all passing       |

Phase 6: Polish (Optional)

You:

This looks great! Let's add one enhancement: show a progress 
indicator when checking many links. Add this as a task to the plan.

You update IMPLEMENTATION_PLAN.md:

## Phase 2: Polish
- [ ] Progress indicator (show "Checking... 5/47" while running)

You:

Run one more iteration to implement the progress indicator.

Agent (iteration 8):

Reading plan...
Next task: Progress indicator

Adding progress output to src/cli.ts...
Show "Checking link 5/47..." as each link is tested...
Clear line and show summary when done...

Build: ✅
Test: ✅
Manual test: Shows progress ✅

Committing...
[main h8i9j0k] feat: add progress indicator for link checking
<promise>DONE</promise>

Now when you run it:

mdlinkcheck "docs/**/*.md"

Output:

Checking link 1/47...
Checking link 2/47...
...
Checking link 47/47...

All links OK! ✅
Found 0 broken links out of 47 total.
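
For reference, a progress indicator of this kind might be implemented roughly as follows. This is a sketch with assumed names (`formatProgress`, `showProgress`); writing to stderr and rewriting the line in place on a terminal are design assumptions, not something the tutorial's agent output confirms.

```typescript
// Formats the counter text shown while links are being checked.
function formatProgress(done: number, total: number): string {
  return `Checking link ${done}/${total}...`;
}

// Writes progress to stderr so that piped stdout contains only the report.
// On a terminal the line is rewritten in place; otherwise each update gets its own line.
function showProgress(done: number, total: number, stream: NodeJS.WriteStream = process.stderr): void {
  const text = formatProgress(done, total);
  // "\r" returns to the start of the line, "\x1b[2K" erases it.
  stream.write(stream.isTTY ? `\r\x1b[2K${text}` : `${text}\n`);
}

showProgress(5, 47);
```

Keeping progress off stdout means a command such as `mdlinkcheck README.md > report.txt` still captures a clean report.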

What You Just Did

In 30 minutes, you:

  1. Interviewed with an agent to create a clear spec
  2. Let the agent plan the task decomposition
  3. Ran 7 autonomous iterations (each: implement → test → commit)
  4. Reviewed the output (5 minutes)
  5. Added a polish enhancement (1 more iteration)
  6. Shipped a working CLI tool with tests

Your effort:

  • Interview: 10 min (you answered questions)
  • Review: 5 min (you ran the code and checked quality)
  • Total: 15 minutes of YOUR time

Agent's effort:

  • Planning: 2 min
  • Building: 18 min (8 iterations, a little over 2 min each)
  • Total: 20 minutes of autonomous work

You wrote zero lines of code. You defined WHAT to build, and the agent figured out HOW.


What to Try Next

Package it for npm

- [ ] Add README.md with usage examples
- [ ] Add LICENSE (MIT)
- [ ] Publish to npm: `npm publish`

Add features

- [ ] Check relative links (./other-doc.md)
- [ ] Config file to ignore certain URLs
- [ ] Parallel checking (Promise.all for speed)
- [ ] JSON output mode for CI integration
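
For the parallel-checking item, a plain `Promise.all` over every link starts all requests at once, which can overwhelm a slow server or trip rate limits on large docs sets. A bounded pool is one alternative. The sketch below uses a hypothetical `mapWithConcurrency` helper and a fake checker, so it runs without network access.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next item to hand out

  // Each lane pulls items until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage with a fake checker standing in for a real HTTP check.
const fakeCheck = async (url: string) => (url.endsWith('/missing') ? 'BROKEN' : 'OK');

mapWithConcurrency(['https://example.com/', 'https://example.com/missing'], 5, fakeCheck)
  .then((statuses) => console.log(statuses));
// prints [ 'OK', 'BROKEN' ]
```

Because results are written by index, the report order stays stable even when faster requests finish first.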

Use it in CI

# .github/workflows/docs.yml
name: Check Links
on: [push]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g mdlinkcheck
      # Quoted so that mdlinkcheck, not the shell, expands the glob
      - run: mdlinkcheck "docs/**/*.md"

Key Lessons from This Tutorial

1. The spec is everything

The 10-minute interview created a spec that guided 8 flawless iterations. Bad spec = bad output.

2. Fresh context prevents drift

Each iteration started fresh. No context overflow, no confusion from stale reasoning.

3. Tests validate autonomy

Every iteration ran npm test. Failing tests forced the agent to fix before proceeding. No test theater.

4. Git history tells the story

One commit per task. Clean, reviewable, revertable.

5. You shift from writer to reviewer

Your job: define the goal, review the output, course-correct. The agent writes the code.


Now go build something real. Interview, spec, run the loop, review. Repeat.