agent-harness/COST-OPTIMIZATION.md

# Cost Optimization — Getting More Work Per Dollar
> AI model subscriptions have fundamentally different billing models.
> Understanding them is the difference between $5 and $50 for the same output.
> This guide teaches you to match your work patterns to your billing model.
---
## The Two Billing Models
### Request-Based (GitHub Copilot Pro)
- **What counts:** Each API call = 1 request
- **Multipliers:** Advanced models cost more per request (e.g., Opus 4.6 = 3x)
- **Key insight:** A 2-second request costs the same as a 10-minute request
- **Budget:** Fixed monthly request pool (e.g., 300 premium requests/month)
**What's expensive:** Many small requests
**What's cheap:** Few large requests that do lots of work
### Token-Based (Anthropic Claude Pro)
- **What counts:** Input tokens + output tokens consumed
- **Windows:** Per-session (5-hour) and weekly token budgets
- **Key insight:** Context grows with every turn — turn 50 includes ALL previous turns as input
- **Danger zone:** Long conversations burn tokens quadratically, since each new turn re-sends the whole history
**What's expensive:** Large context windows, long conversations
**What's cheap:** Fresh sessions with minimal context
---
## How Context Affects Cost
### The Context Growth Problem (Token-Based)
Every message in a conversation gets re-sent as context:
```
Turn 1: Input = system prompt (2K tokens) → Total: 2K
Turn 5: Input = system prompt + 4 prior turns (8K) → Total: 8K
Turn 10: Input = system prompt + 9 prior turns (20K) → Total: 20K
Turn 20: Input = system prompt + 19 prior turns (50K) → Total: 50K
Turn 30: Input = system prompt + 29 prior turns (90K) → Total: 90K
```
By turn 30, you're burning 90K input tokens PER TURN just for context. The actual
new content might be 500 tokens, but you're paying for the full history every time.
**This is why sub-agents are token-efficient on Anthropic:**
Each sub-agent starts with ~2K tokens of context (just the task prompt), regardless
of how long your main conversation has been running.
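This growth can be sketched numerically. A minimal cost model, assuming a 2K-token system prompt and roughly 2K new tokens per turn (illustrative figures, not measured provider behavior):
```python
def conversation_input_tokens(turns, system_prompt=2_000, tokens_per_turn=2_000):
    """Cumulative input tokens for one long conversation.

    Turn n re-sends the system prompt plus all n-1 prior turns,
    so the total grows quadratically with turn count.
    """
    return sum(system_prompt + (turn - 1) * tokens_per_turn
               for turn in range(1, turns + 1))

def subagent_input_tokens(tasks, prompt_tokens=2_000):
    """Fresh sub-agents: one small task prompt each, no shared history."""
    return tasks * prompt_tokens

print(conversation_input_tokens(30))  # 930000 (~930K tokens)
print(subagent_input_tokens(30))      # 60000 (~60K tokens)
```
Under these assumptions, 30 tasks in one conversation cost roughly 15x what 30 fresh sub-agents cost, which is the same effect the turn-by-turn table above illustrates.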
### Context Doesn't Matter (Request-Based)
On Copilot, a request with 2K context costs the same as a request with 100K context.
It's still 1 request. So for request-based billing:
- Let context accumulate — it's free
- Pack more work into each request
- Don't spawn sub-agents unnecessarily (each spawn = new request)
---
## Optimal Strategies by Subscription
### GitHub Copilot Pro — Batch Everything
**Goal:** Maximize work per request
**Pattern: Fat Sub-Agents**
```
# BAD: 5 requests × 3 multiplier = 15 premium requests
sessions_spawn("Do task 1") → 1 request
sessions_spawn("Do task 2") → 1 request
sessions_spawn("Do task 3") → 1 request
sessions_spawn("Do task 4") → 1 request
sessions_spawn("Do task 5") → 1 request
# GOOD: 1 request × 3 multiplier = 3 premium requests
sessions_spawn("Do tasks 1-5 sequentially. For each:
implement, test, commit, then move to the next.")
```
**Same work, 80% cheaper.**
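The request arithmetic generalizes to any batch size. A minimal sketch, assuming the 3x premium multiplier from the example above (the multiplier value is illustrative):
```python
import math

def premium_requests(tasks: int, batch_size: int, multiplier: int = 3) -> int:
    """Premium requests consumed when `tasks` are split into spawns of
    `batch_size` tasks each, at the given model multiplier."""
    spawns = math.ceil(tasks / batch_size)
    return spawns * multiplier

print(premium_requests(5, batch_size=1))  # 15 premium requests (spawn per task)
print(premium_requests(5, batch_size=5))  # 3 premium requests (one fat spawn)
```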
**Pattern: Compound Tasks**
```
# BAD: 3 separate requests
"Review the code" → 1 request
"Fix the issues you found" → 1 request
"Update the docs" → 1 request
# GOOD: 1 compound request
"Review the code, fix any issues you find, and update the
docs to reflect the changes. Commit each fix separately."
```
**Pattern: Agent Harness with Multi-Task Iterations**
```
# Modified AGENT-INSTRUCTIONS.md for Copilot:
### 3. Pick Tasks
- Find the NEXT 3-5 unchecked tasks in IMPLEMENTATION_PLAN.md
- Complete ALL of them in this iteration
- Commit after each task (for clean git history)
- Then exit for a fresh context restart
```
**When to use Copilot models:**
- Long autonomous coding sessions (lots of tool calls = still 1 request)
- Complex multi-step tasks (planning + implementation + testing)
- Agent harness iterations (pack 3-5 tasks per iteration)
- Overnight batch work
**When NOT to use Copilot models:**
- Quick questions ("What time is it in Tokyo?")
- Simple file reads or lookups
- Anything you could do with a cheaper/free model
### Anthropic Claude Pro — Stay Lean
**Goal:** Minimize token consumption per interaction
**Pattern: Fresh Sub-Agents**
```
# GOOD on Anthropic: Each sub-agent starts with clean context
sessions_spawn("Do task 1") → ~5K tokens (fresh context)
sessions_spawn("Do task 2") → ~5K tokens (fresh context)
sessions_spawn("Do task 3") → ~5K tokens (fresh context)
Total: ~15K tokens
# BAD on Anthropic: One long conversation
Main session turn 1: "Do task 1" → 3K input tokens
Main session turn 5: "Do task 2" → 15K input tokens
Main session turn 10: "Do task 3" → 35K input tokens
Total: ~53K tokens (3.5x more!)
```
**Pattern: Minimal Context Agent Instructions**
```
# BAD: Agent reads entire spec every iteration
"Read PROJECT-SPEC.md (5000 tokens), IMPLEMENTATION_PLAN.md (2000 tokens),
DECISIONS.md (1500 tokens), and the last 20 git commits..."
# GOOD: Agent reads only what it needs
"Read IMPLEMENTATION_PLAN.md. Find the first unchecked task.
Read ONLY the relevant section of PROJECT-SPEC.md for that task.
Implement, test, commit."
```
**Pattern: Offload to Cheaper Models**
```
# Use Sonnet (cheaper) for routine work
sessions_spawn("Implement the CRUD endpoints", model: "sonnet")
# Use Opus (expensive) only for complex reasoning
sessions_spawn("Design the Monte Carlo simulation algorithm", model: "opus")
```
**When to use Anthropic models:**
- Quick interactions in main session (small context = few tokens)
- Tasks requiring strong reasoning (Opus quality)
- Sub-agent swarms (fresh context each time)
**When NOT to use Anthropic models:**
- Long main-session conversations (context grows = token burn)
- Low-complexity tasks (use a cheaper model)
- Repetitive iterations (context grows even with similar content)
---
## Model Selection Guide
### By Task Complexity
| Task | Recommended | Why |
|------|-------------|-----|
| Planning & decomposition | Opus (either provider) | Needs strong reasoning |
| Scaffolding & config | Sonnet or GPT-4.1 | Simple, deterministic |
| Feature implementation | Sonnet | Good balance |
| Complex algorithms | Opus | Deep reasoning needed |
| Bug diagnosis | Opus | Pattern recognition |
| Bug fixing | Sonnet | Usually straightforward once diagnosed |
| Documentation | Sonnet or GPT-4.1 | Writing, not complex reasoning |
| Code review | Opus | Needs to spot subtle issues |
| Test writing | Sonnet | Follows patterns from spec |
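The table above can be encoded as a simple routing default. A sketch with hypothetical category names; the `"sonnet"`/`"opus"` labels follow the `sessions_spawn` examples in this guide:
```python
# Categories that justify the expensive model; everything else stays cheap.
OPUS_TASKS = {"planning", "complex-algorithm", "bug-diagnosis", "code-review"}

def pick_model(task_category: str) -> str:
    """Default to Sonnet; upgrade only for deep-reasoning categories."""
    return "opus" if task_category in OPUS_TASKS else "sonnet"

print(pick_model("bug-diagnosis"))  # opus
print(pick_model("documentation"))  # sonnet
```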
### By Provider Optimization
| Scenario | Best Provider | Reasoning |
|----------|--------------|-----------|
| 5 tasks in agent harness | Copilot (batch 5 tasks = 1 request) | Request efficiency |
| Quick "what's the status?" | Anthropic (small context) | Token efficiency |
| Overnight autonomous loop | Copilot (fewer requests total) | Request efficiency |
| Sub-agent swarm (10 agents) | Anthropic (fresh context each) | Token efficiency |
| Long planning conversation | Copilot (context growth is free) | Request efficiency |
| One-shot code generation | Either (1 request, small context) | Similar cost |
---
## The Hybrid Strategy
Use both subscriptions strategically:
```
Morning check-in (main session): Anthropic Sonnet (small context, quick)
Planning conversation: Copilot Opus (context growth is free)
Agent harness iterations: Copilot Sonnet (batch tasks, 1 request each)
Complex debugging: Copilot Opus (1 request, deep reasoning)
Quick questions throughout the day: Anthropic Sonnet (minimal tokens)
Overnight autonomous work: Copilot Sonnet (batch tasks, few requests)
```
### Budget Allocation Example
Monthly budget:
- Copilot Pro: 300 premium requests (Opus = 3x, Sonnet = 1x)
- Anthropic Pro: Weekly token budget (resets Sundays)
**Agent harness project (20 tasks):**
```
Copilot approach:
Planning: 1 Opus request = 3 premium → 3
20 tasks ÷ 5 per batch = 4 Sonnet requests → 4
Code review: 2 Opus requests = 6 premium → 6
Total: 13 premium requests
Anthropic approach:
Planning: 1 session = ~10K tokens
20 tasks × 1 session each = 20 sessions × ~8K → 160K tokens
Code review: 2 sessions × ~15K → 30K tokens
Total: ~200K tokens (could eat a chunk of weekly budget)
```
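The totals reduce to arithmetic worth sanity-checking, using only the numbers from the example above:
```python
# Copilot: premium requests (Opus = 3x, Sonnet = 1x)
planning = 1 * 3                # 1 Opus planning request
iterations = (20 // 5) * 1      # 20 tasks, batched 5 per Sonnet request
review = 2 * 3                  # 2 Opus review requests
copilot_total = planning + iterations + review
print(copilot_total)            # 13 premium requests

# Anthropic: tokens across fresh sessions
anthropic_total = 10_000 + 20 * 8_000 + 2 * 15_000
print(anthropic_total)          # 200000 tokens
```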
For an agent harness project, **Copilot is usually cheaper** because you can batch.
For daily conversational use, **Anthropic is usually cheaper** because most interactions are short.
---
## Anti-Patterns to Avoid
### 1. The Chatty Agent (Anthropic killer)
```
Turn 1: "What should I work on?" ← Wastes a turn
Turn 2: "I'll start with the parser" ← Wastes a turn
Turn 3: "Here's my plan..." ← Wastes a turn
Turn 4: *actually starts working*
# Fix: Give clear instructions upfront so the agent works immediately
```
### 2. The Spawn Happy Pattern (Copilot killer)
```
sessions_spawn("Read the plan") ← 1 request for reading?!
sessions_spawn("Pick the next task") ← 1 request for picking?!
sessions_spawn("Implement the task") ← Finally useful
sessions_spawn("Run the tests") ← 1 request for one command?!
# Fix: One spawn that does all four steps
```
### 3. The "Let Me Check" Loop (Both killers)
```
"Check if the build passes" → agent runs build, reports back
"OK now run the tests" → agent runs tests, reports back
"OK now check the linter" → agent runs linter, reports back
# Fix: "Run build, tests, and linter. Report all results."
```
### 4. Using Opus for Everything
```
# Opus is 3x on Copilot, token-heavy on Anthropic
# Most tasks don't need it
# Fix: Default to Sonnet. Upgrade to Opus only for:
# - Planning and decomposition
# - Complex algorithm design
# - Subtle bug diagnosis
# - Architecture decisions
```
### 5. Ignoring Context Size (Anthropic killer)
```
Main session at turn 50: "Hey can you also check the weather?"
# That weather check paid 90K+ input tokens of accumulated context
# Fix: Use a sub-agent for unrelated tasks
# Or start a new session for new topics
```
---
## Monitoring Your Usage
### GitHub Copilot
- Check premium request usage at: github.com/settings/copilot
- Track requests per task in your daily memory notes
- Set alerts at 80% of monthly budget
### Anthropic Claude Pro
- Check usage at: claude.ai/settings/usage (subscription)
- API usage at: console.anthropic.com (if using API directly)
- Monitor "Current session X% used" — stop at 90%
- Weekly reset: Sundays at 11 AM ET
### OpenClaw Session Status
```
/status → Shows current model, session %, premium request %
```
### Logging Strategy
Track in your daily memory notes:
```markdown
## Model Usage — 2026-03-18
- Copilot premium requests: 12 used today (45/300 monthly)
- Anthropic session: 35% used (resets in 3 days)
- Tasks completed: 8
- Cost per task: ~1.5 premium requests average
```
---
## Quick Reference Card
```
┌─────────────────────────────────────────────────┐
│ COST OPTIMIZATION CHEAT SHEET │
├─────────────────────────────────────────────────┤
│ │
│ COPILOT PRO (request-based): │
│ ✅ Batch tasks into one request │
│ ✅ Let context grow (it's free) │
│ ✅ Long sessions with many tool calls │
│ ❌ Don't spawn many small sub-agents │
│ ❌ Don't use Opus for simple tasks (3x!) │
│ │
│ ANTHROPIC PRO (token-based): │
│ ✅ Fresh sub-agents (clean context) │
│ ✅ Short, focused interactions │
│ ✅ Use Sonnet for most work │
│ ❌ Don't let main session context grow │
│ ❌ Don't have long planning conversations │
│ │
│ GENERAL: │
│ • Sonnet for building, Opus for thinking │
│ • Batch related work, split unrelated work │
│ • Monitor usage daily, adjust weekly │
│ • When in doubt, use the cheaper model first │
│ │
└─────────────────────────────────────────────────┘
```
---
_The cheapest token is the one you don't spend. The cheapest request is the one that does five things._