# Cost Optimization — Getting More Work Per Dollar

> AI model subscriptions have fundamentally different billing models.
> Understanding them is the difference between $5 and $50 for the same output.
> This guide teaches you to match your work patterns to your billing model.

---
## The Two Billing Models

### Request-Based (GitHub Copilot Pro)

- **What counts:** Each API call = 1 request
- **Multipliers:** Advanced models cost more per request (e.g., Opus 4.6 = 3x)
- **Key insight:** A 2-second request costs the same as a 10-minute request
- **Budget:** Fixed monthly request pool (e.g., 300 premium requests/month)

**What's expensive:** Many small requests
**What's cheap:** Few large requests that do lots of work

### Token-Based (Anthropic Claude Pro)

- **What counts:** Input tokens + output tokens consumed
- **Windows:** Per-session (5-hour) and weekly token budgets
- **Key insight:** Context grows with every turn — turn 50 includes ALL previous turns as input
- **Danger zone:** Long conversations burn tokens quadratically (total input scales with the square of the turn count)

**What's expensive:** Large context windows, long conversations
**What's cheap:** Fresh sessions with minimal context

---
## How Context Affects Cost

### The Context Growth Problem (Token-Based)

Every message in a conversation gets re-sent as context:

```
Turn 1:  Input = system prompt (2K tokens)            → Total: 2K
Turn 5:  Input = system prompt + 4 prior turns (8K)   → Total: 8K
Turn 10: Input = system prompt + 9 prior turns (20K)  → Total: 20K
Turn 20: Input = system prompt + 19 prior turns (50K) → Total: 50K
Turn 30: Input = system prompt + 29 prior turns (90K) → Total: 90K
```

By turn 30, you're burning 90K input tokens PER TURN just for context. The actual
new content might be 500 tokens, but you're paying for the full history every time.

**This is why sub-agents are token-efficient on Anthropic:**
Each sub-agent starts with ~2K tokens of context (just the task prompt), regardless
of how long your main conversation has been running.
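To see how fast re-sent context compounds, here is a rough model in Python. The 2K system prompt and ~1K tokens of new content per turn are assumed round numbers for illustration, not provider figures:

```python
# Rough model of cumulative input-token cost (assumed round numbers).
SYSTEM_PROMPT = 2_000
NEW_PER_TURN = 1_000

def conversation_input_tokens(turns: int) -> int:
    """One long conversation: every turn re-sends the system prompt
    plus all prior turns, so total input grows quadratically."""
    return sum(SYSTEM_PROMPT + t * NEW_PER_TURN for t in range(turns))

def subagent_input_tokens(tasks: int) -> int:
    """Fresh sub-agent per task: each pays only the system prompt
    plus its own task prompt, so total input grows linearly."""
    return tasks * (SYSTEM_PROMPT + NEW_PER_TURN)

print(conversation_input_tokens(30))  # 495000 tokens for one 30-turn session
print(subagent_input_tokens(30))      # 90000 tokens for 30 fresh sub-agents
```

Under these assumptions, a 30-turn conversation costs over 5x the input tokens of 30 fresh sub-agents doing the same work.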
### Context Doesn't Matter (Request-Based)

On Copilot, a request with 2K context costs the same as a request with 100K context.
It's still 1 request. So for request-based billing:
- Let context accumulate — it's free
- Pack more work into each request
- Don't spawn sub-agents unnecessarily (each spawn = new request)

---
## Optimal Strategies by Subscription

### GitHub Copilot Pro — Batch Everything

**Goal:** Maximize work per request

**Pattern: Fat Sub-Agents**
```
# BAD: 5 requests × 3 multiplier = 15 premium requests
sessions_spawn("Do task 1") → 1 request
sessions_spawn("Do task 2") → 1 request
sessions_spawn("Do task 3") → 1 request
sessions_spawn("Do task 4") → 1 request
sessions_spawn("Do task 5") → 1 request

# GOOD: 1 request × 3 multiplier = 3 premium requests
sessions_spawn("Do tasks 1-5 sequentially. For each:
  implement, test, commit, then move to the next.")
```

**Same work, 80% cheaper.**
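The arithmetic behind this pattern is worth internalizing. A minimal sketch, assuming the 3x Opus multiplier from above (`premium_requests` is a hypothetical helper, not a real API):

```python
# Hypothetical helper, not a real API. Assumes the 3x Opus multiplier
# described above; each sessions_spawn call bills as one request.
OPUS_MULTIPLIER = 3

def premium_requests(spawn_count: int, multiplier: int = OPUS_MULTIPLIER) -> int:
    """Premium requests consumed: one billed request per spawn,
    scaled by the model's per-request multiplier."""
    return spawn_count * multiplier

naive = premium_requests(5)    # one spawn per task: 15 premium requests
batched = premium_requests(1)  # one spawn, five tasks: 3 premium requests
savings = 1 - batched / naive  # the "80% cheaper" figure above
```

The multiplier applies per request, so halving the spawn count halves the bill regardless of which model you pick.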
**Pattern: Compound Tasks**
```
# BAD: 3 separate requests
"Review the code"          → 1 request
"Fix the issues you found" → 1 request
"Update the docs"          → 1 request

# GOOD: 1 compound request
"Review the code, fix any issues you find, and update the
docs to reflect the changes. Commit each fix separately."
```

**Pattern: Agent Harness with Multi-Task Iterations**
```
# Modified AGENT-INSTRUCTIONS.md for Copilot:
### 3. Pick Tasks
- Find the NEXT 3-5 unchecked tasks in IMPLEMENTATION_PLAN.md
- Complete ALL of them in this iteration
- Commit after each task (for clean git history)
- Then exit for a fresh context restart
```
**When to use Copilot models:**
- Long autonomous coding sessions (lots of tool calls = still 1 request)
- Complex multi-step tasks (planning + implementation + testing)
- Agent harness iterations (pack 3-5 tasks per iteration)
- Overnight batch work

**When NOT to use Copilot models:**
- Quick questions ("What time is it in Tokyo?")
- Simple file reads or lookups
- Anything you could do with a cheaper/free model
### Anthropic Claude Pro — Stay Lean

**Goal:** Minimize token consumption per interaction

**Pattern: Fresh Sub-Agents**
```
# GOOD on Anthropic: Each sub-agent starts with clean context
sessions_spawn("Do task 1") → ~5K tokens (fresh context)
sessions_spawn("Do task 2") → ~5K tokens (fresh context)
sessions_spawn("Do task 3") → ~5K tokens (fresh context)
Total: ~15K tokens

# BAD on Anthropic: One long conversation
Main session turn 1: "Do task 1"  → 3K input tokens
Main session turn 5: "Do task 2"  → 15K input tokens
Main session turn 10: "Do task 3" → 35K input tokens
Total: ~53K tokens (3.5x more!)
```
**Pattern: Minimal Context Agent Instructions**
```
# BAD: Agent reads entire spec every iteration
"Read PROJECT-SPEC.md (5000 tokens), IMPLEMENTATION_PLAN.md (2000 tokens),
DECISIONS.md (1500 tokens), and the last 20 git commits..."

# GOOD: Agent reads only what it needs
"Read IMPLEMENTATION_PLAN.md. Find the first unchecked task.
Read ONLY the relevant section of PROJECT-SPEC.md for that task.
Implement, test, commit."
```
**Pattern: Offload to Cheaper Models**
```
# Use Sonnet (cheaper) for routine work
sessions_spawn("Implement the CRUD endpoints", model: "sonnet")

# Use Opus (expensive) only for complex reasoning
sessions_spawn("Design the Monte Carlo simulation algorithm", model: "opus")
```
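This routing rule can be sketched as a tiny dispatcher. A hypothetical helper: the `"opus"`/`"sonnet"` names and the task tags are assumptions for illustration, not a real API:

```python
# Hypothetical dispatcher; the model names and task tags are
# assumptions for illustration, not a real sessions_spawn API.
DEEP_REASONING = {"planning", "algorithm-design", "bug-diagnosis", "architecture"}

def pick_model(task_kind: str) -> str:
    """Default to the cheaper model; reserve Opus for deep reasoning."""
    return "opus" if task_kind in DEEP_REASONING else "sonnet"

pick_model("algorithm-design")        # "opus"   (e.g., Monte Carlo design)
pick_model("feature-implementation")  # "sonnet" (e.g., CRUD endpoints)
```

The point is the default: anything not explicitly tagged as deep reasoning falls through to the cheaper model.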
**When to use Anthropic models:**
- Quick interactions in main session (small context = few tokens)
- Tasks requiring strong reasoning (Opus quality)
- Sub-agent swarms (fresh context each time)

**When NOT to use Anthropic models:**
- Long main-session conversations (context grows = token burn)
- Low-complexity tasks (use a cheaper model)
- Repetitive iterations (context grows even with similar content)

---
## Model Selection Guide

### By Task Complexity

| Task | Recommended | Why |
|------|-------------|-----|
| Planning & decomposition | Opus (either provider) | Needs strong reasoning |
| Scaffolding & config | Sonnet or GPT-4.1 | Simple, deterministic |
| Feature implementation | Sonnet | Good balance |
| Complex algorithms | Opus | Deep reasoning needed |
| Bug diagnosis | Opus | Pattern recognition |
| Bug fixing | Sonnet | Usually straightforward once diagnosed |
| Documentation | Sonnet or GPT-4.1 | Writing, not complex reasoning |
| Code review | Opus | Needs to spot subtle issues |
| Test writing | Sonnet | Follows patterns from spec |
### By Provider Optimization

| Scenario | Best Provider | Reasoning |
|----------|--------------|-----------|
| 5 tasks in agent harness | Copilot (batch 5 tasks = 1 request) | Request efficiency |
| Quick "what's the status?" | Anthropic (small context) | Token efficiency |
| Overnight autonomous loop | Copilot (fewer requests total) | Request efficiency |
| Sub-agent swarm (10 agents) | Anthropic (fresh context each) | Token efficiency |
| Long planning conversation | Copilot (context growth is free) | Request efficiency |
| One-shot code generation | Either (1 request, small context) | Similar cost |

---
## The Hybrid Strategy

Use both subscriptions strategically:

```
Morning check-in (main session):    Anthropic Sonnet (small context, quick)
Planning conversation:              Copilot Opus (context growth is free)
Agent harness iterations:           Copilot Sonnet (batch tasks, 1 request each)
Complex debugging:                  Copilot Opus (1 request, deep reasoning)
Quick questions throughout the day: Anthropic Sonnet (minimal tokens)
Overnight autonomous work:          Copilot Sonnet (batch tasks, few requests)
```
### Budget Allocation Example

Monthly budget:
- Copilot Pro: 300 premium requests (Opus = 3x, Sonnet = 1x)
- Anthropic Pro: Weekly token budget (resets Sundays)

**Agent harness project (20 tasks):**
```
Copilot approach:
  Planning: 1 Opus request = 3 premium        → 3
  20 tasks ÷ 5 per batch = 4 Sonnet requests  → 4
  Code review: 2 Opus requests = 6 premium    → 6
  Total: 13 premium requests

Anthropic approach:
  Planning: 1 session = ~10K tokens
  20 tasks × 1 fresh session × ~8K            → 160K tokens
  Code review: 2 sessions × ~15K              → 30K tokens
  Total: ~200K tokens (could eat a chunk of weekly budget)
```
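As a back-of-envelope check, here is the same arithmetic in Python. All figures are the assumed example numbers, not provider constants:

```python
# All figures are the assumed example numbers, not provider constants.
OPUS_X, SONNET_X = 3, 1  # Copilot premium-request multipliers

def copilot_cost(plan_opus: int, sonnet_batches: int, review_opus: int) -> int:
    """Premium requests: planning + batched iterations + review."""
    return (plan_opus + review_opus) * OPUS_X + sonnet_batches * SONNET_X

def anthropic_cost(plan_tok: int, tasks: int, per_task_tok: int, review_tok: int) -> int:
    """Rough token total when each task runs as a fresh session."""
    return plan_tok + tasks * per_task_tok + review_tok

copilot = copilot_cost(plan_opus=1, sonnet_batches=4, review_opus=2)  # 13
anthropic = anthropic_cost(10_000, 20, 8_000, 30_000)                 # 200000
```

Swap in your own task counts and batch sizes to see where the crossover lands for your project.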
For an agent harness project, **Copilot is usually cheaper** because you can batch.

For daily conversational use, **Anthropic is usually cheaper** because most interactions are short.
---
## Anti-Patterns to Avoid

### 1. The Chatty Agent (Anthropic killer)
```
Turn 1: "What should I work on?"     ← Wastes a turn
Turn 2: "I'll start with the parser" ← Wastes a turn
Turn 3: "Here's my plan..."          ← Wastes a turn
Turn 4: *actually starts working*

# Fix: Give clear instructions upfront so the agent works immediately
```
### 2. The Spawn-Happy Pattern (Copilot killer)
```
sessions_spawn("Read the plan")      ← 1 request for reading?!
sessions_spawn("Pick the next task") ← 1 request for picking?!
sessions_spawn("Implement the task") ← Finally useful
sessions_spawn("Run the tests")      ← 1 request for one command?!

# Fix: One spawn that does all four steps
```
### 3. The "Let Me Check" Loop (kills both budgets)
```
"Check if the build passes" → agent runs build, reports back
"OK now run the tests"      → agent runs tests, reports back
"OK now check the linter"   → agent runs linter, reports back

# Fix: "Run build, tests, and linter. Report all results."
```
### 4. Using Opus for Everything
```
# Opus is 3x on Copilot, token-heavy on Anthropic
# Most tasks don't need it

# Fix: Default to Sonnet. Upgrade to Opus only for:
# - Planning and decomposition
# - Complex algorithm design
# - Subtle bug diagnosis
# - Architecture decisions
```
### 5. Ignoring Context Size (Anthropic killer)
```
Main session at turn 50: "Hey can you also check the weather?"
# That weather check just cost 90K input tokens because of context

# Fix: Use a sub-agent for unrelated tasks
# Or start a new session for new topics
```

---
## Monitoring Your Usage

### GitHub Copilot
- Check premium request usage at: github.com/settings/copilot
- Track requests per task in your daily memory notes
- Set alerts at 80% of monthly budget

### Anthropic Claude Pro
- Check usage at: claude.ai/settings/usage (subscription)
- API usage at: console.anthropic.com (if using the API directly)
- Monitor "Current session X% used" — stop at 90%
- Weekly reset: Sundays at 11 AM ET
### OpenClaw Session Status
```
/status → Shows current model, session %, premium request %
```
### Logging Strategy
Track in your daily memory notes:
```markdown
## Model Usage — 2026-03-18
- Copilot premium requests: 12 used today (45/300 monthly)
- Anthropic session: 35% used (resets in 3 days)
- Tasks completed: 8
- Cost per task: ~1.5 premium requests average
```
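The "cost per task" line is a derived average; a trivial helper makes the calculation explicit (hypothetical, matching the sample figures above):

```python
def cost_per_task(premium_requests_used: int, tasks_completed: int) -> float:
    """Average premium requests spent per completed task."""
    return premium_requests_used / tasks_completed

# The sample log above: 12 premium requests across 8 tasks
average = cost_per_task(12, 8)  # 1.5
```

Tracking this number over time tells you whether your batching discipline is actually improving.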
---

## Quick Reference Card

```
┌────────────────────────────────────────────────┐
│ COST OPTIMIZATION CHEAT SHEET                  │
├────────────────────────────────────────────────┤
│                                                │
│ COPILOT PRO (request-based):                   │
│   ✅ Batch tasks into one request              │
│   ✅ Let context grow (it's free)              │
│   ✅ Long sessions with many tool calls        │
│   ❌ Don't spawn many small sub-agents         │
│   ❌ Don't use Opus for simple tasks (3x!)     │
│                                                │
│ ANTHROPIC PRO (token-based):                   │
│   ✅ Fresh sub-agents (clean context)          │
│   ✅ Short, focused interactions               │
│   ✅ Use Sonnet for most work                  │
│   ❌ Don't let main session context grow       │
│   ❌ Don't have long planning conversations    │
│                                                │
│ GENERAL:                                       │
│   • Sonnet for building, Opus for thinking     │
│   • Batch related work, split unrelated work   │
│   • Monitor usage daily, adjust weekly         │
│   • When in doubt, use the cheaper model first │
│                                                │
└────────────────────────────────────────────────┘
```

---

_The cheapest token is the one you don't spend. The cheapest request is the one that does five things._