# Cost Optimization — Getting More Work Per Dollar

> AI model subscriptions have fundamentally different billing models.
> Understanding them is the difference between $5 and $50 for the same output.
> This guide teaches you to match your work patterns to your billing model.

---

## The Two Billing Models

### Request-Based (GitHub Copilot Pro)

- **What counts:** Each API call = 1 request
- **Multipliers:** Advanced models cost more per request (e.g., Opus 4.6 = 3x)
- **Key insight:** A 2-second request costs the same as a 10-minute request
- **Budget:** Fixed monthly request pool (e.g., 300 premium requests/month)

**What's expensive:** Many small requests
**What's cheap:** Few large requests that do lots of work

### Token-Based (Anthropic Claude Pro)

- **What counts:** Input tokens + output tokens consumed
- **Windows:** Per-session (5-hour) and weekly token budgets
- **Key insight:** Context grows with every turn — turn 50 includes ALL previous turns as input
- **Danger zone:** Long conversations burn tokens at a compounding rate — cumulative cost grows quadratically with turn count

**What's expensive:** Large context windows, long conversations
**What's cheap:** Fresh sessions with minimal context

---

## How Context Affects Cost

### The Context Growth Problem (Token-Based)

Every message in a conversation gets re-sent as context:

```
Turn 1:  Input = system prompt (2K tokens)            → Total: 2K
Turn 5:  Input = system prompt + 4 prior turns (8K)   → Total: 8K
Turn 10: Input = system prompt + 9 prior turns (20K)  → Total: 20K
Turn 20: Input = system prompt + 19 prior turns (50K) → Total: 50K
Turn 30: Input = system prompt + 29 prior turns (90K) → Total: 90K
```

By turn 30, you're burning 90K input tokens PER TURN just for context. The actual new content might be 500 tokens, but you're paying for the full history every time.

**This is why sub-agents are token-efficient on Anthropic:** Each sub-agent starts with ~2K tokens of context (just the task prompt), regardless of how long your main conversation has been running.
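The growth pattern above can be captured in a few lines of code. This is a minimal sketch, assuming a 2K-token system prompt and a flat ~2K tokens of new content per turn — illustrative numbers, not measured figures from any real session:

```python
# Toy model of context growth in a token-billed conversation.
# Assumes a 2K-token system prompt and ~2K new tokens per turn
# (user message + assistant reply). Numbers are illustrative only.

SYSTEM_PROMPT = 2_000
TOKENS_PER_TURN = 2_000

def input_tokens(turn: int) -> int:
    """Input tokens billed at a given turn: system prompt plus all prior turns."""
    return SYSTEM_PROMPT + (turn - 1) * TOKENS_PER_TURN

def session_total(turns: int) -> int:
    """Total input tokens billed across an entire session."""
    return sum(input_tokens(t) for t in range(1, turns + 1))

print(input_tokens(1))    # → 2000  (a fresh sub-agent pays only this)
print(input_tokens(30))   # → 60000 (context cost of a single turn-30 message)
print(session_total(30))  # → 930000 (total is quadratic in turn count)
```

The point of the sketch: per-turn cost grows linearly, so the session total grows quadratically — which is exactly why a fresh sub-agent at ~2K tokens beats turn 30 of a long main session.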
### Context Doesn't Matter (Request-Based)

On Copilot, a request with 2K context costs the same as a request with 100K context. It's still 1 request.

So for request-based billing:

- Let context accumulate — it's free
- Pack more work into each request
- Don't spawn sub-agents unnecessarily (each spawn = new request)

---

## Optimal Strategies by Subscription

### GitHub Copilot Pro — Batch Everything

**Goal:** Maximize work per request

**Pattern: Fat Sub-Agents**

```
# BAD: 5 requests × 3 multiplier = 15 premium requests
sessions_spawn("Do task 1") → 1 request
sessions_spawn("Do task 2") → 1 request
sessions_spawn("Do task 3") → 1 request
sessions_spawn("Do task 4") → 1 request
sessions_spawn("Do task 5") → 1 request

# GOOD: 1 request × 3 multiplier = 3 premium requests
sessions_spawn("Do tasks 1-5 sequentially. For each: implement, test, commit, then move to the next.")
```

**Same work, 80% cheaper.**

**Pattern: Compound Tasks**

```
# BAD: 3 separate requests
"Review the code" → 1 request
"Fix the issues you found" → 1 request
"Update the docs" → 1 request

# GOOD: 1 compound request
"Review the code, fix any issues you find, and update the docs to reflect the changes. Commit each fix separately."
```

**Pattern: Agent Harness with Multi-Task Iterations**

```
# Modified AGENT-INSTRUCTIONS.md for Copilot:

### 3. Pick Tasks
- Find the NEXT 3-5 unchecked tasks in IMPLEMENTATION_PLAN.md
- Complete ALL of them in this iteration
- Commit after each task (for clean git history)
- Then exit for a fresh context restart
```

**When to use Copilot models:**

- Long autonomous coding sessions (lots of tool calls = still 1 request)
- Complex multi-step tasks (planning + implementation + testing)
- Agent harness iterations (pack 3-5 tasks per iteration)
- Overnight batch work

**When NOT to use Copilot models:**

- Quick questions ("What time is it in Tokyo?")
- Simple file reads or lookups
- Anything you could do with a cheaper/free model

### Anthropic Claude Pro — Stay Lean

**Goal:** Minimize token consumption per interaction

**Pattern: Fresh Sub-Agents**

```
# GOOD on Anthropic: Each sub-agent starts with clean context
sessions_spawn("Do task 1") → ~5K tokens (fresh context)
sessions_spawn("Do task 2") → ~5K tokens (fresh context)
sessions_spawn("Do task 3") → ~5K tokens (fresh context)
Total: ~15K tokens

# BAD on Anthropic: One long conversation
Main session turn 1: "Do task 1" → 3K input tokens
Main session turn 5: "Do task 2" → 15K input tokens
Main session turn 10: "Do task 3" → 35K input tokens
Total: ~53K tokens (3.5x more!)
```

**Pattern: Minimal Context Agent Instructions**

```
# BAD: Agent reads entire spec every iteration
"Read PROJECT-SPEC.md (5000 tokens), IMPLEMENTATION_PLAN.md (2000 tokens), DECISIONS.md (1500 tokens), and the last 20 git commits..."

# GOOD: Agent reads only what it needs
"Read IMPLEMENTATION_PLAN.md. Find the first unchecked task. Read ONLY the relevant section of PROJECT-SPEC.md for that task. Implement, test, commit."
```

**Pattern: Offload to Cheaper Models**

```
# Use Sonnet (cheaper) for routine work
sessions_spawn("Implement the CRUD endpoints", model: "sonnet")

# Use Opus (expensive) only for complex reasoning
sessions_spawn("Design the Monte Carlo simulation algorithm", model: "opus")
```

**When to use Anthropic models:**

- Quick interactions in main session (small context = few tokens)
- Tasks requiring strong reasoning (Opus quality)
- Sub-agent swarms (fresh context each time)

**When NOT to use Anthropic models:**

- Long main-session conversations (context grows = token burn)
- Low-complexity tasks (use a cheaper model)
- Repetitive iterations (context grows even with similar content)

---

## Model Selection Guide

### By Task Complexity

| Task | Recommended | Why |
|------|-------------|-----|
| Planning & decomposition | Opus (either provider) | Needs strong reasoning |
| Scaffolding & config | Sonnet or GPT-4.1 | Simple, deterministic |
| Feature implementation | Sonnet | Good balance |
| Complex algorithms | Opus | Deep reasoning needed |
| Bug diagnosis | Opus | Pattern recognition |
| Bug fixing | Sonnet | Usually straightforward once diagnosed |
| Documentation | Sonnet or GPT-4.1 | Writing, not complex reasoning |
| Code review | Opus | Needs to spot subtle issues |
| Test writing | Sonnet | Follows patterns from spec |

### By Provider Optimization

| Scenario | Best Provider | Reasoning |
|----------|--------------|-----------|
| 5 tasks in agent harness | Copilot (batch 5 tasks = 1 request) | Request efficiency |
| Quick "what's the status?" | Anthropic (small context) | Token efficiency |
| Overnight autonomous loop | Copilot (fewer requests total) | Request efficiency |
| Sub-agent swarm (10 agents) | Anthropic (fresh context each) | Token efficiency |
| Long planning conversation | Copilot (context growth is free) | Request efficiency |
| One-shot code generation | Either (1 request, small context) | Similar cost |

---

## The Hybrid Strategy

Use both subscriptions strategically:

```
Morning check-in (main session):    Anthropic Sonnet (small context, quick)
Planning conversation:              Copilot Opus (context growth is free)
Agent harness iterations:           Copilot Sonnet (batch tasks, 1 request each)
Complex debugging:                  Copilot Opus (1 request, deep reasoning)
Quick questions during the day:     Anthropic Sonnet (minimal tokens)
Overnight autonomous work:          Copilot Sonnet (batch tasks, few requests)
```

### Budget Allocation Example

Monthly budget:

- Copilot Pro: 300 premium requests (Opus = 3x, Sonnet = 1x)
- Anthropic Pro: Weekly token budget (resets Sundays)

**Agent harness project (20 tasks):**

```
Copilot approach:
  Planning: 1 Opus request = 3 premium          → 3
  20 tasks ÷ batches of 5 = 4 Sonnet requests   → 4
  Code review: 2 Opus requests = 6 premium      → 6
  Total: 13 premium requests

Anthropic approach:
  Planning: 1 session                           → ~10K tokens
  20 tasks × 1 session each × ~8K               → ~160K tokens
  Code review: 2 sessions × ~15K                → ~30K tokens
  Total: ~200K tokens (could eat a chunk of weekly budget)
```

For an agent harness project, **Copilot is usually cheaper** because you can batch. For daily conversational use, **Anthropic is usually cheaper** because most interactions are short.

---

## Anti-Patterns to Avoid

### 1. The Chatty Agent (Anthropic killer)

```
Turn 1: "What should I work on?"      ← Wastes a turn
Turn 2: "I'll start with the parser"  ← Wastes a turn
Turn 3: "Here's my plan..."           ← Wastes a turn
Turn 4: *actually starts working*

# Fix: Give clear instructions upfront so the agent works immediately
```

### 2. The Spawn-Happy Pattern (Copilot killer)

```
sessions_spawn("Read the plan")       ← 1 request for reading?!
sessions_spawn("Pick the next task")  ← 1 request for picking?!
sessions_spawn("Implement the task")  ← Finally useful
sessions_spawn("Run the tests")       ← 1 request for one command?!

# Fix: One spawn that does all four steps
```

### 3. The "Let Me Check" Loop (kills both budgets)

```
"Check if the build passes" → agent runs build, reports back
"OK now run the tests"      → agent runs tests, reports back
"OK now check the linter"   → agent runs linter, reports back

# Fix: "Run build, tests, and linter. Report all results."
```

### 4. Using Opus for Everything

```
# Opus is 3x on Copilot, token-heavy on Anthropic
# Most tasks don't need it

# Fix: Default to Sonnet. Upgrade to Opus only for:
# - Planning and decomposition
# - Complex algorithm design
# - Subtle bug diagnosis
# - Architecture decisions
```

### 5. Ignoring Context Size (Anthropic killer)

```
Main session at turn 50: "Hey can you also check the weather?"

# That weather check just cost 90K input tokens because of context

# Fix: Use a sub-agent for unrelated tasks
# Or start a new session for new topics
```

---

## Monitoring Your Usage

### GitHub Copilot

- Check premium request usage at: github.com/settings/copilot
- Track requests per task in your daily memory notes
- Set alerts at 80% of monthly budget

### Anthropic Claude Pro

- Check usage at: claude.ai/settings/usage (subscription)
- API usage at: console.anthropic.com (if using the API directly)
- Monitor "Current session X% used" — stop at 90%
- Weekly reset: Sundays at 11 AM ET

### OpenClaw Session Status

```
/status → Shows current model, session %, premium request %
```

### Logging Strategy

Track in your daily memory notes:

```markdown
## Model Usage — 2026-03-18
- Copilot premium requests: 12 used today (45/300 monthly)
- Anthropic session: 35% used (resets in 3 days)
- Tasks completed: 8
- Cost per task: ~1.5 premium requests average
```

---

## Quick Reference Card

```
┌─────────────────────────────────────────────────┐
│ COST OPTIMIZATION CHEAT SHEET                   │
├─────────────────────────────────────────────────┤
│                                                 │
│ COPILOT PRO (request-based):                    │
│   ✅ Batch tasks into one request               │
│   ✅ Let context grow (it's free)               │
│   ✅ Long sessions with many tool calls         │
│   ❌ Don't spawn many small sub-agents          │
│   ❌ Don't use Opus for simple tasks (3x!)      │
│                                                 │
│ ANTHROPIC PRO (token-based):                    │
│   ✅ Fresh sub-agents (clean context)           │
│   ✅ Short, focused interactions                │
│   ✅ Use Sonnet for most work                   │
│   ❌ Don't let main session context grow        │
│   ❌ Don't have long planning conversations     │
│                                                 │
│ GENERAL:                                        │
│   • Sonnet for building, Opus for thinking      │
│   • Batch related work, split unrelated work    │
│   • Monitor usage daily, adjust weekly          │
│   • When in doubt, use the cheaper model first  │
│                                                 │
└─────────────────────────────────────────────────┘
```

---

_The cheapest token is the one you don't spend. The cheapest request is the one that does five things._
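As a postscript, the batching math that runs through this guide can be reduced to a toy cost model. A minimal sketch, where the 3x Opus multiplier and the ~8K tokens per fresh session are illustrative assumptions carried over from the examples above, not published pricing:

```python
# Toy cost model contrasting the two billing modes for the same workload.
# The Opus multiplier (3x) and per-session token figure (~8K) are
# illustrative assumptions from this guide, not published pricing.

def copilot_requests(tasks: int, batch_size: int, multiplier: int = 1) -> int:
    """Premium requests consumed when tasks are batched `batch_size` per spawn."""
    spawns = -(-tasks // batch_size)  # ceiling division: partial batch still spawns
    return spawns * multiplier

def anthropic_tokens(tasks: int, tokens_per_session: int = 8_000) -> int:
    """Tokens consumed when each task runs in its own fresh sub-agent session."""
    return tasks * tokens_per_session

# 20 tasks batched 5 per spawn on Sonnet (1x multiplier)
print(copilot_requests(20, batch_size=5))   # → 4 premium requests
# The same 20 tasks, one per spawn (the "spawn-happy" anti-pattern)
print(copilot_requests(20, batch_size=1))   # → 20 premium requests
# The same 20 tasks as fresh Anthropic sessions
print(anthropic_tokens(20))                 # → 160000 tokens
```

Plugging in your own batch sizes and multipliers makes the trade-off concrete: batching divides Copilot request cost by the batch size, while on Anthropic the per-task token cost stays flat as long as each task gets a fresh context.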