Lesson 11 / 14

11. Models and pricing

Model selection is a speed/cost/quality trade-off, not "always Opus". Cache reads cost 0.1× input (10× cheaper), output ~5× input. Opus 4.7 has a new tokenizer (+35% tokens on the same texts), on Bedrock/Vertex default aliases are shifted back one version.

Model selection is not “always Opus because it’s better”. It’s a trade-off between speed / cost / quality for a specific task. This chapter covers tables, budgets, and combination strategies.


11.1. Available models as of 23.04.2026

ModelAPI AliasContextStrengths
Claude Opus 4.7opus, claude-opus-4-7200k (1M via opus[1m])Best agentic coding, complex reasoning
Claude Opus 4.6claude-opus-4-6200k (1M via opus[1m])Legacy, same pricing, old tokenizer
Claude Sonnet 4.6sonnet, claude-sonnet-4-6200k (1M via sonnet[1m])Best speed-to-quality ratio
Claude Haiku 4.5haiku, claude-haiku-4-5-20251001200kFast and cheap, near-frontier intelligence

⚠️ On Bedrock / Vertex / Foundry, default aliases are shifted back one version. opus there → 4.6, sonnet → 4.5. If you need the latest — specify the full model name.

⚠️ Opus 4.7 has a new tokenizer — on the same texts it consumes up to +35% tokens compared to Opus 4.6. If you had estimates for 4.6 — recalculate.


11.2. Pricing (April 2026)

ModelInput ($/MTok)Output ($/MTok)Cache write 5min ($/MTok)Cache write 1h ($/MTok)Cache read ($/MTok)
Opus 4.7 / 4.6$5$25$6.25$10$0.50
Sonnet 4.6$3$15$3.75$6$0.30
Haiku 4.5$1$5$1.25$2$0.10

Key multipliers (same for all models):

  • Cache write 5min = 1.25× input.
  • Cache write 1h = input.
  • Cache read = 0.1× input (10 times cheaper!).
  • Output usually input.

11.3. Psychological model “when to use what”

Use metaphors from the original Twitter thread — they work:

📘 From docs: Opus 4.7 — “most capable for complex reasoning and agentic coding”, Sonnet 4.6 — “best combination of speed and intelligence”, Haiku 4.5 — “fastest model with near-frontier intelligence”.


11.4. Model combination strategies

11.4.1. Default: Sonnet

Start most sessions with Sonnet 4.6. It’s a sensible baseline.

11.4.2. opusplan for architecture

/model opusplan

Enable plan mode on Opus. After ExitPlanMode it automatically switches to Sonnet for implementation. This is the correct “think with Opus, do with Sonnet” pattern.

⚠️ In opusplan, the plan phase runs in standard 200k, even if you enabled a 1M window.

11.4.3. Subagents with different models

Different subagents can run on different models:

# trip-architect.md → opus (сложная декомпозиция маршрутов)
model: opus

# code-reviewer.md → sonnet
model: sonnet

# explore (built-in) → haiku

This lets you keep the main agent on Sonnet and upgrade the model only when needed for specialized subagents.

11.4.4. Agent Teams: Lead = Opus, teammates = Sonnet/Haiku

Lead handles planning and coordination — needs Opus. Teammates execute simple tasks — Sonnet or Haiku.


11.5. Calculating a typical Travel Agent session

One session of 30 turns on Sonnet:

Префикс (system + CLAUDE.md + skills + tools) ~ 25k токенов
Output на turn ~ 1k токенов
Tool results на turn ~ 2k токенов

Without cache:

30 × (25k input × $3/M + 3k input × $3/M + 1k output × $15/M)
= 30 × ($0.075 + $0.009 + $0.015)
= 30 × $0.099
= $2.97

With cache (TTL 5min, no pauses):

1 × cache write (25k × $3.75/M = $0.094)
+ 29 × cache read (25k × $0.30/M = $0.0075)
+ 30 × non-cached input (3k × $3/M = $0.009)
+ 30 × output (1k × $15/M = $0.015)
= $0.094 + $0.218 + $0.27 + $0.45
= $1.03

Savings — 65%. And that’s on a modest session. On longer ones with large CLAUDE.md the effect is even stronger.


11.6. 1M context: when it’s justified

Case1M justified?
Load entire monorepo as context once✅ If many small tasks follow. Cache + 1M = okay
Long multi-hour session with accumulated history⚠️ Better to use /compact, otherwise quality drops
Parse huge log in one request✅ One request better than ten with pagination
”Just in case”❌ Pay more, get worse results

📘 Enabled via alias opus[1m] or sonnet[1m]. On Max/Team/Enterprise.

⚠️ Remember that opusplan does NOT support 1M window.

⚠️ Empirically: many practitioners report that after 300-400k in the window, quality drops. This isn’t from Anthropic docs, but the symptoms are familiar (model forgets early decisions, contradicts itself, re-reads files).


11.7. Budgets and monitoring

📘 Commands:

CommandWhat it shows
/costCurrent and accumulated costs for this session
/usageCosts for a period, by model
/release-notesVersion news (sometimes pricing updates)

🔧 Environment variables for alerts:

export CLAUDE_CODE_BUDGET_USD_SESSION=5      # warn at $5/session
export CLAUDE_CODE_BUDGET_USD_DAILY=50       # daily ceiling

11.8. Should you revisit CLAUDE.md and skills with new models?

⚠️ The claim “settings become outdated over time, you need to revisit CLAUDE.md and skills with new models” — is sound practice, but not a quote from docs. There’s no direct recommendation in public docs.

Reality:

  • CLAUDE.md itself usually stays relevant (project stack changes less often than models).
  • Skills can become outdated if you stuffed them with “model understands X poorly, always remind it” — but the new model understands X on its own.
  • Hooks usually don’t depend on the model.

💡 Once a quarter, quickly review CLAUDE.md and /skills, ask yourself: “is this still needed for current models?”. Especially hints like “don’t forget to return Promise<T>” — Sonnet 4.6 already doesn’t forget.


11.9. Context windows for subagents

📝 Each subagent has its own limit:

  • On Haiku-subagent the window is 200k.
  • On Sonnet/Opus-subagent — 200k or 1M (if enabled).

This gives a convenient pattern: keep main context on 200k Sonnet, and a browse-heavy subagent on 1M Sonnet. The subagent reads most of the repo, returns a summary, main context doesn’t suffer.


11.10. Antipatterns

Always Opus. Expensive and unnecessary. Sonnet handles 80% of tasks.

Always Haiku. Fast and cheap, but on a complex task it will loop and end up costing more than Sonnet.

Switch models mid-task without opusplan. Cache miss + loss of context trust. Use opusplan if you need switching.

Enable 1M by default. Expensive, slower, and quality isn’t better.

Don’t use prompt cache. Check that your SDK code adds cache_control markers. In Claude Code this is already built-in.


Next → 12. Travel Agent from scratch: blueprint