🇺🇸 United States · Anthropic Claude — The Model Family
Status: 🟩 COMPLETE (🟦 LIVING — Anthropic ships new models every few months; treat this as a snapshot) Last updated: 2026-06-28 Plain-English tagline: The current Claude lineup, what each one is good at, how much each costs, and how to choose between them.
| Vendor | Anthropic |
| Country/origin | 🇺🇸 United States (San Francisco) |
| Recommended for AUS? | ✅ Yes — Australian-friendly; strong privacy commitments; API does not train on your data |
| Privacy summary | API content not used for model training; SOC 2 Type II; HIPAA available; standard enterprise DPA addresses APP 8 cross-border disclosure |
| First released | Claude 1 (Mar 2023); Claude 2 (Jul 2023); Claude 3 (Mar 2024); Claude 3.5 (Jun 2024); Claude 4 (May 2025); Claude 4.5/4.6/4.7/4.8 throughout 2025–2026; Fable 5 (2026) |
| Last reviewed | June 2026 |
| Official site | https://anthropic.com |
In plain English
“Claude” is not one model — it’s a family of models, each with different sizes, capabilities, speeds, and prices. As of June 2026, the lineup is:
| Model | Tier | Best for |
|---|---|---|
| Fable 5 | Top frontier | The very hardest reasoning, longest context, frontier benchmarks |
| Opus 4.8 | High-capability | Hard reasoning, long agent runs, demanding code work |
| Sonnet 4.6 | Workhorse | Best price-to-performance — most production traffic |
| Haiku 4.5 | Fast & cheap | Simple tasks at high volume; quick interactive feedback |
Each one gets the same training methodology (Constitutional AI, RLHF) but with different sizes, training budgets, and intended use cases. They all speak the same API — switching between them is a one-line change to your code.
In a Claude Code session, you can change the active model anytime with /model.
Why it matters
Choosing the right model:
- Saves money. Sonnet vs Opus at scale is the difference between 3000/month for the same workload. Haiku is cheaper still.
- Improves latency. Smaller models respond faster. For interactive UIs, sub-second response matters.
- Improves quality for hard tasks. A model too small for a task produces frustrating, unreliable output. A model too big is fine but wasteful.
The skill is matching the model to the task. After a few weeks of using them, you develop intuitions: “this is an Opus job” vs “Haiku is plenty for this.”
The current lineup (June 2026 snapshot)
Claude Fable 5 — the new frontier
- Released: June 2026 (Anthropic released Fable 5 and Mythos 5 on June 9, 2026; customer access was paused on June 12 — currently limited rollout)
- Designed for: the absolute hardest reasoning, longest-horizon agent runs, frontier-benchmark workloads
- Context window: 1M tokens, with up to 128K output
- Pricing: ~50 per million input/output tokens
- When to use: complex multi-step coding, deep research, novel reasoning tasks. Not for routine work.
Claude Opus 4.8 — the high-capability daily driver
- Released: May 28, 2026
- Designed for: demanding reasoning, agent work, code generation, structured analysis
- Context window: ~200K tokens
- Pricing: ~25 per million input/output tokens
- When to use: Claude Code sessions where you want the best reasoning; production agents doing important work; tasks where one good answer is worth more than five medium ones.
Claude Sonnet 4.6 — the workhorse
- Released: mid-2026
- Designed for: the best price-to-performance ratio; “good enough for most things”
- Context window: 1M tokens
- Pricing: ~15 per million input/output tokens
- When to use: most production traffic. Customer-facing chat, RAG pipelines, code review, summarization. The default unless you have a specific reason to go bigger or smaller.
Claude Haiku 4.5 — fast and cheap
- Released: October 2025
- Designed for: high-volume, low-complexity tasks
- Context window: ~200K tokens
- Pricing: ~5 per million input/output tokens
- When to use: classification, simple extraction, first-pass triage, sub-tasks within an agent where speed matters. Often paired with a bigger model — Haiku for “is this question about X?”, then Opus for the actual answer.
How to choose between them (the heuristic)
| Question | Answer |
|---|---|
| Will I use it interactively in a UI where latency matters? | Lean Haiku or Sonnet |
| Is it doing a hard one-shot task where quality is paramount? | Opus (or Fable if exceptional) |
| Is it running an agent that needs to plan over many steps? | Opus for planning; Sonnet/Haiku for routine sub-steps |
| Is it processing high volume (thousands/millions of calls)? | Haiku if quality holds; Sonnet if not |
| Is it doing creative writing where style matters? | Sonnet or Opus; test which voice you prefer |
| Is it summarizing or extracting from long documents? | Sonnet (1M context, cheap) is usually the sweet spot |
| Is it the brains of a critical production system? | Pick once based on testing; lock the version |
A pattern that’s increasingly common: model routing. Send a simple input to Haiku; if Haiku’s confidence is low, escalate to Sonnet or Opus. Saves money without sacrificing the hard cases.
What “version 4.8” means
Anthropic uses model versions like claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5, claude-fable-5. The convention:
- Family (Opus / Sonnet / Haiku / Fable) — the size/tier
- Major version (3, 4, 5…) — significant capability jumps; backwards-incompatible changes possible
- Minor version (.5, .6, .7, .8) — incremental improvements; usually backwards-compatible
Within a major version, newer minor versions are almost always better for the same price. Newer major versions cost more but are also more capable.
When you call the API, you specify the exact version: model: "claude-opus-4-8". This is important for production — using the un-versioned alias "claude-opus" means Anthropic might upgrade you to a newer model without warning, possibly changing behavior. Always pin to a specific version in production.
Cost optimization techniques
The single biggest production lever:
Prompt caching
Mark static parts of your prompt as cacheable. Subsequent requests reuse the cached prefix at 10% of normal cost.
For agents and RAG pipelines, this routinely saves 70–95% of input costs.
Batch processing
For non-real-time workloads, Anthropic’s batch API processes requests asynchronously at 50% off. Useful for nightly summaries, embeddings generation, large dataset processing.
Model routing
As above — Haiku for easy stuff, Opus for hard. Done right, costs go down without quality going down.
Right-sizing max_tokens
Don’t set max_tokens: 4096 for tasks that produce 50-token responses. The reservation doesn’t cost extra, but bigger budgets sometimes encourage the model to produce more output than needed.
Output compression
For programmatic outputs (JSON, structured data), the schema cost is paid once via tool use; subsequent outputs are minimal.
Caching results yourself
If many users are likely to ask similar things, cache the model’s response. The cheapest LLM call is the one you didn’t make.
Capabilities across the lineup
All current Claude models share most capabilities; size mostly affects depth and reliability:
| Capability | Fable 5 | Opus 4.8 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|---|
| Tool use | ✅ | ✅ | ✅ | ✅ |
| Vision (image input) | ✅ | ✅ | ✅ | ✅ |
| Extended thinking | ✅ | ✅ | ✅ | Limited |
| Long-context retrieval | ✅✅ | ✅ | ✅✅ | ✅ |
| Code generation | ✅✅✅ | ✅✅✅ | ✅✅ | ✅ |
| Complex multi-step reasoning | ✅✅✅ | ✅✅✅ | ✅✅ | ✅ |
| Speed | Slowest | Slow-medium | Medium-fast | Fastest |
| Cost per token | Highest | High | Medium | Lowest |
For the day-to-day question of “is X model good enough for Y task,” the answer is usually yes for Sonnet and almost-always-yes for Opus.
Other models you’ll hear about
- GPT-4 family (OpenAI) — the main alternative. Different strengths (broader ecosystem integration, image generation built in). Comparable quality at most tasks.
- Gemini (Google) — strong on long context (2M tokens), multimodal, deeply integrated with Google ecosystem.
- Llama (Meta) — open-weight; competitive at smaller sizes; self-hostable.
- Mistral, Qwen, DeepSeek — strong open-weight alternatives.
For most application work, the choice between major frontier providers (Anthropic, OpenAI, Google) is more about ecosystem fit than raw capability. Test the models against your actual use cases.
Common gotchas
-
Pin model versions in production. Don’t use generic aliases.
-
The same task can need different models at different stages. Use Opus for planning, Sonnet for execution, Haiku for trivial sub-tasks.
-
“Extended thinking” mode trades cost for quality. When you turn on extended thinking, the model uses extra tokens internally to reason before answering. Better outputs, higher cost. Use selectively.
-
Vision tokens are real. A high-resolution image can cost more than a paragraph of text. Resize images before sending if the high resolution isn’t needed.
-
Streaming doesn’t reduce cost (covered in tokens entry but worth restating) — it just improves perceived latency.
-
Latency varies by load. During peak hours, Anthropic’s API can be slower. Build retries and timeouts.
-
Rate limits scale with usage tier. New API keys have lower limits. Plan for graduation as you scale.
-
Newer != always better for your task. Newer minor versions almost always are; major versions occasionally shift in style that may not suit you. Test before migrating production.
-
“What’s the latest model?” is the wrong question. “What’s the best fit for my task and budget?” is the right one.
See also
- What is an LLM? 🟩
- Tokens & context windows 🟩 — pricing scales with these
- The Claude API 🟩 🟦
- Temperature & sampling 🟩
- Prompt engineering 🟩
- Tool use 🟩
- Agents 🟩
- Multimodal (vision, audio) 🟩 🟦
- Fine-tuning vs context 🟩
- How LLMs work 🟩
- Claude Code deep dive 🟩 🟦
- Glossary: Claude, LLM