🇺🇸 United States · Anthropic Claude — The Model Family

Status: 🟩 COMPLETE (🟦 LIVING — Anthropic ships new models every few months; treat this as a snapshot) Last updated: 2026-06-28 Plain-English tagline: The current Claude lineup, what each one is good at, how much each costs, and how to choose between them.


Vendor	Anthropic
Country/origin	🇺🇸 United States (San Francisco)
Recommended for AUS?	✅ Yes — Australian-friendly; strong privacy commitments; API does not train on your data
Privacy summary	API content not used for model training; SOC 2 Type II; HIPAA available; standard enterprise DPA addresses APP 8 cross-border disclosure
First released	Claude 1 (Mar 2023); Claude 2 (Jul 2023); Claude 3 (Mar 2024); Claude 3.5 (Jun 2024); Claude 4 (May 2025); Claude 4.5/4.6/4.7/4.8 throughout 2025–2026; Fable 5 (2026)
Last reviewed	June 2026
Official site	https://anthropic.com

In plain English

“Claude” is not one model — it’s a family of models, each with different sizes, capabilities, speeds, and prices. As of June 2026, the lineup is:

Model	Tier	Best for
Fable 5	Top frontier	The very hardest reasoning, longest context, frontier benchmarks
Opus 4.8	High-capability	Hard reasoning, long agent runs, demanding code work
Sonnet 4.6	Workhorse	Best price-to-performance — most production traffic
Haiku 4.5	Fast & cheap	Simple tasks at high volume; quick interactive feedback

Each one gets the same training methodology (Constitutional AI, RLHF) but with different sizes, training budgets, and intended use cases. They all speak the same API — switching between them is a one-line change to your code.

In a Claude Code session, you can change the active model anytime with /model.

Why it matters

Choosing the right model:

Saves money. Sonnet vs Opus at scale is the difference between $300/ m o n t han d$ 3000/month for the same workload. Haiku is cheaper still.
Improves latency. Smaller models respond faster. For interactive UIs, sub-second response matters.
Improves quality for hard tasks. A model too small for a task produces frustrating, unreliable output. A model too big is fine but wasteful.

The skill is matching the model to the task. After a few weeks of using them, you develop intuitions: “this is an Opus job” vs “Haiku is plenty for this.”

The current lineup (June 2026 snapshot)

Claude Fable 5 — the new frontier

Released: June 2026 (Anthropic released Fable 5 and Mythos 5 on June 9, 2026; customer access was paused on June 12 — currently limited rollout)
Designed for: the absolute hardest reasoning, longest-horizon agent runs, frontier-benchmark workloads
Context window: 1M tokens, with up to 128K output
Pricing: ~ $10/$ 50 per million input/output tokens
When to use: complex multi-step coding, deep research, novel reasoning tasks. Not for routine work.

Claude Opus 4.8 — the high-capability daily driver

Released: May 28, 2026
Designed for: demanding reasoning, agent work, code generation, structured analysis
Context window: ~200K tokens
Pricing: ~ $5/$ 25 per million input/output tokens
When to use: Claude Code sessions where you want the best reasoning; production agents doing important work; tasks where one good answer is worth more than five medium ones.

Claude Sonnet 4.6 — the workhorse

Released: mid-2026
Designed for: the best price-to-performance ratio; “good enough for most things”
Context window: 1M tokens
Pricing: ~ $3/$ 15 per million input/output tokens
When to use: most production traffic. Customer-facing chat, RAG pipelines, code review, summarization. The default unless you have a specific reason to go bigger or smaller.

Claude Haiku 4.5 — fast and cheap

Released: October 2025
Designed for: high-volume, low-complexity tasks
Context window: ~200K tokens
Pricing: ~ $1/$ 5 per million input/output tokens
When to use: classification, simple extraction, first-pass triage, sub-tasks within an agent where speed matters. Often paired with a bigger model — Haiku for “is this question about X?”, then Opus for the actual answer.

How to choose between them (the heuristic)

Question	Answer
Will I use it interactively in a UI where latency matters?	Lean Haiku or Sonnet
Is it doing a hard one-shot task where quality is paramount?	Opus (or Fable if exceptional)
Is it running an agent that needs to plan over many steps?	Opus for planning; Sonnet/Haiku for routine sub-steps
Is it processing high volume (thousands/millions of calls)?	Haiku if quality holds; Sonnet if not
Is it doing creative writing where style matters?	Sonnet or Opus; test which voice you prefer
Is it summarizing or extracting from long documents?	Sonnet (1M context, cheap) is usually the sweet spot
Is it the brains of a critical production system?	Pick once based on testing; lock the version

A pattern that’s increasingly common: model routing. Send a simple input to Haiku; if Haiku’s confidence is low, escalate to Sonnet or Opus. Saves money without sacrificing the hard cases.

What “version 4.8” means

Anthropic uses model versions like claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5, claude-fable-5. The convention:

Family (Opus / Sonnet / Haiku / Fable) — the size/tier
Major version (3, 4, 5…) — significant capability jumps; backwards-incompatible changes possible
Minor version (.5, .6, .7, .8) — incremental improvements; usually backwards-compatible

Within a major version, newer minor versions are almost always better for the same price. Newer major versions cost more but are also more capable.

When you call the API, you specify the exact version: model: "claude-opus-4-8". This is important for production — using the un-versioned alias "claude-opus" means Anthropic might upgrade you to a newer model without warning, possibly changing behavior. Always pin to a specific version in production.

Cost optimization techniques

The single biggest production lever:

Prompt caching

Mark static parts of your prompt as cacheable. Subsequent requests reuse the cached prefix at 10% of normal cost.

For agents and RAG pipelines, this routinely saves 70–95% of input costs.

Batch processing

For non-real-time workloads, Anthropic’s batch API processes requests asynchronously at 50% off. Useful for nightly summaries, embeddings generation, large dataset processing.

Model routing

As above — Haiku for easy stuff, Opus for hard. Done right, costs go down without quality going down.

Right-sizing max_tokens

Don’t set max_tokens: 4096 for tasks that produce 50-token responses. The reservation doesn’t cost extra, but bigger budgets sometimes encourage the model to produce more output than needed.

Output compression

For programmatic outputs (JSON, structured data), the schema cost is paid once via tool use; subsequent outputs are minimal.

Caching results yourself

If many users are likely to ask similar things, cache the model’s response. The cheapest LLM call is the one you didn’t make.

Capabilities across the lineup

All current Claude models share most capabilities; size mostly affects depth and reliability:

Capability	Fable 5	Opus 4.8	Sonnet 4.6	Haiku 4.5
Tool use	✅	✅	✅	✅
Vision (image input)	✅	✅	✅	✅
Extended thinking	✅	✅	✅	Limited
Long-context retrieval	✅✅	✅	✅✅	✅
Code generation	✅✅✅	✅✅✅	✅✅	✅
Complex multi-step reasoning	✅✅✅	✅✅✅	✅✅	✅
Speed	Slowest	Slow-medium	Medium-fast	Fastest
Cost per token	Highest	High	Medium	Lowest

For the day-to-day question of “is X model good enough for Y task,” the answer is usually yes for Sonnet and almost-always-yes for Opus.

Other models you’ll hear about

GPT-4 family (OpenAI) — the main alternative. Different strengths (broader ecosystem integration, image generation built in). Comparable quality at most tasks.
Gemini (Google) — strong on long context (2M tokens), multimodal, deeply integrated with Google ecosystem.
Llama (Meta) — open-weight; competitive at smaller sizes; self-hostable.
Mistral, Qwen, DeepSeek — strong open-weight alternatives.

For most application work, the choice between major frontier providers (Anthropic, OpenAI, Google) is more about ecosystem fit than raw capability. Test the models against your actual use cases.

Common gotchas

Pin model versions in production. Don’t use generic aliases.
The same task can need different models at different stages. Use Opus for planning, Sonnet for execution, Haiku for trivial sub-tasks.
“Extended thinking” mode trades cost for quality. When you turn on extended thinking, the model uses extra tokens internally to reason before answering. Better outputs, higher cost. Use selectively.
Vision tokens are real. A high-resolution image can cost more than a paragraph of text. Resize images before sending if the high resolution isn’t needed.
Streaming doesn’t reduce cost (covered in tokens entry but worth restating) — it just improves perceived latency.
Latency varies by load. During peak hours, Anthropic’s API can be slower. Build retries and timeouts.
Rate limits scale with usage tier. New API keys have lower limits. Plan for graduation as you scale.
Newer != always better for your task. Newer minor versions almost always are; major versions occasionally shift in style that may not suit you. Test before migrating production.
“What’s the latest model?” is the wrong question. “What’s the best fit for my task and budget?” is the right one.

Tech & AI, Explained

Explorer

claude-models