🇺🇸 USA · Meta Llama (open-weight model family)

Status: 🟩 COMPLETE 🟦 LIVING Last updated: 2026-06-26 Plain-English tagline: Meta’s family of open-weight LLMs — the most-downloaded open-source AI models in the world. Llama 4 / 5 power a huge slice of the open-source AI ecosystem; everyone from solo developers to big enterprises runs Llama somewhere.

Front-matter facts

Field	Value
Vendor	Meta Platforms Inc (Menlo Park, USA)
Country / origin	🇺🇸 USA
Recommended for Australian users?	✅ Yes — open-weight, free download, runnable on Western infrastructure or locally
Privacy summary	Llama models themselves don’t transmit data anywhere — depends entirely on where you run them. Local = no data leaves; Western cloud (AWS / Azure / Together / Groq / Fireworks) = that cloud’s privacy posture
Free tier	Yes — weights are free to download under Llama license; commercial use restricted only for very large companies (>700M MAU)
Paid tiers	None for the weights themselves; inference costs depend on hosting choice
First released	Llama 1 February 2023 (leaked then re-released); Llama 2 July 2023; Llama 3 April 2024; Llama 3.3 / 4 2024-25; Llama 5 2026
Last reviewed	2026-06-26
Official site	https://llama.com

What it is

Llama is Meta’s family of open-weight large language models. Meta releases the model weights freely; anyone can download them and run them locally or on any cloud. This makes Llama the foundation of much of the open-source AI ecosystem — countless fine-tunes, specialised variants, and downstream products are built on Llama.

Llama model lineup (typical):

Llama 4 / 5 8B — small, fast, runs on consumer GPUs / Apple Silicon
Llama 4 / 5 70B — medium-large, runs on workstation GPUs
Llama 4 / 5 405B+ — frontier-scale, requires multi-GPU
Llama Code — coding-specialised variants
Llama Guard — content-moderation model
Llama Vision — multimodal

Why Llama matters:

Largest open-weight ecosystem — most-downloaded models, most fine-tunes, biggest community
Western-jurisdiction open weights — alternative to Chinese open-weights (DeepSeek, Qwen) without political-filtering concerns
Self-hostable — run on your own hardware or any Western cloud
License is permissive for most users — only restricted for very large companies (>700M MAU companies like Google / Microsoft / TikTok)
Powers Meta’s own products (Meta AI in WhatsApp, Instagram, Facebook)

What you’d use it for

Self-hosted AI — run Llama on your own machine or your own cloud
No-data-to-frontier-providers — keep all queries on infrastructure you control
Cost-optimisation — run smaller Llama variants cheaper than frontier APIs
Customisation — fine-tune Llama on your own data
Privacy-sensitive workloads — Western open weights for confidential code / content
As foundation for fine-tuning — most research / specialised models start with Llama base
Western alternative to Chinese open-weight models — gets you self-hosting flexibility without political-filtering training

How to use from Australia

Run locally on your PC / Mac

Install Ollama (ollama.com) — easiest path
ollama pull llama4:8b — downloads the model
ollama run llama4:8b — chat in terminal
Requires: decent GPU (RTX 3060+ for 8B; RTX 4090+ for 70B) OR Apple Silicon Mac (M3 / M4 with 16GB+ RAM for 8B; 32GB+ for 70B)

Via Western cloud inference providers

Together AI (together.ai) — cheap, fast Llama inference
Fireworks AI (fireworks.ai) — fast Llama inference
Groq (groq.com) — extremely fast Llama inference on LPU chips
AWS Bedrock — Llama via AWS (AUS Sydney region)
Azure AI Foundry — Llama via Azure
Replicate — pay-per-token Llama

Via Meta AI directly (consumer product)

WhatsApp / Instagram / Facebook / meta.ai — uses Llama under the hood
See Meta AI 🟩 🟦 entry for the consumer product

What it costs

Llama weights themselves: free

Apache-2.0-style license with the >700M MAU restriction
Download from llama.com, Hugging Face, or any partner

Self-hosted on your own hardware

Hardware cost + electricity
For RTX-equipped PCs: typically free incremental cost
For M-series Macs: typically free incremental cost (uses Mac’s NPU/GPU)

Western cloud inference

Together AI: ~US $0.20/$ 0.30 per million tokens for Llama 70B
Fireworks AI: similar
Groq: very fast, ~US$0.60 per million tokens for Llama 70B (priced for speed premium)
AWS Bedrock: ~US $0.30/$ 0.45 per million tokens for Llama 4 70B
Generally cheaper than frontier Anthropic Opus / OpenAI GPT-5

How it compares to alternatives

Capability	Llama 5 70B	Claude Sonnet	GPT-5 mini	Mistral Large	Qwen (⛔ China)
Open weights	Yes (Apache-style)	No	No (gpt-oss separate)	Yes (Apache 2.0)	Yes (politically-filtered training)
Cost (cheap tier via Western hosts)	Cheap	Moderate	Moderate	Cheap-moderate	Cheap (but avoid direct API)
Self-hostable	Yes	No	No	Yes	Yes (but avoid for politically-filtered training)
Frontier capability	Strong (especially 405B+)	Frontier	Strong	Strong	Strong (but trust concerns)
Western jurisdiction	Yes (USA)	Yes (USA)	Yes (USA)	Yes (France)	China ⛔
Multilingual	Strong	Strong	Strong	Best European	Strong (but trust concerns)
Coding	Llama Code strong	Strong	Strong	Codestral specialist	DeepSeek-Coder strong but ⛔
Community / fine-tunes	Largest	Limited	Limited	Strong	Strong

For Western open-weight foundation, Llama and Mistral are the two main choices. Llama has the largest community + fine-tune ecosystem; Mistral has the most-permissive license and stronger European-language performance.

Privacy / data handling

Llama itself doesn’t transmit data anywhere — it’s just a model file. Privacy posture depends entirely on where you run it:

Local (Ollama / LM Studio) = no data leaves your machine
Western cloud inference (Together / Fireworks / Groq / AWS / Azure) = that cloud’s privacy terms apply (all major Western hosts have no-training-by-default for inference)
Meta AI consumer (meta.ai / WhatsApp) = Meta’s consumer privacy posture (see Meta AI)

Llama license is permissive for most uses; verify for your specific case if you’re a large enterprise (>700M MAU).

Recent changes

2026: Llama 5 family released; quality leap
2025: Llama 4 series; multimodal Llama Vision
2024: Llama 3 / 3.1 / 3.3 — Llama 3.1 405B closed gap to frontier-closed models
2023: Llama 2 first commercially-friendly Llama; Llama 1 originally research-only then leaked
February 2023: Llama 1 announced

Gotchas

License restrictions — Apache-style with the >700M-MAU exception. Most users / companies fine; check if you’re at scale.
“Llama” is also the model name AND the project name — context matters
Frontier-scale Llama (405B+) needs serious hardware — multi-GPU setup, not feasible on a laptop
Smaller Llama (8B / 70B) runs well on consumer hardware — pair with Ollama / LM Studio
Code-tuned variants (Llama Code) are separate from base Llama — check which you’re using for coding tasks
Llama Vision is newer than text variants; verify capability for your use case
Fine-tuning Llama on your data is the standard pattern for custom AI — Western alternative to fine-tuning closed Anthropic / OpenAI
Llama Guard is the moderation companion model — use alongside Llama for safer deployments
For Aussie privacy-sensitive use, self-hosted Llama is one of the strongest options — no data leaves your network at all

Tech & AI, Explained

Explorer

llama