πΊπΈ USA Β· Meta Llama (open-weight model family)
Status: π© COMPLETE π¦ LIVING Last updated: 2026-06-26 Plain-English tagline: Metaβs family of open-weight LLMs β the most-downloaded open-source AI models in the world. Llama 4 / 5 power a huge slice of the open-source AI ecosystem; everyone from solo developers to big enterprises runs Llama somewhere.
Front-matter facts
| Field | Value |
|---|---|
| Vendor | Meta Platforms Inc (Menlo Park, USA) |
| Country / origin | πΊπΈ USA |
| Recommended for Australian users? | β Yes β open-weight, free download, runnable on Western infrastructure or locally |
| Privacy summary | Llama models themselves donβt transmit data anywhere β depends entirely on where you run them. Local = no data leaves; Western cloud (AWS / Azure / Together / Groq / Fireworks) = that cloudβs privacy posture |
| Free tier | Yes β weights are free to download under Llama license; commercial use restricted only for very large companies (>700M MAU) |
| Paid tiers | None for the weights themselves; inference costs depend on hosting choice |
| First released | Llama 1 February 2023 (leaked then re-released); Llama 2 July 2023; Llama 3 April 2024; Llama 3.3 / 4 2024-25; Llama 5 2026 |
| Last reviewed | 2026-06-26 |
| Official site | https://llama.com |
What it is
Llama is Metaβs family of open-weight large language models. Meta releases the model weights freely; anyone can download them and run them locally or on any cloud. This makes Llama the foundation of much of the open-source AI ecosystem β countless fine-tunes, specialised variants, and downstream products are built on Llama.
Llama model lineup (typical):
- Llama 4 / 5 8B β small, fast, runs on consumer GPUs / Apple Silicon
- Llama 4 / 5 70B β medium-large, runs on workstation GPUs
- Llama 4 / 5 405B+ β frontier-scale, requires multi-GPU
- Llama Code β coding-specialised variants
- Llama Guard β content-moderation model
- Llama Vision β multimodal
Why Llama matters:
- Largest open-weight ecosystem β most-downloaded models, most fine-tunes, biggest community
- Western-jurisdiction open weights β alternative to Chinese open-weights (DeepSeek, Qwen) without political-filtering concerns
- Self-hostable β run on your own hardware or any Western cloud
- License is permissive for most users β only restricted for very large companies (>700M MAU companies like Google / Microsoft / TikTok)
- Powers Metaβs own products (Meta AI in WhatsApp, Instagram, Facebook)
What youβd use it for
- Self-hosted AI β run Llama on your own machine or your own cloud
- No-data-to-frontier-providers β keep all queries on infrastructure you control
- Cost-optimisation β run smaller Llama variants cheaper than frontier APIs
- Customisation β fine-tune Llama on your own data
- Privacy-sensitive workloads β Western open weights for confidential code / content
- As foundation for fine-tuning β most research / specialised models start with Llama base
- Western alternative to Chinese open-weight models β gets you self-hosting flexibility without political-filtering training
How to use from Australia
Run locally on your PC / Mac
- Install Ollama (ollama.com) β easiest path
ollama pull llama4:8bβ downloads the modelollama run llama4:8bβ chat in terminal- Requires: decent GPU (RTX 3060+ for 8B; RTX 4090+ for 70B) OR Apple Silicon Mac (M3 / M4 with 16GB+ RAM for 8B; 32GB+ for 70B)
Via Western cloud inference providers
- Together AI (together.ai) β cheap, fast Llama inference
- Fireworks AI (fireworks.ai) β fast Llama inference
- Groq (groq.com) β extremely fast Llama inference on LPU chips
- AWS Bedrock β Llama via AWS (AUS Sydney region)
- Azure AI Foundry β Llama via Azure
- Replicate β pay-per-token Llama
Via Meta AI directly (consumer product)
- WhatsApp / Instagram / Facebook / meta.ai β uses Llama under the hood
- See Meta AI π© π¦ entry for the consumer product
What it costs
Llama weights themselves: free
- Apache-2.0-style license with the >700M MAU restriction
- Download from llama.com, Hugging Face, or any partner
Self-hosted on your own hardware
- Hardware cost + electricity
- For RTX-equipped PCs: typically free incremental cost
- For M-series Macs: typically free incremental cost (uses Macβs NPU/GPU)
Western cloud inference
- Together AI: ~US0.30 per million tokens for Llama 70B
- Fireworks AI: similar
- Groq: very fast, ~US$0.60 per million tokens for Llama 70B (priced for speed premium)
- AWS Bedrock: ~US0.45 per million tokens for Llama 4 70B
- Generally cheaper than frontier Anthropic Opus / OpenAI GPT-5
How it compares to alternatives
| Capability | Llama 5 70B | Claude Sonnet | GPT-5 mini | Mistral Large | Qwen (β China) |
|---|---|---|---|---|---|
| Open weights | Yes (Apache-style) | No | No (gpt-oss separate) | Yes (Apache 2.0) | Yes (politically-filtered training) |
| Cost (cheap tier via Western hosts) | Cheap | Moderate | Moderate | Cheap-moderate | Cheap (but avoid direct API) |
| Self-hostable | Yes | No | No | Yes | Yes (but avoid for politically-filtered training) |
| Frontier capability | Strong (especially 405B+) | Frontier | Strong | Strong | Strong (but trust concerns) |
| Western jurisdiction | Yes (USA) | Yes (USA) | Yes (USA) | Yes (France) | China β |
| Multilingual | Strong | Strong | Strong | Best European | Strong (but trust concerns) |
| Coding | Llama Code strong | Strong | Strong | Codestral specialist | DeepSeek-Coder strong but β |
| Community / fine-tunes | Largest | Limited | Limited | Strong | Strong |
For Western open-weight foundation, Llama and Mistral are the two main choices. Llama has the largest community + fine-tune ecosystem; Mistral has the most-permissive license and stronger European-language performance.
Privacy / data handling
Llama itself doesnβt transmit data anywhere β itβs just a model file. Privacy posture depends entirely on where you run it:
- Local (Ollama / LM Studio) = no data leaves your machine
- Western cloud inference (Together / Fireworks / Groq / AWS / Azure) = that cloudβs privacy terms apply (all major Western hosts have no-training-by-default for inference)
- Meta AI consumer (meta.ai / WhatsApp) = Metaβs consumer privacy posture (see Meta AI)
Llama license is permissive for most uses; verify for your specific case if youβre a large enterprise (>700M MAU).
Recent changes
- 2026: Llama 5 family released; quality leap
- 2025: Llama 4 series; multimodal Llama Vision
- 2024: Llama 3 / 3.1 / 3.3 β Llama 3.1 405B closed gap to frontier-closed models
- 2023: Llama 2 first commercially-friendly Llama; Llama 1 originally research-only then leaked
- February 2023: Llama 1 announced
Gotchas
- License restrictions β Apache-style with the >700M-MAU exception. Most users / companies fine; check if youβre at scale.
- βLlamaβ is also the model name AND the project name β context matters
- Frontier-scale Llama (405B+) needs serious hardware β multi-GPU setup, not feasible on a laptop
- Smaller Llama (8B / 70B) runs well on consumer hardware β pair with Ollama / LM Studio
- Code-tuned variants (Llama Code) are separate from base Llama β check which youβre using for coding tasks
- Llama Vision is newer than text variants; verify capability for your use case
- Fine-tuning Llama on your data is the standard pattern for custom AI β Western alternative to fine-tuning closed Anthropic / OpenAI
- Llama Guard is the moderation companion model β use alongside Llama for safer deployments
- For Aussie privacy-sensitive use, self-hosted Llama is one of the strongest options β no data leaves your network at all
See also
- Meta AI π© π¦ β consumer product powered by Llama
- Mistral models π₯ β European open-weight alternative
- Mistral Codestral π© π¦
- Gemma (Google open weights) π₯
- OpenAI gpt-oss (open weights) π₯
- Granite (IBM open weights) π₯
- Cohere π© π¦
- DeepSeek (β Chinese β discussed for contrast) π© π¦
- Qwen (β Chinese β discussed for contrast) π© π¦
- Ollama π₯ β run Llama locally
- LM Studio π₯
- Together AI π₯
- Fireworks AI π₯
- Groq π₯
- AWS Bedrock π₯
- open-weights-vs-closed.md π₯
- Glossary β L (Llama) π©