πŸ‡ΊπŸ‡Έ USA Β· Meta Llama (open-weight model family)

Status: 🟩 COMPLETE 🟦 LIVING Last updated: 2026-06-26 Plain-English tagline: Meta’s family of open-weight LLMs β€” the most-downloaded open-source AI models in the world. Llama 4 / 5 power a huge slice of the open-source AI ecosystem; everyone from solo developers to big enterprises runs Llama somewhere.


Front-matter facts

FieldValue
VendorMeta Platforms Inc (Menlo Park, USA)
Country / originπŸ‡ΊπŸ‡Έ USA
Recommended for Australian users?βœ… Yes β€” open-weight, free download, runnable on Western infrastructure or locally
Privacy summaryLlama models themselves don’t transmit data anywhere β€” depends entirely on where you run them. Local = no data leaves; Western cloud (AWS / Azure / Together / Groq / Fireworks) = that cloud’s privacy posture
Free tierYes β€” weights are free to download under Llama license; commercial use restricted only for very large companies (>700M MAU)
Paid tiersNone for the weights themselves; inference costs depend on hosting choice
First releasedLlama 1 February 2023 (leaked then re-released); Llama 2 July 2023; Llama 3 April 2024; Llama 3.3 / 4 2024-25; Llama 5 2026
Last reviewed2026-06-26
Official sitehttps://llama.com

What it is

Llama is Meta’s family of open-weight large language models. Meta releases the model weights freely; anyone can download them and run them locally or on any cloud. This makes Llama the foundation of much of the open-source AI ecosystem β€” countless fine-tunes, specialised variants, and downstream products are built on Llama.

Llama model lineup (typical):

  • Llama 4 / 5 8B β€” small, fast, runs on consumer GPUs / Apple Silicon
  • Llama 4 / 5 70B β€” medium-large, runs on workstation GPUs
  • Llama 4 / 5 405B+ β€” frontier-scale, requires multi-GPU
  • Llama Code β€” coding-specialised variants
  • Llama Guard β€” content-moderation model
  • Llama Vision β€” multimodal

Why Llama matters:

  • Largest open-weight ecosystem β€” most-downloaded models, most fine-tunes, biggest community
  • Western-jurisdiction open weights β€” alternative to Chinese open-weights (DeepSeek, Qwen) without political-filtering concerns
  • Self-hostable β€” run on your own hardware or any Western cloud
  • License is permissive for most users β€” only restricted for very large companies (>700M MAU companies like Google / Microsoft / TikTok)
  • Powers Meta’s own products (Meta AI in WhatsApp, Instagram, Facebook)

What you’d use it for

  • Self-hosted AI β€” run Llama on your own machine or your own cloud
  • No-data-to-frontier-providers β€” keep all queries on infrastructure you control
  • Cost-optimisation β€” run smaller Llama variants cheaper than frontier APIs
  • Customisation β€” fine-tune Llama on your own data
  • Privacy-sensitive workloads β€” Western open weights for confidential code / content
  • As foundation for fine-tuning β€” most research / specialised models start with Llama base
  • Western alternative to Chinese open-weight models β€” gets you self-hosting flexibility without political-filtering training

How to use from Australia

Run locally on your PC / Mac

  1. Install Ollama (ollama.com) β€” easiest path
  2. ollama pull llama4:8b β€” downloads the model
  3. ollama run llama4:8b β€” chat in terminal
  4. Requires: decent GPU (RTX 3060+ for 8B; RTX 4090+ for 70B) OR Apple Silicon Mac (M3 / M4 with 16GB+ RAM for 8B; 32GB+ for 70B)

Via Western cloud inference providers

  • Together AI (together.ai) β€” cheap, fast Llama inference
  • Fireworks AI (fireworks.ai) β€” fast Llama inference
  • Groq (groq.com) β€” extremely fast Llama inference on LPU chips
  • AWS Bedrock β€” Llama via AWS (AUS Sydney region)
  • Azure AI Foundry β€” Llama via Azure
  • Replicate β€” pay-per-token Llama

Via Meta AI directly (consumer product)

  • WhatsApp / Instagram / Facebook / meta.ai β€” uses Llama under the hood
  • See Meta AI 🟩 🟦 entry for the consumer product

What it costs

Llama weights themselves: free

  • Apache-2.0-style license with the >700M MAU restriction
  • Download from llama.com, Hugging Face, or any partner

Self-hosted on your own hardware

  • Hardware cost + electricity
  • For RTX-equipped PCs: typically free incremental cost
  • For M-series Macs: typically free incremental cost (uses Mac’s NPU/GPU)

Western cloud inference

  • Together AI: ~US0.30 per million tokens for Llama 70B
  • Fireworks AI: similar
  • Groq: very fast, ~US$0.60 per million tokens for Llama 70B (priced for speed premium)
  • AWS Bedrock: ~US0.45 per million tokens for Llama 4 70B
  • Generally cheaper than frontier Anthropic Opus / OpenAI GPT-5

How it compares to alternatives

CapabilityLlama 5 70BClaude SonnetGPT-5 miniMistral LargeQwen (β›” China)
Open weightsYes (Apache-style)NoNo (gpt-oss separate)Yes (Apache 2.0)Yes (politically-filtered training)
Cost (cheap tier via Western hosts)CheapModerateModerateCheap-moderateCheap (but avoid direct API)
Self-hostableYesNoNoYesYes (but avoid for politically-filtered training)
Frontier capabilityStrong (especially 405B+)FrontierStrongStrongStrong (but trust concerns)
Western jurisdictionYes (USA)Yes (USA)Yes (USA)Yes (France)China β›”
MultilingualStrongStrongStrongBest EuropeanStrong (but trust concerns)
CodingLlama Code strongStrongStrongCodestral specialistDeepSeek-Coder strong but β›”
Community / fine-tunesLargestLimitedLimitedStrongStrong

For Western open-weight foundation, Llama and Mistral are the two main choices. Llama has the largest community + fine-tune ecosystem; Mistral has the most-permissive license and stronger European-language performance.


Privacy / data handling

Llama itself doesn’t transmit data anywhere β€” it’s just a model file. Privacy posture depends entirely on where you run it:

  • Local (Ollama / LM Studio) = no data leaves your machine
  • Western cloud inference (Together / Fireworks / Groq / AWS / Azure) = that cloud’s privacy terms apply (all major Western hosts have no-training-by-default for inference)
  • Meta AI consumer (meta.ai / WhatsApp) = Meta’s consumer privacy posture (see Meta AI)

Llama license is permissive for most uses; verify for your specific case if you’re a large enterprise (>700M MAU).


Recent changes

  • 2026: Llama 5 family released; quality leap
  • 2025: Llama 4 series; multimodal Llama Vision
  • 2024: Llama 3 / 3.1 / 3.3 β€” Llama 3.1 405B closed gap to frontier-closed models
  • 2023: Llama 2 first commercially-friendly Llama; Llama 1 originally research-only then leaked
  • February 2023: Llama 1 announced

Gotchas

  • License restrictions β€” Apache-style with the >700M-MAU exception. Most users / companies fine; check if you’re at scale.
  • β€œLlama” is also the model name AND the project name β€” context matters
  • Frontier-scale Llama (405B+) needs serious hardware β€” multi-GPU setup, not feasible on a laptop
  • Smaller Llama (8B / 70B) runs well on consumer hardware β€” pair with Ollama / LM Studio
  • Code-tuned variants (Llama Code) are separate from base Llama β€” check which you’re using for coding tasks
  • Llama Vision is newer than text variants; verify capability for your use case
  • Fine-tuning Llama on your data is the standard pattern for custom AI β€” Western alternative to fine-tuning closed Anthropic / OpenAI
  • Llama Guard is the moderation companion model β€” use alongside Llama for safer deployments
  • For Aussie privacy-sensitive use, self-hosted Llama is one of the strongest options β€” no data leaves your network at all

See also


Sources