🇺🇸 United States · Google Gemma — Open-Weights Small Models

Status: 🟩 COMPLETE 🟦 LIVING Section: 10 — AI and LLMs

VendorGoogle DeepMind
Country/origin🇺🇸 United States / 🇬🇧 United Kingdom (Google DeepMind)
Recommended for AUS?✅ Yes — open-weights; can run locally (max privacy); permissive licence
Privacy summaryWhen run locally: no data sent anywhere. When used via Google Cloud / Vertex AI: standard Google enterprise data handling
Free tier✅ Completely free — open-weights download; only compute costs
Paid tiersFree model; paid only if using via cloud API (Google Cloud, Hugging Face Inference, etc.)
First releasedGemma 1: February 2024; Gemma 2: June 2024; Gemma 3: 2025; ongoing
Last reviewedJune 2026
Official sitehttps://ai.google.dev/gemma

What it is

Gemma is Google’s family of open-weights language models — small, efficient AI models that Google releases publicly so anyone can download, run locally, fine-tune, and build products with them. Gemma is Google’s answer to Meta’s Llama in the open-weights AI race.

“Open-weights” means: the trained model file is downloadable. You can run it on your own laptop (if powerful enough), on your own servers, or in any cloud. You don’t have to use Google’s API or send data to Google. See open-weights-vs-closed for the full distinction.

The Gemma family (mid-2026):

ModelParametersBest for
Gemma 2 2B2 billionPhones, edge devices, very fast inference
Gemma 2 9B9 billionLaptops with decent GPU; production inference
Gemma 2 27B27 billionWorkstations; near-frontier quality at smaller size
Gemma 3 4B / 12B / 27BVariousLatest generation; multimodal (vision); 128K context
CodeGemma2B / 7BCode completion and generation
RecurrentGemma2B / 9BLong contexts; recurrent architecture
PaliGemma3BVision + language tasks
MedGemma27BHealthcare research (medical text and images)
ShieldGemma2B / 9BContent moderation classifier

Why Gemma matters

Gemma models are notable for several reasons:

  1. High quality at small size: Gemma 2 9B punches well above its weight class, performing competitively with much larger models on many benchmarks. This makes it practical for local deployment.

  2. Strong multilingual support: Gemma 3 supports 140+ languages — much broader coverage than many open-weights models.

  3. Permissive licence: Gemma is released under a Google licence that allows commercial use with some restrictions (significantly more permissive than some alternatives).

  4. Multimodal (Gemma 3): Vision capabilities added in 2025; can process images alongside text.

  5. Strong tooling integration: Excellent support in Hugging Face, Ollama, llama.cpp, and Google’s own AI tooling.


How to use Gemma (Australian users)

Run locally (private, free)

The most common way for personal use:

  1. Install Ollama (https://ollama.com) — the easiest tool for running open-weights models locally
  2. In a terminal: ollama pull gemma2:9b (or gemma2:2b for less powerful machines)
  3. Run: ollama run gemma2:9b and chat away

Hardware requirements:

  • Gemma 2 2B: Runs on any laptop with 8GB+ RAM
  • Gemma 2 9B: Comfortable with 16GB+ RAM (Apple Silicon excellent here)
  • Gemma 2 27B: Requires 32GB+ RAM or dedicated GPU with 24GB+ VRAM

Via Hugging Face

  1. Go to https://huggingface.co/google → search for Gemma
  2. Accept the licence (requires Hugging Face account)
  3. Download the model files or use Hugging Face Inference API

Via Google Cloud Vertex AI

For enterprise or production use, Gemma is available on Vertex AI with managed hosting.

In LM Studio or Jan.ai

GUI-based tools for running models locally, including Gemma. Easier for non-technical users than command-line Ollama.


How Gemma compares to other open-weights models

ModelSizeQualityMultimodalLicenceBest for
Gemma 3 27B27BVery good✅ (vision)Google permissiveMultilingual; vision
Llama 3.3 70B70BExcellentMeta community licenceLarger; broader knowledge
Mistral Small 3.124BVery good✅ (vision)Apache 2.0 (some)EU origin; efficient
Phi-4 (Microsoft)14BExcellentMITTiny but capable
Qwen 2.5VariousExcellentApache 2.0Chinese; avoid
DeepSeek R2VariousExcellentOpenChinese; avoid

For most Australian use cases needing an open-weights model:

  • Gemma 2 9B for a laptop-friendly capable model
  • Llama 3.3 70B if you have heavy hardware and need broader knowledge
  • Phi-4 for the smallest footprint with excellent quality

Gemma’s specialised variants

Beyond the core Gemma models, Google has released task-specific variants:

CodeGemma

Specialised for code generation, completion, and explanation. Smaller than general models but optimised specifically for programming tasks.

PaliGemma

Vision-language model. Takes images and answers questions about them. Useful for document understanding, image captioning, and visual reasoning.

MedGemma

Trained on medical literature and able to process medical text and images. Designed for healthcare research; not for direct patient care use.

ShieldGemma

A content moderation classifier. Helps developers filter unsafe content from AI applications.

RecurrentGemma

Uses a different architecture (recurrent rather than attention-only) that enables longer context windows with constant memory use.


Licence considerations

Gemma is released under the Google Gemma Terms of Use — not standard open-source. Key points:

  • ✅ Commercial use allowed
  • ✅ Modification and fine-tuning allowed
  • ✅ Distribution of modified versions allowed
  • ⚠️ Required to comply with Google’s Prohibited Use Policy
  • ⚠️ Required to provide notice that you used Gemma
  • ⚠️ Some downstream uses (military applications, etc.) prohibited

The licence is more permissive than Llama’s community licence in some respects (no 100M user threshold) but more restrictive in others (specific use prohibitions).

For commercial use of Gemma in Australian products, review the current Gemma terms at ai.google.dev/gemma/terms.


Gotchas

  • Smaller models have real capability limits. Gemma 2B and 9B are useful for many tasks but will fall short of Claude 3.5 Sonnet or GPT-4o on complex reasoning, very long documents, or specialised knowledge. Match model size to task.
  • Open-weights licences ≠ MIT/Apache 2.0. The Gemma terms are more restrictive than standard open-source licences. Read them before commercial deployment.
  • Running local models requires hardware. A modern MacBook handles Gemma 9B fine; older laptops or low-RAM machines struggle. Verify your hardware can run the size you want.
  • Performance optimisation matters. Different inference engines (Ollama, llama.cpp, vLLM, MLX on Mac) have very different performance characteristics. Apple Silicon users should consider MLX for best performance.
  • Multilingual quality varies. While Gemma 3 supports 140+ languages, quality is best in English, then major European/Asian languages. Quality for low-resource languages including most Australian Indigenous languages is limited.
  • Not a frontier model. Gemma is excellent for an open-weights model of its size, but it’s not competing with the latest Claude/GPT-4o/Gemini Ultra on overall capability. Use frontier APIs when you need maximum capability.

Use cases where Gemma shines

  • Privacy-sensitive applications: Medical notes, legal documents, personal journaling — process locally without sending data anywhere
  • Offline applications: Field work, remote areas, embedded systems
  • High-volume applications: Where API costs would be prohibitive
  • Custom fine-tuned models: Take Gemma and fine-tune for your specific domain
  • Education and research: Free to use; well-documented for learning
  • Edge deployment: Phones, embedded devices, IoT — Gemma 2B works on constrained hardware

See also


Sources

  • Google Gemma documentation: ai.google.dev/gemma
  • Gemma 1 technical paper (2024)
  • Gemma 2 technical report (June 2024)
  • Gemma 3 model card and release notes (2025)
  • Gemma Terms of Use: ai.google.dev/gemma/terms
  • Hugging Face Gemma model cards
  • Independent benchmarks: ArtificialAnalysis.ai (2024–2026)