🇺🇸 United States · Google Gemma — Open-Weights Small Models

Status: 🟩 COMPLETE 🟦 LIVING Section: 10 — AI and LLMs


Vendor	Google DeepMind
Country/origin	🇺🇸 United States / 🇬🇧 United Kingdom (Google DeepMind)
Recommended for AUS?	✅ Yes — open-weights; can run locally (max privacy); permissive licence
Privacy summary	When run locally: no data sent anywhere. When used via Google Cloud / Vertex AI: standard Google enterprise data handling
Free tier	✅ Completely free — open-weights download; only compute costs
Paid tiers	Free model; paid only if using via cloud API (Google Cloud, Hugging Face Inference, etc.)
First released	Gemma 1: February 2024; Gemma 2: June 2024; Gemma 3: 2025; ongoing
Last reviewed	June 2026
Official site	https://ai.google.dev/gemma

What it is

Gemma is Google’s family of open-weights language models — small, efficient AI models that Google releases publicly so anyone can download, run locally, fine-tune, and build products with them. Gemma is Google’s answer to Meta’s Llama in the open-weights AI race.

“Open-weights” means: the trained model file is downloadable. You can run it on your own laptop (if powerful enough), on your own servers, or in any cloud. You don’t have to use Google’s API or send data to Google. See open-weights-vs-closed for the full distinction.

The Gemma family (mid-2026):

Model	Parameters	Best for
Gemma 2 2B	2 billion	Phones, edge devices, very fast inference
Gemma 2 9B	9 billion	Laptops with decent GPU; production inference
Gemma 2 27B	27 billion	Workstations; near-frontier quality at smaller size
Gemma 3 4B / 12B / 27B	Various	Latest generation; multimodal (vision); 128K context
CodeGemma	2B / 7B	Code completion and generation
RecurrentGemma	2B / 9B	Long contexts; recurrent architecture
PaliGemma	3B	Vision + language tasks
MedGemma	27B	Healthcare research (medical text and images)
ShieldGemma	2B / 9B	Content moderation classifier

Why Gemma matters

Gemma models are notable for several reasons:

High quality at small size: Gemma 2 9B punches well above its weight class, performing competitively with much larger models on many benchmarks. This makes it practical for local deployment.
Strong multilingual support: Gemma 3 supports 140+ languages — much broader coverage than many open-weights models.
Permissive licence: Gemma is released under a Google licence that allows commercial use with some restrictions (significantly more permissive than some alternatives).
Multimodal (Gemma 3): Vision capabilities added in 2025; can process images alongside text.
Strong tooling integration: Excellent support in Hugging Face, Ollama, llama.cpp, and Google’s own AI tooling.

How to use Gemma (Australian users)

Run locally (private, free)

The most common way for personal use:

Install Ollama (https://ollama.com) — the easiest tool for running open-weights models locally
In a terminal: ollama pull gemma2:9b (or gemma2:2b for less powerful machines)
Run: ollama run gemma2:9b and chat away

Hardware requirements:

Gemma 2 2B: Runs on any laptop with 8GB+ RAM
Gemma 2 9B: Comfortable with 16GB+ RAM (Apple Silicon excellent here)
Gemma 2 27B: Requires 32GB+ RAM or dedicated GPU with 24GB+ VRAM

Via Hugging Face

Go to https://huggingface.co/google → search for Gemma
Accept the licence (requires Hugging Face account)
Download the model files or use Hugging Face Inference API

Via Google Cloud Vertex AI

For enterprise or production use, Gemma is available on Vertex AI with managed hosting.

In LM Studio or Jan.ai

GUI-based tools for running models locally, including Gemma. Easier for non-technical users than command-line Ollama.

How Gemma compares to other open-weights models

Model	Size	Quality	Multimodal	Licence	Best for
Gemma 3 27B	27B	Very good	✅ (vision)	Google permissive	Multilingual; vision
Llama 3.3 70B	70B	Excellent	❌	Meta community licence	Larger; broader knowledge
Mistral Small 3.1	24B	Very good	✅ (vision)	Apache 2.0 (some)	EU origin; efficient
Phi-4 (Microsoft)	14B	Excellent	❌	MIT	Tiny but capable
Qwen 2.5 ⛔	Various	Excellent	✅	Apache 2.0	Chinese; avoid
DeepSeek R2 ⛔	Various	Excellent	❌	Open	Chinese; avoid

For most Australian use cases needing an open-weights model:

Gemma 2 9B for a laptop-friendly capable model
Llama 3.3 70B if you have heavy hardware and need broader knowledge
Phi-4 for the smallest footprint with excellent quality

Gemma’s specialised variants

Beyond the core Gemma models, Google has released task-specific variants:

CodeGemma

Specialised for code generation, completion, and explanation. Smaller than general models but optimised specifically for programming tasks.

PaliGemma

Vision-language model. Takes images and answers questions about them. Useful for document understanding, image captioning, and visual reasoning.

MedGemma

Trained on medical literature and able to process medical text and images. Designed for healthcare research; not for direct patient care use.

ShieldGemma

A content moderation classifier. Helps developers filter unsafe content from AI applications.

RecurrentGemma

Uses a different architecture (recurrent rather than attention-only) that enables longer context windows with constant memory use.

Licence considerations

Gemma is released under the Google Gemma Terms of Use — not standard open-source. Key points:

✅ Commercial use allowed
✅ Modification and fine-tuning allowed
✅ Distribution of modified versions allowed
⚠️ Required to comply with Google’s Prohibited Use Policy
⚠️ Required to provide notice that you used Gemma
⚠️ Some downstream uses (military applications, etc.) prohibited

The licence is more permissive than Llama’s community licence in some respects (no 100M user threshold) but more restrictive in others (specific use prohibitions).

For commercial use of Gemma in Australian products, review the current Gemma terms at ai.google.dev/gemma/terms.

Gotchas

Smaller models have real capability limits. Gemma 2B and 9B are useful for many tasks but will fall short of Claude 3.5 Sonnet or GPT-4o on complex reasoning, very long documents, or specialised knowledge. Match model size to task.
Open-weights licences ≠ MIT/Apache 2.0. The Gemma terms are more restrictive than standard open-source licences. Read them before commercial deployment.
Running local models requires hardware. A modern MacBook handles Gemma 9B fine; older laptops or low-RAM machines struggle. Verify your hardware can run the size you want.
Performance optimisation matters. Different inference engines (Ollama, llama.cpp, vLLM, MLX on Mac) have very different performance characteristics. Apple Silicon users should consider MLX for best performance.
Multilingual quality varies. While Gemma 3 supports 140+ languages, quality is best in English, then major European/Asian languages. Quality for low-resource languages including most Australian Indigenous languages is limited.
Not a frontier model. Gemma is excellent for an open-weights model of its size, but it’s not competing with the latest Claude/GPT-4o/Gemini Ultra on overall capability. Use frontier APIs when you need maximum capability.

Use cases where Gemma shines

Privacy-sensitive applications: Medical notes, legal documents, personal journaling — process locally without sending data anywhere
Offline applications: Field work, remote areas, embedded systems
High-volume applications: Where API costs would be prohibitive
Custom fine-tuned models: Take Gemma and fine-tune for your specific domain
Education and research: Free to use; well-documented for learning
Edge deployment: Phones, embedded devices, IoT — Gemma 2B works on constrained hardware

Sources

Google Gemma documentation: ai.google.dev/gemma
Gemma 1 technical paper (2024)
Gemma 2 technical report (June 2024)
Gemma 3 model card and release notes (2025)
Gemma Terms of Use: ai.google.dev/gemma/terms
Hugging Face Gemma model cards
Independent benchmarks: ArtificialAnalysis.ai (2024–2026)

Tech & AI, Explained

Explorer

gemma