🇺🇸 USA · Groq

Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: The fastest AI inference on the planet. Custom “LPU” (Language Processing Unit) chips deliver Llama / Mistral / Whisper responses 5-10× faster than Nvidia-based competitors. The choice when latency matters more than anything else.

Front-matter facts

Field	Value
Vendor	Groq Inc (Mountain View, USA) — founded 2016 by Jonathan Ross (ex-Google TPU lead)
Country / origin	🇺🇸 USA
Recommended for Australian users?	✅ Yes — fully accessible from AUS; US-based infrastructure
Privacy summary	API: no training on customer data; standard developer terms
Free tier	Yes — generous free tier (rate-limited but useful for dev)
Paid tiers	Pay-per-token; pre-paid credits; Enterprise quoted
First released	API generally available 2024 (Groq Inc founded 2016)
Last reviewed	2026-06-26
Official site	https://groq.com

What it is

Groq runs AI inference on custom-designed LPU (Language Processing Unit) chips — purpose-built for transformer inference rather than general AI training. The result: dramatically faster inference than Nvidia GPU-based competitors, often 500+ tokens per second for Llama 70B (vs ~50-100 tokens/sec on typical Nvidia setups).

Why speed matters:

Voice agents — instant response feels human; slow response feels broken
Real-time UI — autocomplete, suggestions, live translations
Agent workflows — many model calls per task; latency compounds
Interactive demos — better demo = better adoption
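
To make the "latency compounds" point concrete, here is a rough back-of-the-envelope sketch. It uses the 500 tokens/sec figure quoted above and assumes 75 tokens/sec as a midpoint of the ~50-100 range for GPU setups; the 10-call, 300-token workload is hypothetical, and network and queueing time are ignored.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate output_tokens at a steady decode rate."""
    return output_tokens / tokens_per_second


def workflow_seconds(calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Total generation time for sequential model calls of equal size."""
    return calls * generation_seconds(tokens_per_call, tokens_per_second)


# Hypothetical agent task: 10 sequential calls, 300 output tokens each.
fast = workflow_seconds(calls=10, tokens_per_call=300, tokens_per_second=500)
slow = workflow_seconds(calls=10, tokens_per_call=300, tokens_per_second=75)  # assumed midpoint
print(f"500 tok/s: {fast:.0f} s, 75 tok/s: {slow:.0f} s")
```

Under these assumptions the same task takes about 6 seconds of generation time at the faster rate and about 40 seconds at the slower one, which is the gap a user would feel in an interactive agent.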

Hosted models:

Llama (4 / 5 family)
Mistral (Large, others)
Mixtral (older Mistral family)
Whisper (speech-to-text — extremely fast)
Gemma (Google open weights)
Plus a curated set — not as broad a catalog as Together / Fireworks

Note: Groq the company (LPU chips) is completely different from “Grok” (Elon Musk’s xAI chatbot). The naming similarity is unfortunate. See Grok entry.

What you’d use it for

Voice agent backends — phone bots, real-time voice products
Real-time UI features — fast autocomplete, suggestions
Agent workflows that make many sequential model calls
Speed-critical demos and prototypes
Whisper transcription at scale — Groq’s Whisper is very fast
Anywhere “fast enough to feel instant” matters

When NOT to use Groq:

For best-quality frontier models (Groq doesn’t host Claude / GPT / Gemini)
For broadest open-weight catalog (Together / Fireworks have more)
For dedicated-tenant or regulated workloads (Groq is an API-only service and is not as enterprise-mature as Bedrock / Azure)

How to use from Australia

Sign up at groq.com. Free credits on sign-up.
Get API key

Use OpenAI-compatible endpoint:

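import os
from openai import OpenAI

# Point the OpenAI SDK at Groq's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],  # assumes the key is exported under this name
    base_url="https://api.groq.com/openai/v1",
)
response = client.chat.completions.create(
    model="llama-4-70b-versatile",  # check the current model list; IDs change over time
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)  # the assistant's reply text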

AUS card accepted for paid tier
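
For readers who want to see what "OpenAI-compatible" means on the wire, the following is a minimal sketch that builds (but does not send) the equivalent HTTP request using only the standard library. The base URL and model ID are taken from the snippet above; the helper name `build_chat_request` and the placeholder key are illustrative, not part of any SDK.

```python
import json
import urllib.request

GROQ_BASE_URL = "https://api.groq.com/openai/v1"  # base URL from the snippet above


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{GROQ_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key; the model ID is the one used in the snippet above.
req = build_chat_request("YOUR_API_KEY", "llama-4-70b-versatile", "Hello")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` with a real key would send it. In practice the SDK route shown above is usually more convenient, since it handles response parsing and errors.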

What it costs

Free tier

Generous rate-limited quota for development
No card required initially
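
Because the free tier is rate-limited, development code usually benefits from retrying with exponential backoff. The sketch below is generic and makes no claim about Groq's actual limits: `RateLimited` is a hypothetical marker exception that the caller would raise when the API reports a rate limit (with the `openai` SDK, that would presumably mean catching its rate-limit error and re-raising), and the delays are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical marker: raise it when the API reports a rate limit."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Demo with a fake call that is rate-limited twice, then succeeds.
attempts = {"n": 0}
delays = []


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise RateLimited()
    return "ok"


# delays.append stands in for time.sleep so the demo runs instantly.
result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

The `sleep` parameter is injectable only so the retry schedule can be inspected without waiting; real code would leave it at the default.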

Pay-per-token

Llama 4 70B: ~US$0.59 / US$0.79 per million tokens (input / output)
Llama 4 8B: ~US$0.05 / US$0.10 per million tokens (input / output)
Mistral Large: comparable to Together
Whisper: ~US$0.10 per hour of audio (very cheap)

Generally priced slightly premium vs Together / Fireworks for the same models — but you pay for speed.
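
A quick way to turn these per-million prices into a budget estimate is sketched below. It assumes the two figures quoted for each model are input and output prices respectively, uses the approximate numbers listed above, and applies them to a hypothetical monthly volume; actual prices should be checked against the vendor's pricing page.

```python
def token_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost in USD given token counts and per-million-token prices."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def whisper_cost_usd(audio_hours: float, price_per_hour: float = 0.10) -> float:
    """Transcription cost in USD at the approximate per-hour price listed above."""
    return audio_hours * price_per_hour


# Hypothetical month on the 70B model: 20M input tokens, 5M output tokens.
# Prices (0.59 in, 0.79 out) are the approximate figures from the list above.
monthly = token_cost_usd(20_000_000, 5_000_000, 0.59, 0.79)
print(f"LLM: ~US${monthly:.2f}, 100 h of audio: ~US${whisper_cost_usd(100):.2f}")
```

Swapping in another provider's prices for the same volume gives a direct view of how large the speed premium is for a given workload.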

Enterprise

Dedicated capacity
Custom contracts

How it compares to alternatives

Aspect	Groq	Together AI	Fireworks AI	Cerebras
Speed (tokens/sec for Llama 70B)	500+ tps (best)	~50-150 tps	~50-150 tps	Comparable to Groq
Model catalog	Curated (~20+)	Broad	Broad	Limited
Price	Premium for speed	Cheap	Cheap	Premium
Speech transcription	Excellent (Whisper fast)	Yes	Yes	Limited
OpenAI API compatibility	Yes	Yes	Yes	Yes
AUS data residency	Limited	Limited	Limited	Limited
Best for	Speed-critical apps	Broad open-weight production	Broad open-weight production	Speed alternative to Groq

For voice, real-time, and other latency-critical work, Groq wins decisively. For the broadest catalog and the lowest price, choose Together or Fireworks.

Privacy / data handling

No training on customer data — committed
API logs retained briefly for abuse-monitoring
US data centres
For sensitive Australian workloads, consider a Western cloud alternative with Australian data residency (for example AWS Bedrock in the Sydney region); Groq does not yet offer an Australian region

Recent changes

2026: More model support; Cerebras competition driving improvements
2025: Whisper speed leadership solidified
2024: Groq API generally available

Gotchas

“Groq” vs “Grok” naming confusion: Groq (the chip company) is completely separate from Grok (xAI’s chatbot). Do not confuse the two.
Model catalog is curated, not vast: for niche open-weight models, Together / Fireworks offer a broader selection
Closed frontier models (Claude / GPT / Gemini) are not hosted on Groq: use those vendors directly
Speed comes at a cost: Groq is slightly more expensive per million tokens than Together
For use cases that are not speed-critical, Together / Fireworks usually offer better cost-performance
Limited multi-region — Groq is US-based; for AUS data residency-critical workloads, AWS Bedrock Sydney is the alternative
