🇺🇸 USA · Groq
Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: The fastest AI inference on the planet. Custom "LPU" (Language Processing Unit) chips deliver Llama / Mistral / Whisper responses 5-10× faster than Nvidia-based competitors. The choice when latency matters more than anything else.
Front-matter facts
| Field | Value |
|---|---|
| Vendor | Groq Inc (Mountain View, USA); founded 2016 by Jonathan Ross (ex-Google TPU lead) |
| Country / origin | 🇺🇸 USA |
| Recommended for Australian users? | ✅ Yes: fully accessible from AUS; US-based infrastructure |
| Privacy summary | API: no training on customer data; standard developer terms |
| Free tier | Yes: generous free tier (rate-limited but useful for dev) |
| Paid tiers | Pay-per-token; pre-paid credits; Enterprise quoted |
| First released | API generally available 2024 (Groq Inc founded 2016) |
| Last reviewed | 2026-06-26 |
| Official site | https://groq.com |
What it is
Groq runs AI inference on custom-designed LPU (Language Processing Unit) chips, purpose-built for transformer inference rather than general AI training. The result is dramatically faster inference than Nvidia GPU-based competitors: often 500+ tokens per second for Llama 70B, versus ~50-100 tokens/sec on typical Nvidia setups.
Why speed matters:
- Voice agents: instant response feels human; slow response feels broken
- Real-time UI: autocomplete, suggestions, live translations
- Agent workflows: many model calls per task; latency compounds
- Interactive demos: better demo = better adoption
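To make the "latency compounds" point concrete, here is a rough back-of-envelope sketch. It uses the approximate throughput figures quoted above (500 tokens/sec for Groq, and 75 tokens/sec as a midpoint of the ~50-100 range for typical Nvidia setups). The workload (10 sequential calls of 300 output tokens each) is a purely illustrative assumption, and the calculation ignores network round-trips, queueing, and time-to-first-token.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate output_tokens at a steady decode rate."""
    return output_tokens / tokens_per_second


def workflow_seconds(calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Total generation time for a chain of sequential model calls."""
    return calls * generation_seconds(tokens_per_call, tokens_per_second)


# Illustrative assumption: one agent task = 10 sequential calls, 300 output tokens each.
CALLS = 10
TOKENS_PER_CALL = 300

# Throughput values are the approximate figures quoted in this entry, not measurements.
for label, tps in [("Groq (~500 tok/s)", 500.0), ("Typical GPU setup (~75 tok/s)", 75.0)]:
    total = workflow_seconds(CALLS, TOKENS_PER_CALL, tps)
    print(f"{label}: {total:.1f} s of generation time")
```

Under these assumptions the same task spends roughly 6 seconds generating at 500 tokens/sec versus roughly 40 seconds at 75 tokens/sec, which is why the gap matters most for multi-step agents and voice.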
Hosted models:
- Llama (4 / 5 family)
- Mistral (Large, others)
- Mixtral (older Mistral family)
- Whisper (speech-to-text; extremely fast)
- Gemma (Google open weights)
- Plus a curated set; not as broad a catalog as Together / Fireworks
Note: Groq the company (LPU chips) is completely different from "Grok" (Elon Musk's xAI chatbot). The naming similarity is unfortunate. See Grok entry.
What you'd use it for
- Voice agent backends: phone bots, real-time voice products
- Real-time UI features: fast autocomplete, suggestions
- Agent workflows that make many sequential model calls
- Speed-critical demos and prototypes
- Whisper transcription at scale: Groq's Whisper is very fast
- Anywhere "fast enough to feel instant" matters
When NOT to use Groq:
- For best-quality frontier models (Groq doesn't host Claude / GPT / Gemini)
- For the broadest open-weight catalog (Together / Fireworks have more)
- For dedicated-tenant / regulated workloads (Groq is an API service and not as enterprise-mature as Bedrock / Azure)
How to use from Australia
- Sign up at groq.com. Free credits on sign-up.
- Get API key
- Use OpenAI-compatible endpoint:

  ```python
  from openai import OpenAI

  # Point the standard OpenAI client at Groq's OpenAI-compatible base URL
  client = OpenAI(
      api_key="...",
      base_url="https://api.groq.com/openai/v1"
  )
  response = client.chat.completions.create(
      model="llama-4-70b-versatile",
      messages=[{"role": "user", "content": "Hello"}]
  )
  ```
- AUS card accepted for paid tier
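Because the endpoint is OpenAI-compatible, the same call can in principle be made without the SDK. The following is a minimal sketch using only the Python standard library. It assumes the standard OpenAI-style `/chat/completions` route under the base URL shown above and the usual `Bearer` authorization header. The helper names `build_chat_request` and `send` are hypothetical, introduced only for illustration.

```python
import json
import urllib.request

# Base URL taken from the snippet above.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"


def build_chat_request(prompt: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-style chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{GROQ_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the first choice's text.

    Assumes the response follows the OpenAI chat-completions JSON shape.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a real key, `send(build_chat_request("Hello", "llama-4-70b-versatile", api_key))` would perform the call. Keeping request construction separate from sending makes the payload easy to inspect or log before anything leaves your machine.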
What it costs
Free tier
- Generous rate-limited quota for development
- No card required initially
Pay-per-token
- Llama 4 70B: ~US$0.79 per million tokens
- Llama 4 8B: ~US$0.10 per million tokens
- Mistral Large: comparable to Together
- Whisper: ~US$0.10 per hour of audio (very cheap)
Generally priced at a slight premium vs Together / Fireworks for the same models, but you pay for speed.
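As a rough budgeting aid, the approximate prices above can be turned into a small estimator. This is a sketch under stated assumptions: the dictionary keys are hypothetical labels (not API model IDs), a single per-token price is applied to all tokens since this entry does not distinguish input from output pricing, and the monthly workload figures are invented for illustration.

```python
# Approximate USD prices quoted in this entry; treat as illustrative, they will drift.
PRICE_PER_MILLION_TOKENS = {
    "llama-4-70b": 0.79,  # hypothetical label, not an API model ID
    "llama-4-8b": 0.10,   # hypothetical label, not an API model ID
}
WHISPER_PRICE_PER_AUDIO_HOUR = 0.10


def token_cost_usd(model: str, tokens: int) -> float:
    """Cost of processing `tokens` tokens at the quoted per-million price."""
    return PRICE_PER_MILLION_TOKENS[model] * tokens / 1_000_000


def whisper_cost_usd(audio_hours: float) -> float:
    """Cost of transcribing `audio_hours` hours of audio."""
    return WHISPER_PRICE_PER_AUDIO_HOUR * audio_hours


# Hypothetical monthly workload, for illustration only.
monthly_tokens = 50_000_000
monthly_audio_hours = 200

print(f"Llama 4 70B: ~US${token_cost_usd('llama-4-70b', monthly_tokens):.2f}")
print(f"Llama 4 8B:  ~US${token_cost_usd('llama-4-8b', monthly_tokens):.2f}")
print(f"Whisper:     ~US${whisper_cost_usd(monthly_audio_hours):.2f}")
```

At these prices, 50 million tokens a month comes to about US$39.50 on the 70B model versus US$5.00 on the 8B model, and 200 hours of audio to about US$20.00, so model size dominates the bill far more than transcription does.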
Enterprise
- Dedicated capacity
- Custom contracts
How it compares to alternatives
| Aspect | Groq | Together AI | Fireworks AI | Cerebras |
|---|---|---|---|---|
| Speed (tokens/sec for Llama 70B) | 500+ tps (best) | ~50-150 tps | ~50-150 tps | Comparable to Groq |
| Model catalog | Curated (~20+) | Broad | Broad | Limited |
| Price | Premium for speed | Cheap | Cheap | Premium |
| Speech transcription | Excellent (Whisper fast) | Yes | Yes | Limited |
| OpenAI API compatibility | Yes | Yes | Yes | Yes |
| AUS data residency | Limited | Limited | Limited | Limited |
| Best for | Speed-critical apps | Broad open-weight production | Broad open-weight production | Speed alternative to Groq |
For voice, real-time, and other latency-critical workloads, Groq wins decisively. For the broadest catalog and lowest price, choose Together / Fireworks.
Privacy / data handling
- No training on customer data (stated commitment)
- API logs retained briefly for abuse-monitoring
- US data centres
- For sensitive Australian workloads, consider Western cloud alternatives with AUS data residency (AWS Bedrock Sydney); Groq doesn't yet offer an AUS region
Recent changes
- 2026: More model support; Cerebras competition driving improvements
- 2025: Whisper speed leadership solidified
- 2024: Groq API generally available
Gotchas
- "Groq" vs "Grok" naming confusion: Groq (the chip company) is completely separate from Grok (xAI's chatbot). Don't confuse them.
- Model catalog is curated, not vast: for niche open-weight models, Together / Fireworks are broader
- Frontier closed models (Claude / GPT / Gemini) are not hosted by Groq; use those vendors directly
- Speed comes with cost: Groq tokens are slightly more expensive per million than Together's
- For non-speed-critical use cases, Together / Fireworks usually offer better cost-performance
- Limited multi-region: Groq is US-based; for workloads where AUS data residency is critical, AWS Bedrock Sydney is the alternative
See also
- Llama 🟩 🟦
- Whisper 🟩 🟦
- Mistral 🟥
- Cerebras 🟥 - comparable speed-focused chip
- Together AI 🟩 🟦
- Fireworks AI 🟥
- Hugging Face 🟩 🟦
- Nvidia AI 🟩 🟦 - Groq competes against Nvidia chips
- AWS Bedrock 🟩 🟦 - AUS-residency alternative
- Grok (xAI chatbot, DIFFERENT) 🟩 🟦
- ai-hardware-overview.md 🟥