🇺🇸 United States · Deepgram — Real-Time Speech AI

Status: 🟩 COMPLETE 🟦 LIVING Section: 10 — AI and LLMs


Vendor	Deepgram
Country/origin	🇺🇸 United States (San Francisco)
Recommended for AUS?	✅ Yes — US-based; SOC 2 Type II; HIPAA capable; strong speed advantage
Privacy summary	AWS hosting; SOC 2 Type II; HIPAA capable; GDPR compliant; audio not used for training; standard enterprise DPA
Free tier	✅ $200 USD free credits on signup
Paid tiers	Pay-per-minute API pricing; volume discounts; enterprise
First released	2015 (founded by ex-CERN physicists)
Last reviewed	June 2026
Official site	https://deepgram.com

What it is

Deepgram is a speech AI company known particularly for speed — its API is among the fastest commercial speech-to-text services available, making it ideal for real-time applications like voice agents, live captioning, and conversational AI.

Founded by physicists from CERN (the European particle physics laboratory) who applied their data processing expertise to building a speech recognition system from scratch — not just fine-tuning existing models.

Core capabilities:

Nova-3: Deepgram’s flagship speech recognition model (fast and accurate)
Real-time streaming transcription — sub-300ms latency in many cases
Aura-2 (TTS): Deepgram’s text-to-speech for voice agents
Voice Agent API: Complete platform for building real-time voice AI agents
Diarization, sentiment, intent detection
Custom models for domain-specific vocabularies

What you’d use it for (as a developer)

Voice AI agents: Customer service bots, phone IVR systems with natural conversation
Live captioning for events, video streaming, accessibility
Real-time meeting transcription in collaboration apps
Voice-driven applications where latency matters
Customer service call analytics at scale
Conversational AI products competing with Vapi, Bland AI

How to access from Australia

Go to https://deepgram.com → Sign up free
Sign up with email
Get $200 USD in free credits on signup
API keys in dashboard
Use SDKs (Python, Node.js, Go, .NET, Rust)

Basic Python example:

from deepgram import DeepgramClient, PrerecordedOptions
deepgram = DeepgramClient(api_key="your-key")
options = PrerecordedOptions(model="nova-3", smart_format=True)
response = deepgram.listen.rest.v("1").transcribe_url(
    {"url": "audio-url"}, options
)
print(response.results.channels[0].alternatives[0].transcript)

What it costs

Service	Price	Notes
Nova-3 transcription	~$0.0043/minute	~$0.26/hour
Real-time streaming	~$0.0059/minute	Live audio
Aura-2 TTS	~$0.030/1,000 characters	Voice synthesis
Voice Agent	Combined pricing	Full voice agent

For comparison: a 1-hour podcast transcription on Nova-3 ≈ $0.26 U S D =$ 0.40 AUD. Among the cheapest cloud STT options.

How it compares to AssemblyAI

The two leading specialised speech AI platforms compete directly:

Aspect	Deepgram	AssemblyAI
Speed	✅ Fastest	Fast
Real-time use	✅ Specialised	Good
Voice Agents	✅ Native platform	Via LeMUR + integration
Audio Intelligence	🟡 Has features	✅ Stronger Suite
TTS	✅ Aura-2	Partner integrations
Pricing	✅ Often cheaper	Slightly higher
Accuracy on conversational audio	Excellent	Excellent (slight edge)
Free tier	$200 credit	$50 credit

Decision rule:

Real-time voice agents → Deepgram (their specialised platform)
Audio analysis, summarisation, content intelligence → AssemblyAI (Audio Intelligence + LeMUR)
Cost-sensitive large volume → Deepgram (typically cheaper)
Accuracy-critical on tough audio → either; benchmark both

Voice Agent API — the key differentiator

Deepgram’s Voice Agent API (2024) is a complete platform for building voice AI:

STT (Nova-3): Hears the user
LLM integration: OpenAI, Anthropic, etc. (you choose)
TTS (Aura-2): Speaks the response
Conversation orchestration: Turn-taking, interruptions, barge-in

This is everything you need to build a voice agent like the ones at Vapi, Bland AI, Retell — but as raw infrastructure for builders.

For comparison, see real-time-voice-ai for the consumer-facing voice products.

Privacy considerations

HIPAA capable with BAA
Audio not used for training Deepgram’s models
Configurable data retention — enterprise customers can have zero retention
AWS US hosting primarily; some EU options

For Australian deployments:

Standard enterprise DPA addresses APP 8 cross-border disclosure
Disclose AI processing in privacy policies
Recording consent under Australian state laws
For healthcare: HIPAA + Australian Privacy Act sensitive information requirements

Australian considerations

Strong Australian accent handling — Nova-3 performs well on Australian English
US data hosting — Australian latency for real-time use adds ~150ms vs local, but Deepgram’s processing speed compensates substantially
Voice agents for Australian businesses: Deepgram is the infrastructure choice for many Australian voice AI deployments

Gotchas

Real-time pricing is per minute of audio. A 24/7 voice agent listening continuously gets expensive — model your costs carefully.
Voice Agent platform requires development. It’s not a no-code tool — you write the code that uses Deepgram as infrastructure.
Custom models cost more. Domain-specific custom models have higher per-minute pricing.
Speed advantages depend on usage pattern. For batch transcription of recordings, Whisper API may be cheaper. Deepgram’s edge is real-time.
Free $200 credits expire. Use them within reasonable time after signup.

Sources

Deepgram documentation: deepgram.com/docs
Nova-3 announcement and benchmarks (2024)
Deepgram Voice Agent API launch (2024)
Independent benchmarks: ArtificialAnalysis.ai (2024-2026)
TechCrunch coverage of Deepgram funding and growth (2022-2024)

Tech & AI, Explained

Explorer

deepgram