🇺🇸 United States · Deepgram — Real-Time Speech AI

Status: 🟩 COMPLETE 🟦 LIVING Section: 10 — AI and LLMs

VendorDeepgram
Country/origin🇺🇸 United States (San Francisco)
Recommended for AUS?✅ Yes — US-based; SOC 2 Type II; HIPAA capable; strong speed advantage
Privacy summaryAWS hosting; SOC 2 Type II; HIPAA capable; GDPR compliant; audio not used for training; standard enterprise DPA
Free tier✅ $200 USD free credits on signup
Paid tiersPay-per-minute API pricing; volume discounts; enterprise
First released2015 (founded by ex-CERN physicists)
Last reviewedJune 2026
Official sitehttps://deepgram.com

What it is

Deepgram is a speech AI company known particularly for speed — its API is among the fastest commercial speech-to-text services available, making it ideal for real-time applications like voice agents, live captioning, and conversational AI.

Founded by physicists from CERN (the European particle physics laboratory) who applied their data processing expertise to building a speech recognition system from scratch — not just fine-tuning existing models.

Core capabilities:

  • Nova-3: Deepgram’s flagship speech recognition model (fast and accurate)
  • Real-time streaming transcription — sub-300ms latency in many cases
  • Aura-2 (TTS): Deepgram’s text-to-speech for voice agents
  • Voice Agent API: Complete platform for building real-time voice AI agents
  • Diarization, sentiment, intent detection
  • Custom models for domain-specific vocabularies

What you’d use it for (as a developer)

  • Voice AI agents: Customer service bots, phone IVR systems with natural conversation
  • Live captioning for events, video streaming, accessibility
  • Real-time meeting transcription in collaboration apps
  • Voice-driven applications where latency matters
  • Customer service call analytics at scale
  • Conversational AI products competing with Vapi, Bland AI

How to access from Australia

  1. Go to https://deepgram.comSign up free
  2. Sign up with email
  3. Get $200 USD in free credits on signup
  4. API keys in dashboard
  5. Use SDKs (Python, Node.js, Go, .NET, Rust)

Basic Python example:

from deepgram import DeepgramClient, PrerecordedOptions
deepgram = DeepgramClient(api_key="your-key")
options = PrerecordedOptions(model="nova-3", smart_format=True)
response = deepgram.listen.rest.v("1").transcribe_url(
    {"url": "audio-url"}, options
)
print(response.results.channels[0].alternatives[0].transcript)

What it costs

ServicePriceNotes
Nova-3 transcription~$0.0043/minute~$0.26/hour
Real-time streaming~$0.0059/minuteLive audio
Aura-2 TTS~$0.030/1,000 charactersVoice synthesis
Voice AgentCombined pricingFull voice agent

For comparison: a 1-hour podcast transcription on Nova-3 ≈ 0.40 AUD. Among the cheapest cloud STT options.


How it compares to AssemblyAI

The two leading specialised speech AI platforms compete directly:

AspectDeepgramAssemblyAI
Speed✅ FastestFast
Real-time use✅ SpecialisedGood
Voice Agents✅ Native platformVia LeMUR + integration
Audio Intelligence🟡 Has features✅ Stronger Suite
TTS✅ Aura-2Partner integrations
Pricing✅ Often cheaperSlightly higher
Accuracy on conversational audioExcellentExcellent (slight edge)
Free tier$200 credit$50 credit

Decision rule:

  • Real-time voice agents → Deepgram (their specialised platform)
  • Audio analysis, summarisation, content intelligence → AssemblyAI (Audio Intelligence + LeMUR)
  • Cost-sensitive large volume → Deepgram (typically cheaper)
  • Accuracy-critical on tough audio → either; benchmark both

Voice Agent API — the key differentiator

Deepgram’s Voice Agent API (2024) is a complete platform for building voice AI:

  • STT (Nova-3): Hears the user
  • LLM integration: OpenAI, Anthropic, etc. (you choose)
  • TTS (Aura-2): Speaks the response
  • Conversation orchestration: Turn-taking, interruptions, barge-in

This is everything you need to build a voice agent like the ones at Vapi, Bland AI, Retell — but as raw infrastructure for builders.

For comparison, see real-time-voice-ai for the consumer-facing voice products.


Privacy considerations

  • HIPAA capable with BAA
  • Audio not used for training Deepgram’s models
  • Configurable data retention — enterprise customers can have zero retention
  • AWS US hosting primarily; some EU options

For Australian deployments:

  • Standard enterprise DPA addresses APP 8 cross-border disclosure
  • Disclose AI processing in privacy policies
  • Recording consent under Australian state laws
  • For healthcare: HIPAA + Australian Privacy Act sensitive information requirements

Australian considerations

  • Strong Australian accent handling — Nova-3 performs well on Australian English
  • US data hosting — Australian latency for real-time use adds ~150ms vs local, but Deepgram’s processing speed compensates substantially
  • Voice agents for Australian businesses: Deepgram is the infrastructure choice for many Australian voice AI deployments

Gotchas

  • Real-time pricing is per minute of audio. A 24/7 voice agent listening continuously gets expensive — model your costs carefully.
  • Voice Agent platform requires development. It’s not a no-code tool — you write the code that uses Deepgram as infrastructure.
  • Custom models cost more. Domain-specific custom models have higher per-minute pricing.
  • Speed advantages depend on usage pattern. For batch transcription of recordings, Whisper API may be cheaper. Deepgram’s edge is real-time.
  • Free $200 credits expire. Use them within reasonable time after signup.

See also


Sources

  • Deepgram documentation: deepgram.com/docs
  • Nova-3 announcement and benchmarks (2024)
  • Deepgram Voice Agent API launch (2024)
  • Independent benchmarks: ArtificialAnalysis.ai (2024-2026)
  • TechCrunch coverage of Deepgram funding and growth (2022-2024)