🇺🇸 USA · OpenAI Whisper
Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: OpenAI’s speech-to-text (transcription) model: open-weight, runs locally or via API, multilingual, excellent quality. The reason “AI transcription” became cheap and reliable.
Front-matter facts
| Field | Value |
|---|---|
| Vendor | OpenAI (San Francisco, USA) — model is open-source under MIT license; commercial API service is OpenAI’s |
| Country / origin | 🇺🇸 USA |
| Recommended for Australian users? | ✅ Yes — open-source weights available globally; OpenAI API accessible from AUS |
| Privacy summary | Open-source weights run locally = no data leaves your machine. OpenAI API: no training on inputs by default; ChatGPT consumer surfaces inherit ChatGPT privacy posture |
| Free tier | Whisper open-source weights are completely free to download and run; OpenAI API has free trial credit |
| Paid tiers | OpenAI API: ~US$0.006/minute audio (extremely cheap); self-hosted = your hardware costs only |
| First released | September 2022 (open-source release) |
| Last reviewed | 2026-06-26 |
| Official site | https://openai.com/index/whisper/ + https://github.com/openai/whisper |
What it is
Whisper is OpenAI’s speech-to-text (transcription) model. It was released as open source in September 2022 under the MIT license, making it one of the most significant open releases in AI history. Whisper:
- Transcribes 99+ languages (most accurate on English; strong on European / major Asian languages)
- Translates audio to English in addition to native-language transcription
- Multiple size variants — Whisper Tiny / Base / Small / Medium / Large / Turbo / Large-v3
- Runs locally on CPUs and especially well on GPUs / Apple Silicon
- Runs via API (OpenAI’s hosted version) at very low cost
- Used inside many other products — Otter, Granola, Descript, countless others use Whisper under the hood
Whisper is the de facto standard for AI transcription. Most “AI transcription tool” startups use Whisper somewhere in their stack.
OpenAI has continued improving Whisper:
- Whisper Large-v3 — current frontier model
- Whisper Turbo — fast variant, near-Large quality at much lower latency
- gpt-4o-transcribe — newer OpenAI transcription using GPT-4o-family understanding alongside Whisper-style transcription
What you’d use it for
Personal
- Transcribe interviews, podcasts, meetings for personal records
- Caption your videos for accessibility / SEO
- Voice notes that turn into text automatically
- Language learning — transcribe foreign-language audio to study
Business / Developer
- Meeting transcription built into your app
- Customer-call transcripts for support analysis
- Accessibility features — live captions
- Voice-input in custom apps
- Voice agent pipelines — Whisper for STT, then LLM, then TTS
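The STT, then LLM, then TTS pipeline in the last bullet can be sketched as three pluggable stages. This is a minimal sketch rather than a reference design: `run_voice_turn` and the stub stages are hypothetical names chosen for illustration. In practice the `stt` stage would wrap a Whisper call, and the other two would wrap whichever LLM and TTS services you choose.

```python
from typing import Callable

# Each stage is a plain callable, so any backend can be swapped in.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def run_voice_turn(
    audio: bytes,
    stt: SpeechToText,
    llm: LanguageModel,
    tts: TextToSpeech,
) -> bytes:
    """Run one conversational turn: audio in, spoken reply out."""
    transcript = stt(audio)   # e.g. Whisper
    reply_text = llm(transcript)
    return tts(reply_text)


# Stub stages (hypothetical) so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return "what time is it"


def fake_llm(prompt: str) -> str:
    return f"You asked: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = run_voice_turn(b"\x00\x01", fake_stt, fake_llm, fake_tts)
```

Keeping the stages as separate callables means the transcription step can move between self-hosted Whisper and a hosted API without touching the rest of the pipeline.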
As a foundation
- Most AI meeting tools (Otter, Granola, Fireflies, tldv) use Whisper
- Most AI dictation tools use Whisper or Whisper-derived models
- Build your own with whisper.cpp (open-source C++ port) for embedded use
How to use from Australia
Open-source (local)
- Whisper.cpp — github.com/ggerganov/whisper.cpp — fast C++ port, runs on CPU or GPU; works on Mac / Windows / Linux
- Faster-whisper (Python): github.com/SYSTRAN/faster-whisper, a CTranslate2-based reimplementation, up to 4× faster than the original implementation
- whisperX — extends Whisper with speaker labels + word-level timing
- MacWhisper — macOS app wrapping Whisper (paid app, very polished)
- WhisperKit (Argmax): Whisper optimised for Apple Silicon on iOS / macOS
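As a rough illustration of the local route, the sketch below transcribes a file with the original open-source `whisper` Python package and converts the result to SRT captions. It assumes `pip install openai-whisper` and `ffmpeg` are available, and that the installed version knows the `turbo` model name (older versions may need `large-v3` or `small` instead). The helper names `srt_timestamp`, `segments_to_srt` and `transcribe_to_srt` are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn segment dicts (with 'start', 'end', 'text') into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "turbo") -> str:
    """Transcribe locally; no audio leaves the machine."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path)   # dict with 'text' and 'segments'
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("interview.mp3")` (a placeholder file name) would return caption text that can be saved as a `.srt` file next to the video.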
Via OpenAI API
- Sign up at platform.openai.com
- Use the `/v1/audio/transcriptions` endpoint
- POST an audio file (mp3, m4a, wav, etc.)
- Get back text transcription
- Optionally request timestamps, or use the separate `/v1/audio/translations` endpoint for translation into English
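The API steps above can be sketched with only the Python standard library. This is an illustrative sketch, not OpenAI's official client: `build_transcription_request` and `transcribe` are hypothetical helper names, `whisper-1` is assumed to be the hosted Whisper model ID, and the API key is whatever you created at platform.openai.com. The official `openai` Python package wraps the same endpoint and is usually the more convenient choice.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(
    audio: bytes, filename: str, api_key: str, model: str = "whisper-1"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for the endpoint."""
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = b"".join([
        # Form field: which model to use
        f"--{boundary}".encode(), crlf,
        b'Content-Disposition: form-data; name="model"', crlf, crlf,
        model.encode(), crlf,
        # Form field: the audio file itself
        f"--{boundary}".encode(), crlf,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(), crlf,
        b"Content-Type: application/octet-stream", crlf, crlf,
        audio, crlf,
        f"--{boundary}--".encode(), crlf,
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, method="POST", headers=headers)


def transcribe(path: str, api_key: str) -> str:
    """Send the audio file and return the transcript text (network call)."""
    with open(path, "rb") as audio_file:
        request = build_transcription_request(
            audio_file.read(), os.path.basename(path), api_key
        )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]  # default JSON response has a 'text' field
```

A typical call would be `transcribe("meeting.m4a", os.environ["OPENAI_API_KEY"])`, where the file name and the environment variable are placeholders for your own setup.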
Via products that use Whisper
- Otter.ai, Granola, Fireflies, tldv, Notta — meeting transcription
- Descript, CapCut, Premiere — video editing with transcription
- macOS dictation, iOS Voice Memos: Apple-built on Apple’s own models (related technology, not Whisper)
- WhisperHook, MacWhisper apps — desktop transcription tools
What it costs
Open-source (self-hosted)
- Free — MIT license
- You pay only for your hardware, electricity, and setup time
- Modern Mac (M1+) can run Large-v3 in real-time
- RTX-equipped PC can run Large-v3 faster than real-time
OpenAI API
- ~US$0.006 per minute of audio (extremely cheap)
- 1 hour of audio = ~US$0.36
- Pay-per-use; no subscription
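The per-hour figure above is simple arithmetic on the per-minute rate. A small sketch for estimating your own bill, assuming the approximate US$0.006/minute price quoted in this section (check current pricing before budgeting); `api_cost_usd` is a hypothetical helper name:

```python
PRICE_PER_MINUTE_USD = 0.006  # approximate rate quoted above; may change


def api_cost_usd(audio_minutes: float,
                 price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Estimated API cost in US dollars for a given amount of audio."""
    return audio_minutes * price_per_minute


print(f"1 hour of audio:   US${api_cost_usd(60):.2f}")       # US$0.36
print(f"10 hours a month:  US${api_cost_usd(10 * 60):.2f}")  # US$3.60
```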
Wrappers / products
- MacWhisper: ~AUD $25 one-time
- Whisper Memos / similar apps: free or small one-time fee
- Meeting tools using Whisper: their own subscription pricing
How it compares to alternatives
| Capability | Whisper | AssemblyAI | Deepgram | Apple Live Captions | Google Cloud Speech-to-Text |
|---|---|---|---|---|---|
| Open-source | Yes (MIT) | No | No | No (iOS/Mac-only) | No |
| Self-hostable | Yes | No | No | No | No |
| Multilingual | 99+ languages | 50+ | 30+ | Major languages | 100+ |
| Quality (English) | Excellent | Excellent | Excellent | Good | Excellent |
| Speaker labels (diarization) | Limited (WhisperX adds) | Yes (built-in) | Yes (built-in) | Limited | Yes |
| Real-time / streaming | Limited | Yes | Yes (best for low-latency) | Yes (on-device) | Yes |
| Pricing (API) | ~US$0.006/min; free if self-hosted | Higher | Competitive | Free with hardware | Mid |
| Best for | Most use cases (cheapest + good) | Production with diarization | Real-time / streaming | iPhone/Mac native | Google Cloud users |
For most everyday transcription, Whisper (via the OpenAI API or self-hosted) is a strong default: low cost, excellent quality, and broad language support.
Privacy / data handling
- Self-hosted Whisper = data NEVER leaves your machine — the strongest privacy posture available
- OpenAI API = no training on inputs by default; data retained briefly for abuse-monitoring
- Whisper-using products = inherit their own privacy posture (Otter, Granola, etc. — verify per-product)
- For sensitive audio (legal interviews, medical consultations, confidential meetings) — strongly prefer self-hosted Whisper
Recent changes
- 2026: Whisper Turbo widely used; gpt-4o-transcribe variant for higher-end use
- 2024: Whisper Turbo (large-v3-turbo) released
- 2023: Whisper Large-v3 released (November); Faster-whisper and whisperX ecosystem matured
- September 2022: Original Whisper open-source release
Gotchas
- Speaker labels (diarization) not built into base Whisper — use whisperX or AssemblyAI / Deepgram if needed
- Streaming / real-time is supported via Faster-whisper but not as polished as Deepgram for low-latency apps
- Hallucinations — Whisper occasionally inserts phrases that weren’t said (especially on silence / low-quality audio); review high-stakes transcripts
- Background music can confuse Whisper — clean audio gets best results
- Australian accent is handled well by Large-v3 / Turbo; older / smaller variants struggle more
- Specialised vocabulary (medical, legal, technical jargon) may need fine-tuning or post-processing
- whisper.cpp on consumer Macs (M1/M2/M3) is incredibly capable — try this before paying for cloud transcription
- Apple’s Live Captions (iPhone / Mac built-in) uses Apple’s own model, not Whisper, but is also excellent
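For the hallucination gotcha above, one way to prioritise manual review is to flag segments the model itself was unsure about. The sketch below assumes segment dicts in the shape returned by the open-source `whisper` package's `transcribe()` (with `no_speech_prob`, `avg_logprob` and `compression_ratio` fields). The thresholds mirror that package's default decoding thresholds and are only a starting point; `flag_suspect_segments` is a hypothetical helper, and passing this check does not guarantee a segment is correct.

```python
NO_SPEECH_THRESHOLD = 0.6          # high value: segment is probably silence
LOGPROB_THRESHOLD = -1.0           # low value: model had low confidence
COMPRESSION_RATIO_THRESHOLD = 2.4  # high value: text is suspiciously repetitive


def flag_suspect_segments(segments):
    """Return (text, reasons) pairs for segments worth a human review."""
    suspects = []
    for seg in segments:
        reasons = []
        if seg.get("no_speech_prob", 0.0) > NO_SPEECH_THRESHOLD:
            reasons.append("likely silence")
        if seg.get("avg_logprob", 0.0) < LOGPROB_THRESHOLD:
            reasons.append("low confidence")
        if seg.get("compression_ratio", 0.0) > COMPRESSION_RATIO_THRESHOLD:
            reasons.append("repetitive text")
        if reasons:
            suspects.append((seg["text"].strip(), reasons))
    return suspects


# Illustrative segments (made-up values, not real model output).
example_segments = [
    {"text": " Welcome to the meeting.", "no_speech_prob": 0.02,
     "avg_logprob": -0.2, "compression_ratio": 1.3},
    {"text": " Thanks for watching!", "no_speech_prob": 0.85,
     "avg_logprob": -1.4, "compression_ratio": 1.1},
]
flagged = flag_suspect_segments(example_segments)
```

For high-stakes transcripts, the flagged segments are the ones to listen back to first.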
See also
- OpenAI API 🟥
- ChatGPT 🟩 🟦 — Advanced Voice mode uses Whisper-style STT
- AssemblyAI 🟥
- Deepgram 🟥
- Otter.ai 🟥
- Granola 🟥
- Fireflies.ai 🟥
- Descript 🟥
- ElevenLabs (voice gen — sibling) 🟩 🟦
- Multimodal (vision, audio) 🟩 🟦
- Apple Intelligence (Live Captions) 🟩 🟦
- which-ai-for-which-job.md 🟩 🟦
- Glossary — W (Whisper) 🟩