🇺🇸 USA · OpenAI Whisper

Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: OpenAI’s speech-to-text (transcription) model: open-weight, runs locally or via API, multilingual, excellent quality. The reason “AI transcription” became cheap and reliable.

Front-matter facts

Field	Value
Vendor	OpenAI (San Francisco, USA) — model is open-source under MIT license; commercial API service is OpenAI’s
Country / origin	🇺🇸 USA
Recommended for Australian users?	✅ Yes — open-source weights available globally; OpenAI API accessible from AUS
Privacy summary	Open-source weights run locally = no data leaves your machine. OpenAI API: no training on inputs by default; ChatGPT consumer surfaces inherit ChatGPT privacy posture
Free tier	Whisper open-source weights are completely free to download and run; OpenAI API has free trial credit
Paid tiers	OpenAI API: ~US$0.006/minute audio (extremely cheap); self-hosted = your hardware costs only
First released	September 2022 (open-source release)
Last reviewed	2026-06-26
Official site	https://openai.com/index/whisper/ + https://github.com/openai/whisper

What it is

Whisper is OpenAI’s speech-to-text (transcription) model. It was released as open source in September 2022 under the MIT license, making it one of the most significant open releases in AI history. Whisper:

Transcribes 99+ languages (most accurate on English; strong on European and major Asian languages)
Translates audio to English in addition to native-language transcription
Multiple size variants — Whisper Tiny / Base / Small / Medium / Large / Turbo / Large-v3
Runs locally on CPUs and especially well on GPUs / Apple Silicon
Runs via API (OpenAI’s hosted version) at very low cost
Used inside many other products: tools such as Otter, Granola and Descript are commonly described as using Whisper or comparable models, though vendors do not always disclose their speech engine

Whisper is a de facto standard for open AI transcription. Many “AI transcription tool” startups use Whisper somewhere in their stack.

OpenAI has continued to develop its transcription models:

Whisper Large-v3: the largest and most accurate open-weight Whisper model
Whisper Turbo (large-v3-turbo): fast variant, near-Large quality at much lower latency
gpt-4o-transcribe: newer API-only transcription model built on the GPT-4o family; it is not an open-weight Whisper release

What you’d use it for

Personal

Transcribe interviews, podcasts, meetings for personal records
Caption your videos for accessibility / SEO
Voice notes that turn into text automatically
Language learning — transcribe foreign-language audio to study

Business / Developer

Meeting transcription built into your app
Customer-call transcripts for support analysis
Accessibility features — live captions
Voice-input in custom apps
Voice agent pipelines — Whisper for STT, then LLM, then TTS
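
The voice agent pipeline in the last item (STT, then LLM, then TTS) can be sketched as three pluggable stages. This is a minimal illustration only: `run_voice_turn` and the `fake_*` stages are hypothetical names, and in a real system the first stage would call Whisper while the other two would call whichever LLM and TTS engine you choose.

```python
from typing import Callable


def run_voice_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one conversational turn: speech -> text -> reply text -> speech."""
    transcript = stt(audio)       # e.g. Whisper, self-hosted or via API
    reply_text = llm(transcript)  # any chat / completion model
    return tts(reply_text)        # any text-to-speech engine


# Placeholder stages (hypothetical) so the sketch runs with no models installed.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Keeping each stage behind a plain callable makes it easy to swap self-hosted Whisper for the hosted API (or another STT engine) without touching the rest of the pipeline.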

As a foundation

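Many AI meeting tools (for example Otter, Granola, Fireflies, tldv) are commonly described as using Whisper or comparable models; vendors do not always disclose their speech engine, so verify per product
Many AI dictation tools use Whisper or Whisper-derived models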
Build your own with whisper.cpp (open-source C++ port) for embedded use

How to use from Australia

Open-source (local)

whisper.cpp (github.com/ggerganov/whisper.cpp): fast C/C++ port, runs on CPU or GPU; works on Mac / Windows / Linux
faster-whisper (github.com/SYSTRAN/faster-whisper): Python reimplementation built on the CTranslate2 inference engine, up to 4× faster than the original
whisperX: extends Whisper with speaker labels (diarization) and word-level timestamps
MacWhisper: macOS app wrapping Whisper (paid app, very polished)
WhisperKit (Argmax): Whisper optimised for Apple Silicon on iOS / macOS
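
For a quick local test, the reference Python package from github.com/openai/whisper is often the simplest starting point. The sketch below assumes `pip install openai-whisper`, ffmpeg on your PATH, and a package version recent enough to know the `turbo` model name. The helpers `format_timestamp`, `segments_to_srt` and `transcribe_to_srt` are illustrative names, not part of the library. They turn Whisper's segment list into SRT captions.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Convert segments (dicts with 'start', 'end', 'text') to SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "turbo") -> str:
    """Transcribe a local file entirely on this machine and return SRT captions."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)   # downloads weights on first use
    result = model.transcribe(audio_path)    # dict with "text" and "segments"
    return segments_to_srt(result["segments"])


# Example (assumes a local file named interview.mp3):
# print(transcribe_to_srt("interview.mp3"))
```

Smaller model names such as `base` or `small` trade accuracy for speed and memory, which can matter on older hardware.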

Via OpenAI API

Sign up at platform.openai.com
Use /v1/audio/transcriptions endpoint
POST an audio file (mp3, m4a, wav, etc.)
Get back text transcription
Optionally request timestamps or other output formats; for translation into English, use the separate /v1/audio/translations endpoint
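
The steps above can be sketched with only the Python standard library. This is a hedged illustration, not OpenAI's official client: `build_transcription_request` and `transcribe` are hypothetical helper names, `whisper-1` is the hosted Whisper model identifier, and the API key is assumed to be in the `OPENAI_API_KEY` environment variable. In practice the official `openai` Python SDK is usually more convenient.

```python
import os
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key,
                                model="whisper-1", response_format="text"):
    """Build a multipart/form-data POST request for the transcription endpoint."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields.
    for name, value in (("model", model), ("response_format", response_format)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio file itself.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path, api_key):
    """Send a local audio file to the API and return the transcript text."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), os.path.basename(path), api_key)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example (makes a billable network call; meeting.mp3 is a placeholder file name):
# print(transcribe("meeting.mp3", os.environ["OPENAI_API_KEY"]))
```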

Via products that use Whisper

Otter.ai, Granola, Fireflies, tldv, Notta — meeting transcription
Descript, CapCut, Premiere — video editing with transcription
macOS dictation, iOS Voice Memos: built by Apple on its own speech models (similar technology, not Whisper)
WhisperHook, MacWhisper apps — desktop transcription tools

What it costs

Open-source (self-hosted)

Free — MIT license
You pay only for your hardware, electricity and setup time
Modern Mac (M1+) can run Large-v3 in real-time
RTX-equipped PC can run Large-v3 faster than real-time

OpenAI API

~US$0.006 per minute of audio (extremely cheap)
1 hour of audio = ~US$0.36
Pay-per-use; no subscription
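
The arithmetic behind these figures is simple enough to script when budgeting a batch of recordings. A minimal sketch, assuming the approximate US$0.006 per minute rate quoted above still applies (check current OpenAI pricing). `api_cost_usd` is an illustrative helper name.

```python
from decimal import Decimal

# Approximate rate quoted above; an assumption, not a live price lookup.
RATE_PER_MINUTE_USD = Decimal("0.006")


def api_cost_usd(audio_minutes) -> Decimal:
    """Estimate the API cost in US dollars for a given number of audio minutes."""
    return RATE_PER_MINUTE_USD * Decimal(str(audio_minutes))


# One hour of audio:
# api_cost_usd(60) -> Decimal('0.360'), i.e. about US$0.36
```

`Decimal` is used instead of floats so that small per-minute rates add up without rounding surprises.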

Wrappers / products

MacWhisper: ~AUD $25 one-time
Whisper Memos / similar apps: free or small one-time fee
Meeting tools using Whisper: their own subscription pricing

How it compares to alternatives

Capability	Whisper	AssemblyAI	Deepgram	Apple Live Captions	Google Cloud Speech-to-Text
Open-source	Yes (MIT)	No	No	No (iOS/macOS only)	No
Self-hostable	Yes	No	No	No	No
Multilingual	99+ languages	50+	30+	Major languages	100+
Quality (English)	Excellent	Excellent	Excellent	Good	Excellent
Speaker labels (diarization)	Limited (WhisperX adds)	Yes (built-in)	Yes (built-in)	Limited	Yes
Real-time / streaming	Limited	Yes	Yes (best for low-latency)	Yes (on-device)	Yes
Pricing (API)	Cheapest	Higher	Competitive	Free with hardware	Mid
Best for	Most use cases (cheapest + good)	Production with diarization	Real-time / streaming	iPhone/Mac native	Google Cloud users

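For most everyday transcription, Whisper (via OpenAI API or self-hosted) is a strong default: it has the lowest cost, excellent English quality, and broad language support. Choose an alternative when you need built-in speaker labels or low-latency streaming.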

Privacy / data handling

Self-hosted Whisper = data NEVER leaves your machine — the strongest privacy posture available
OpenAI API = no training on inputs by default; data retained briefly for abuse monitoring
Whisper-using products = inherit their own privacy posture (Otter, Granola, etc. — verify per-product)
For sensitive audio (legal interviews, medical consultations, confidential meetings) — strongly prefer self-hosted Whisper

Recent changes

2026: Whisper Turbo widely used; gpt-4o-transcribe variant for higher-end use
2025: gpt-4o-transcribe launched in the OpenAI API
2024: Whisper Large-v3 Turbo released
2023: Whisper Large-v3 released (November); faster-whisper and whisperX ecosystem matured
September 2022: Original Whisper open-source release

Gotchas

Speaker labels (diarization) not built into base Whisper — use whisperX or AssemblyAI / Deepgram if needed
Streaming / real-time is not built into Whisper; it can be approximated by transcribing short chunks (for example with faster-whisper), but this is not as polished as Deepgram for low-latency apps
Hallucinations — Whisper occasionally inserts phrases that weren’t said (especially on silence / low-quality audio); review high-stakes transcripts
Background music can confuse Whisper — clean audio gets best results
Australian accent is handled well by Large-v3 / Turbo; older / smaller variants struggle more
Specialised vocabulary (medical, legal, technical jargon) may need fine-tuning or post-processing
whisper.cpp on consumer Macs (M1/M2/M3) is incredibly capable — try this before paying for cloud transcription
Apple’s Live Captions (iPhone / Mac built-in) uses Apple’s own model, not Whisper, but is also excellent
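
For the hallucination gotcha above, one hedged approach is to flag segments for human review using the per-segment scores that the reference `openai-whisper` package returns (`no_speech_prob` and `avg_logprob`). The function name and the thresholds below are illustrative assumptions; they mirror the package's default decoding thresholds, but they are a starting point to tune, not a guarantee of catching every invented phrase.

```python
def flag_suspect_segments(segments, no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Return segments that look like silence or low-confidence output.

    `segments` is a list of dicts such as result["segments"] from
    openai-whisper's model.transcribe(); missing keys are treated as "not suspect".
    """
    flagged = []
    for seg in segments:
        likely_silence = seg.get("no_speech_prob", 0.0) > no_speech_threshold
        low_confidence = seg.get("avg_logprob", 0.0) < logprob_threshold
        if likely_silence or low_confidence:
            flagged.append(seg)
    return flagged


# Hand-written sample data (not real model output), for illustration only.
sample = [
    {"text": " Welcome to the meeting.", "no_speech_prob": 0.02, "avg_logprob": -0.2},
    {"text": " Thanks for watching!", "no_speech_prob": 0.91, "avg_logprob": -0.4},
    {"text": " mumbled words", "no_speech_prob": 0.10, "avg_logprob": -1.6},
]
for seg in flag_suspect_segments(sample):
    print("Review:", seg["text"].strip())
```

Flagged segments are candidates for a manual listen, which matters most for high-stakes transcripts such as legal or medical recordings.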
