🇺🇸 USA · OpenAI Whisper

Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: OpenAI’s speech-to-text (transcription) model: open-weight, runs locally or via API, multilingual, excellent quality. The reason “AI transcription” became cheap and reliable.


Front-matter facts

Field | Value
Vendor | OpenAI (San Francisco, USA); model is open-source under the MIT license; the commercial API service is OpenAI’s
Country / origin | 🇺🇸 USA
Recommended for Australian users? | ✅ Yes: open-source weights available globally; OpenAI API accessible from Australia
Privacy summary | Open-source weights run locally, so no data leaves your machine. OpenAI API: no training on inputs by default. ChatGPT consumer surfaces inherit ChatGPT’s privacy posture
Free tier | Whisper open-source weights are completely free to download and run; OpenAI API has free trial credit
Paid tiers | OpenAI API: ~US$0.006/minute of audio (extremely cheap); self-hosted: your hardware costs only
First released | September 2022 (open-source release)
Last reviewed | 2026-06-26
Official site | https://openai.com/index/whisper/ + https://github.com/openai/whisper

What it is

Whisper is OpenAI’s speech-to-text (transcription) model. It was released as open source in September 2022 under the MIT license, making it one of the most significant open releases in AI history. Whisper:

  • Transcribes 99+ languages (most accurate on English; strong on European / major Asian languages)
  • Translates audio to English in addition to native-language transcription
  • Multiple size variants: Tiny / Base / Small / Medium / Large (latest: Large-v3) / Turbo
  • Runs locally on CPUs and especially well on GPUs / Apple Silicon
  • Runs via API (OpenAI’s hosted version) at very low cost
  • Used inside many other products — Otter, Granola, Descript, countless others use Whisper under the hood
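
The two core modes in the list above (native-language transcription and translation to English) can be sketched with the reference `openai-whisper` Python package. This is a minimal sketch, assuming that package and `ffmpeg` are installed; the helper names `transcribe_options` and `run_whisper` are hypothetical, and the model names follow the size variants listed above. The Turbo variant is documented as not trained for the translate task, so the sketch defaults to a general-purpose size.

```python
from typing import Optional


def transcribe_options(translate_to_english: bool = False,
                       language: Optional[str] = None) -> dict:
    """Build keyword arguments for whisper's model.transcribe().

    task="transcribe" keeps the spoken language; task="translate"
    asks the model to output English text instead.
    """
    options = {"task": "translate" if translate_to_english else "transcribe"}
    if language is not None:
        # Optional language hint (e.g. "de"); omit it to let Whisper auto-detect.
        options["language"] = language
    return options


def run_whisper(audio_path: str, model_name: str = "small",
                translate_to_english: bool = False,
                language: Optional[str] = None) -> str:
    """Transcribe (or translate) one audio file locally and return the text."""
    import whisper  # third-party: pip install openai-whisper (needs ffmpeg)

    model = whisper.load_model(model_name)  # e.g. "tiny", "small", "large-v3"
    result = model.transcribe(
        audio_path, **transcribe_options(translate_to_english, language)
    )
    return result["text"]
```

Calling `run_whisper("interview.mp3")` would return a transcript in the spoken language, while `run_whisper("interview.mp3", translate_to_english=True)` would return English text; the file name is a placeholder.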

Whisper is the de facto standard for open-weight AI transcription, and many “AI transcription tool” startups use Whisper or a Whisper-derived model somewhere in their stack.

OpenAI has continued improving Whisper:

  • Whisper Large-v3: the most accurate open-weight Whisper model
  • Whisper Turbo: fast variant, near-Large quality at much lower latency
  • gpt-4o-transcribe: newer, API-only OpenAI transcription model built on the GPT-4o family, offered alongside Whisper rather than as an open Whisper checkpoint

What you’d use it for

Personal

  • Transcribe interviews, podcasts, meetings for personal records
  • Caption your videos for accessibility / SEO
  • Voice notes that turn into text automatically
  • Language learning — transcribe foreign-language audio to study

Business / Developer

  • Meeting transcription built into your app
  • Customer-call transcripts for support analysis
  • Accessibility features — live captions
  • Voice-input in custom apps
  • Voice agent pipelines — Whisper for STT, then LLM, then TTS
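
The voice-agent pipeline in the last bullet (speech-to-text, then an LLM, then text-to-speech) is a chain of three stages. The sketch below shows only that shape: the `VoiceAgent` class and its three callables are hypothetical names, and the stand-in lambdas at the bottom replace real Whisper, LLM and TTS calls so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceAgent:
    """One conversational turn: audio in -> text -> reply text -> audio out."""
    speech_to_text: Callable[[bytes], str]   # e.g. a wrapper around Whisper
    generate_reply: Callable[[str], str]     # e.g. a call to an LLM
    text_to_speech: Callable[[str], bytes]   # e.g. a call to a TTS engine

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.speech_to_text(audio_in)
        if not transcript.strip():
            # Nothing recognisable was said, so skip the LLM and stay silent.
            return b""
        reply = self.generate_reply(transcript)
        return self.text_to_speech(reply)


# Stand-in stages so the sketch runs without any model or API.
agent = VoiceAgent(
    speech_to_text=lambda audio: audio.decode("utf-8"),
    generate_reply=lambda text: f"You said: {text}",
    text_to_speech=lambda text: text.encode("utf-8"),
)
```

Keeping each stage behind a plain callable means the Whisper stage can be swapped between a local model and a hosted API without touching the rest of the pipeline.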

As a foundation

  • Many AI meeting tools (Otter, Granola, Fireflies, tldv and similar) are reported to use Whisper or comparable speech models; confirm per product
  • Most AI dictation tools use Whisper or Whisper-derived models
  • Build your own with whisper.cpp (open-source C++ port) for embedded use

How to use from Australia

Open-source (local)

  1. whisper.cpp (github.com/ggerganov/whisper.cpp): fast C/C++ port, runs on CPU or GPU; works on Mac / Windows / Linux
  2. faster-whisper (github.com/SYSTRAN/faster-whisper): Python reimplementation on the CTranslate2 engine, up to 4× faster than the original at comparable accuracy
  3. whisperX: extends Whisper with speaker labels + word-level timing
  4. MacWhisper: macOS app wrapping Whisper (paid app, very polished)
  5. WhisperKit (Argmax): Whisper optimised for Apple devices (iOS / macOS)
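
As one illustration of the local route, the sketch below uses faster-whisper to turn an audio file into SRT captions. It assumes the `faster-whisper` package is installed; the helper names (`srt_timestamp`, `to_srt`, `transcribe_to_srt`) are hypothetical, and the `small` model with `int8` on CPU is only a conservative starting point, not a recommendation from the tool's authors.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT caption text from (start, end, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_size: str = "small") -> str:
    """Transcribe a file locally with faster-whisper and return SRT captions."""
    from faster_whisper import WhisperModel  # third-party: pip install faster-whisper

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)  # segments is a lazy generator
    return to_srt((seg.start, seg.end, seg.text) for seg in segments)
```

On a machine with a supported GPU, `device="cuda"` with `compute_type="float16"` is the usual faster configuration.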

Via OpenAI API

  1. Sign up at platform.openai.com
  2. Use /v1/audio/transcriptions endpoint
  3. POST an audio file (mp3, m4a, wav, etc.)
  4. Get back text transcription
  5. Optionally request translation, timestamps, etc.
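
The API steps above can be sketched with the official `openai` Python SDK. This is a sketch under stated assumptions: the SDK is installed, the `OPENAI_API_KEY` environment variable is set, and `whisper-1` is the hosted Whisper model name. The upload limit and the extension list in the pre-flight check are assumptions based on OpenAI's published limits at the time of writing, so check the current documentation; `upload_problems` and `transcribe_via_api` are hypothetical helper names.

```python
import os
from typing import List

# Assumed limits; verify against the current OpenAI API documentation.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def upload_problems(filename: str, size_bytes: int) -> List[str]:
    """Return reasons a file would likely be rejected (empty list = looks fine)."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than the assumed upload limit")
    return problems


def transcribe_via_api(audio_path: str, model: str = "whisper-1") -> str:
    """POST one audio file to /v1/audio/transcriptions and return the text."""
    from openai import OpenAI  # third-party: pip install openai

    problems = upload_problems(audio_path, os.path.getsize(audio_path))
    if problems:
        raise ValueError("; ".join(problems))

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

Recordings that exceed the upload limit are usually split into shorter chunks, or re-encoded at a lower bitrate, before upload.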

Via products that use Whisper

  • Otter.ai, Granola, Fireflies, tldv, Notta — meeting transcription
  • Descript, CapCut, Premiere — video editing with transcription
  • macOS dictation, iOS Voice Memos: built on Apple’s own speech models (related technology, not Whisper)
  • WhisperHook, MacWhisper apps — desktop transcription tools

What it costs

Open-source (self-hosted)

  • Free — MIT license
  • You pay only for hardware, electricity and setup time
  • A modern Mac (M1 or later) can run Large-v3 at roughly real-time speed
  • A PC with an NVIDIA RTX GPU can run Large-v3 faster than real time
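
"Real-time" in the bullets above can be made concrete as a real-time factor (RTF): processing time divided by audio duration, where a value at or below 1.0 means the machine keeps up with the audio. The sketch below is one way to measure it on your own hardware; `real_time_factor` and `measure_rtf` are hypothetical helper names, and `transcribe` stands for whichever local transcription function you use.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (<= 1.0 keeps up with real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[str], object],
                audio_path: str, audio_seconds: float) -> float:
    """Time one call to a transcription function and return its RTF."""
    started = time.perf_counter()
    transcribe(audio_path)
    elapsed = time.perf_counter() - started
    return real_time_factor(elapsed, audio_seconds)
```

For example, a 60-minute recording that takes 30 minutes to process has an RTF of 0.5. Model load time is best excluded from the measurement, since it is paid once rather than per file.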

OpenAI API

  • ~US$0.006 per minute of audio (extremely cheap)
  • 1 hour of audio = ~US$0.36
  • Pay-per-use; no subscription
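
The per-minute pricing above is simple multiplication, sketched below with `Decimal` to avoid floating-point rounding on money. The rate is the approximate figure quoted above and may change; `api_cost_usd` is a hypothetical helper name.

```python
from decimal import Decimal

# Approximate rate quoted above; check OpenAI's pricing page for the current value.
PRICE_PER_MINUTE_USD = Decimal("0.006")


def api_cost_usd(audio_minutes) -> Decimal:
    """Estimated API cost in US dollars for a given number of audio minutes."""
    return PRICE_PER_MINUTE_USD * Decimal(str(audio_minutes))


print(api_cost_usd(60))        # one hour of audio -> 0.360
print(api_cost_usd(40 * 60))   # forty hours of audio -> 14.400
```

At this rate, even a 40-hour backlog of recordings costs under US$15 to transcribe.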

Wrappers / products

  • MacWhisper: ~AUD $25 one-time
  • Whisper Memos / similar apps: free or small one-time fee
  • Meeting tools using Whisper: their own subscription pricing

How it compares to alternatives

Capability | Whisper | AssemblyAI | Deepgram | Apple Live Captions | Google Cloud Speech-to-Text
Open-source | Yes (MIT) | No | No | iOS/Mac-only | No
Self-hostable | Yes | No | No | No | No
Multilingual | 99+ languages | 50+ | 30+ | Major languages | 100+
Quality (English) | Excellent | Excellent | Excellent | Good | Excellent
Speaker labels (diarization) | Limited (WhisperX adds) | Yes (built-in) | Yes (built-in) | Limited | Yes
Real-time / streaming | Limited | Yes | Yes (best for low-latency) | Yes (on-device) | Yes
Pricing (API) | Cheapest | Higher | Competitive | Free with hardware | Mid
Best for | Most use cases (cheapest + good) | Production with diarization | Real-time / streaming | iPhone/Mac native | Google Cloud users

For most everyday transcription, Whisper (via the OpenAI API or self-hosted) is the best choice: it is the cheapest option compared here, with excellent quality and broad language support.


Privacy / data handling

  • Self-hosted Whisper = data NEVER leaves your machine — the strongest privacy posture available
  • OpenAI API = no training on inputs by default; data is retained for a limited period for abuse monitoring (check OpenAI’s current data-retention terms)
  • Whisper-using products = inherit their own privacy posture (Otter, Granola, etc. — verify per-product)
  • For sensitive audio (legal interviews, medical consultations, confidential meetings) — strongly prefer self-hosted Whisper

Recent changes

  • 2026: Whisper Turbo widely used; gpt-4o-transcribe variant for higher-end use
  • 2024: Whisper Turbo (large-v3-turbo) released
  • 2023: Whisper Large-v3 released (November); faster-whisper and whisperX ecosystem matured
  • September 2022: Original Whisper open-source release

Gotchas

  • Speaker labels (diarization) not built into base Whisper — use whisperX or AssemblyAI / Deepgram if needed
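  • Streaming / real-time is not native to Whisper, which processes audio in 30-second windows; chunked pipelines built on faster-whisper or whisper.cpp can approximate it, but they are not as polished as Deepgram for low-latency apps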
  • Hallucinations — Whisper occasionally inserts phrases that weren’t said (especially on silence / low-quality audio); review high-stakes transcripts
  • Background music can confuse Whisper — clean audio gets best results
  • Australian accents are handled well by Large-v3 / Turbo; older / smaller variants struggle more
  • Specialised vocabulary (medical, legal, technical jargon) may need fine-tuning or post-processing
  • whisper.cpp on consumer Macs (M1/M2/M3) is incredibly capable — try this before paying for cloud transcription
  • Apple’s Live Captions (iPhone / Mac built-in) uses Apple’s own model, not Whisper, but is also excellent
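
For the hallucination gotcha above, one inexpensive safeguard is to flag suspicious segments for human review using the per-segment statistics that the reference `openai-whisper` package returns in `result["segments"]`. The sketch below is a heuristic, not a guarantee: the thresholds mirror that package's default decoding thresholds, and `needs_review` is a hypothetical helper name.

```python
from typing import List


def needs_review(segment: dict,
                 no_speech_threshold: float = 0.6,
                 logprob_threshold: float = -1.0,
                 compression_ratio_threshold: float = 2.4) -> List[str]:
    """Return reasons a Whisper segment deserves a manual check (empty = looks fine)."""
    reasons = []
    if segment.get("no_speech_prob", 0.0) > no_speech_threshold:
        # The model itself thought this stretch was probably silence.
        reasons.append("likely silence")
    if segment.get("avg_logprob", 0.0) < logprob_threshold:
        # Low average token log-probability means the decoder was unsure.
        reasons.append("low confidence")
    if segment.get("compression_ratio", 0.0) > compression_ratio_threshold:
        # Highly compressible text usually means the same phrase is looping.
        reasons.append("repetitive text")
    return reasons
```

A review pass could then list every segment where `needs_review(segment)` is non-empty, along with its `start` and `end` times, so a person only has to re-listen to those parts of a high-stakes recording.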

See also


Sources