When we launched Tellaflow, we supported one model: OpenAI Whisper. It's excellent: multilingual, well-studied, and genuinely accurate. But for users whose primary use case is English dictation, it came with a tradeoff: the large model was accurate but slow, and the small model was fast but missed things.
We spent three weeks evaluating NVIDIA's Parakeet model before shipping it in Tellaflow. Here's what we learned.
What is Parakeet?
Parakeet is a family of automatic speech recognition (ASR) models developed by NVIDIA as part of their NeMo framework. The specific model we ship, parakeet-tdt-0.6b-v2, is a 600M-parameter model trained exclusively on English.
The "TDT" stands for Token-and-Duration Transducer, an architecture that's particularly efficient for streaming transcription. At each step the model predicts both the next token and how many audio frames that token spans, so the decoder can skip ahead instead of examining every frame. It was designed to minimize latency: words appear as you speak them, not after a processing delay.
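To make the token-and-duration idea concrete, here is a toy sketch of a greedy decoding loop. It is not NeMo's implementation: the `predict` callback, the lookup table, and the stall guard are all hypothetical stand-ins for a real model, chosen only to show how a predicted duration lets the decoder jump over frames.

```python
from typing import Callable, List, Optional, Tuple

BLANK: Optional[str] = None  # stand-in for the transducer's blank symbol


def greedy_tdt_decode(
    num_frames: int,
    predict: Callable[[int, List[str]], Tuple[Optional[str], int]],
    max_symbols_per_frame: int = 10,
) -> Tuple[List[str], int]:
    """Toy greedy decode. Returns (emitted tokens, number of predict calls)."""
    tokens: List[str] = []
    t = 0          # current audio frame
    calls = 0      # how many times the (hypothetical) model was queried
    stalled = 0    # consecutive zero-duration steps on the same frame
    while t < num_frames:
        token, duration = predict(t, tokens)
        calls += 1
        if token is not BLANK:
            tokens.append(token)
        if duration == 0:
            stalled += 1
            # Assumption: force progress on a blank or after too many
            # emissions on one frame, so the loop always terminates.
            if token is BLANK or stalled >= max_symbols_per_frame:
                duration = 1
        if duration > 0:
            stalled = 0
        t += duration  # the duration is what lets us skip frames
    return tokens, calls


# Hypothetical per-frame predictions: (token, frames to advance).
TOY_PREDICTIONS = {0: ("hel", 2), 2: ("lo", 3), 5: (BLANK, 1), 6: ("world", 2)}


def toy_predict(t: int, history: List[str]) -> Tuple[Optional[str], int]:
    return TOY_PREDICTIONS.get(t, (BLANK, 1))


tokens, calls = greedy_tdt_decode(8, toy_predict)
print(tokens, calls)  # ['hel', 'lo', 'world'] 4
```

In this toy run the loop covers 8 frames with only 4 model queries. A decoder that advances one frame at a time would need at least 8.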
How Parakeet compares to Whisper for English
Whisper is multilingual by design. It can transcribe 99 languages and translate speech from them into English. That flexibility comes with a cost: the model architecture carries overhead for multilingual capability even when you're just speaking English.
Parakeet is English-only. That's not a limitation; it's a focus. By training only on English and optimizing for English phonetics, the model achieves accuracy comparable to Whisper large with far fewer parameters (about 600M versus roughly 1.5B). On Apple Silicon (M-series) Macs, this means:
- Lower memory footprint (easier on 8 GB RAM Macs)
- Faster inference (words appear sooner after you stop speaking)
- Better accuracy on US English than similarly-sized Whisper variants
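As a rough illustration of why parameter count matters on an 8 GB machine, the sketch below estimates the memory needed just to hold model weights. The parameter counts are approximate published figures, and the 2-bytes-per-parameter default (16-bit weights) is an assumption. The precision Tellaflow actually loads models at is not stated here, and real usage also includes activations and runtime overhead.

```python
# Approximate published parameter counts (not measured by us).
APPROX_PARAMS = {
    "parakeet-tdt-0.6b-v2": 600e6,
    "whisper-large": 1550e6,
}


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GB (10^9 bytes).

    bytes_per_param=2 assumes 16-bit weights; use 4 for 32-bit.
    """
    return num_params * bytes_per_param / 1e9


for name, params in APPROX_PARAMS.items():
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights at 16-bit")
```

Under these assumptions the weights come to roughly 1.2 GB for Parakeet and 3.1 GB for Whisper large, which is the kind of gap that matters when the operating system and other apps share the same 8 GB.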
The testing methodology
We ran both models against the same test corpus over three weeks: a mix of recorded dictation sessions, technical content (code comments, variable names, developer jargon), conversational English, and speech from team members with non-American accents.
Parakeet's word error rate (WER) on clean US English was consistently lower than Whisper small and competitive with Whisper medium. The latency difference was notable: on the same M2 hardware, Parakeet transcribed 20–40% faster in wall-clock time than Whisper medium.
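For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal version of that calculation, not our evaluation harness. The lowercase-and-split normalization is an assumption, and the sentences and timings are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def percent_less_time(baseline_s: float, candidate_s: float) -> float:
    """How much less wall-clock time the candidate took, as a percentage."""
    return 100.0 * (baseline_s - candidate_s) / baseline_s


# Invented example: one substitution ("the" -> "a") and one deletion ("id").
wer = word_error_rate("set the variable name to user id",
                      "set a variable name to user")
print(f"WER: {wer:.3f}")  # 2 errors over 7 reference words

# Invented timings, not our measurements.
print(f"{percent_less_time(10.0, 7.0):.0f}% less wall-clock time")
```

Real evaluations usually also normalize punctuation and numerals before scoring, which this sketch skips.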
Where Whisper wins: anything non-English. Parakeet simply doesn't understand other languages. If you need live translation or dictation in French, Hindi, or Japanese, stick with Whisper.
Which model should you use?
Our recommendation:
- Parakeet if you speak English exclusively and want the lowest latency with the smallest RAM footprint.
- Whisper small or medium for a balance of accuracy and speed across languages.
- Whisper large if maximum accuracy is the priority and you have 16 GB+ RAM.
- Whisper (any size) if you use the live translation feature, since Parakeet doesn't support translation.
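The recommendations above can be read as a small decision function. This is a sketch only: the function name, parameters, and returned labels are hypothetical rather than Tellaflow settings, and the order in which the rules are checked is our assumption where the list doesn't say which one wins.

```python
def recommend_model(
    english_only: bool,
    needs_translation: bool,
    ram_gb: int,
    prioritize_accuracy: bool = False,
) -> str:
    """Map the four recommendations to a single label."""
    # Translation is checked first: Parakeet cannot translate at all.
    if needs_translation:
        return "whisper (any size)"
    # Maximum accuracy with enough memory points to the large model.
    if prioritize_accuracy and ram_gb >= 16:
        return "whisper-large"
    # English-only dictation with low latency and a small RAM footprint.
    if english_only:
        return "parakeet"
    # Everything else: a balance of accuracy and speed across languages.
    return "whisper-small or whisper-medium"


print(recommend_model(english_only=True, needs_translation=False, ram_gb=8))
print(recommend_model(english_only=False, needs_translation=False, ram_gb=8))
```

One case the list leaves open is an English-only user who wants maximum accuracy on less than 16 GB. This sketch falls back to Parakeet there, which is a judgment call.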
What's next for model support
Adding Parakeet is the beginning of a broader goal: supporting the best available on-device ASR models as they emerge. We're tracking Faster Whisper, Parakeet-CTC, and several community-requested models. If you have a model you'd like to see supported, the issue tracker is open.
The goal is simple: whatever the best model is for your use case, Tellaflow should be able to run it, locally, for free.