WhisperKit: On-device Real-time ASR with Billion-Scale Transformers

Jul 13, 2025

Sprecher:innen

Über

Real-time Automatic Speech Recognition (ASR) is a fundamental building block for many commercial applications of ML, including live captioning, dictation, meeting transcriptions, and medical scribes. Accuracy and latency are the most important factors when companies select a system to deploy. We present WhisperKit, an optimized on-device inference system for real-time ASR that significantly outperforms leading cloud-based systems. We benchmark against server-side systems that deploy a diverse set of models, including a frontier model (OpenAI gpt-4o-transcribe), a proprietary model (Deepgram nova-3), and an open-source model (Fireworks large-v3-turbo).Our results show that WhisperKit matches the lowest latency at 0.46s while achieving the highest accuracy 2.2\% WER. The optimizations behind the WhisperKit system are described in detail in this paper.

Organisator

Kategorien

Gefällt euch das Format? Vertraut auf SlidesLive, um euer nächstes Event festzuhalten!

Professionelle Aufzeichnung und Livestreaming – weltweit.

Freigeben

Empfohlene Videos

Präsentationen, deren Thema, Kategorie oder Sprecher:in ähnlich sind

Interessiert an Vorträgen wie diesem? ICML 2025 folgen