Mistral launched Voxtral Realtime, an open-source speech model that can run on phones and other devices, with low delay and strong accuracy.
In short: Mistral has released Voxtral Realtime, a speech-to-text model that can run on personal devices like phones and laptops, and it is available as open source.
Mistral released a new model called Voxtral Realtime in February 2026. It is part of the company’s Voxtral Transcribe 2 family. The model is designed to turn spoken audio into text with only a short delay.
Mistral says Voxtral Realtime is small enough to run locally on devices like smartphones and laptops, instead of needing a remote server. The model has 4 billion parameters, which you can think of as the model’s “knobs and dials” for learning patterns. Fewer parameters often mean a model is easier to run on smaller hardware.
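To give a rough sense of why parameter count matters for phones and laptops, the sketch below estimates how much memory the model's weights alone would occupy at a few common numeric precisions. The precisions listed are illustrative assumptions, not formats confirmed by Mistral, and the estimate ignores everything else a running model needs (audio buffers, activations, caches).

```python
# Rough weight-memory estimate for a model with a given parameter count.
# Assumption: only the weights are counted; runtime overhead is ignored.
# The precisions below are illustrative, not formats confirmed by Mistral.

PARAMETERS = 4_000_000_000  # 4 billion, as reported in the article


def weight_memory_gb(num_parameters: int, bytes_per_parameter: float) -> float:
    """Return approximate weight size in gigabytes (1 GB = 10**9 bytes)."""
    if num_parameters < 0 or bytes_per_parameter <= 0:
        raise ValueError("parameter count must be >= 0 and byte size > 0")
    return num_parameters * bytes_per_parameter / 1e9


if __name__ == "__main__":
    # Hypothetical precision options: 16-bit, 8-bit, and 4-bit weights.
    for label, size in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        print(f"{label}: about {weight_memory_gb(PARAMETERS, size):.1f} GB")
```

Under these assumptions, lower-precision weights shrink the footprint proportionally, which is one common way models of this size are made to fit on consumer devices.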
In a setup with about a 480 millisecond delay, Mistral reports a word error rate of 1% to 2%. That is close to the accuracy you get from “offline” transcription, where the system can take more time to process audio. Voxtral Realtime supports 13 languages, including Chinese, English, French, and Japanese.
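For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the correct one, divided by the number of words in the correct transcript. The sketch below is a minimal, generic implementation of that standard definition. It is not Mistral's evaluation code, and it assumes simple whitespace tokenization with no text normalization.

```python
# Minimal word error rate (WER) sketch using word-level edit distance.
# Assumption: words are split on whitespace and compared exactly;
# real evaluations usually normalize case and punctuation first.


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return edit distance between word sequences / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous_row[j] = edits needed to turn the first j hypothesis words
    # into the reference prefix processed so far.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,  # reference word missing (deletion)
                    current_row[j - 1] + 1,  # extra hypothesis word (insertion)
                    previous_row[j - 1] + substitution_cost,  # match or swap
                )
            )
        previous_row = current_row

    return previous_row[-1] / len(ref_words)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.1%}")
```

In this toy example one of six reference words is wrong. A rate of 1% to 2% corresponds to roughly one or two errors per hundred words.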
The model is available on Hugging Face under the Apache 2.0 license, which allows broad reuse with few restrictions. Mistral also offers an API (a paid way to send audio to Mistral’s service) priced at $0.006 per minute.
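To put the API price in perspective, the sketch below works out what a given amount of audio would cost at $0.006 per minute. It assumes billing is strictly proportional to audio length, with no minimum charge or rounding. The article does not describe Mistral's actual billing rules, so treat the result as an approximation.

```python
# Back-of-the-envelope API cost estimate at the reported per-minute price.
# Assumption: cost scales linearly with audio duration, with no minimum
# charge, rounding, or volume discount.

PRICE_PER_MINUTE_USD = 0.006  # price reported in the article


def transcription_cost_usd(
    audio_seconds: float, price_per_minute: float = PRICE_PER_MINUTE_USD
) -> float:
    """Return the estimated cost in US dollars for the given audio length."""
    if audio_seconds < 0:
        raise ValueError("audio length cannot be negative")
    return audio_seconds / 60 * price_per_minute


if __name__ == "__main__":
    one_hour = transcription_cost_usd(60 * 60)
    hundred_hours = transcription_cost_usd(100 * 60 * 60)
    print(f"1 hour of audio:    ${one_hour:.2f}")
    print(f"100 hours of audio: ${hundred_hours:.2f}")
```

Under that assumption, an hour of audio comes to about 36 cents and one hundred hours to about $36.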
Running speech tools locally can improve privacy because your audio does not need to be uploaded to the cloud, which is like sending it to someone else’s computer. This could be useful for people and businesses that handle sensitive conversations, or who want voice features even when internet access is limited.
Source: TechCrunch AI