Mistral releases Voxtral Realtime for speech on local devices

In short: Mistral has released Voxtral Realtime, an open-source speech-to-text model that can run on personal devices such as phones and laptops.

What happened

Mistral released a new model called Voxtral Realtime in February 2026. It is part of the company's Voxtral Transcribe 2 system. The model is designed to convert spoken audio into text with only a short delay.

Mistral says Voxtral Realtime is small enough to run locally on devices like smartphones and laptops instead of on a remote server. The model has 4 billion parameters, which you can think of as the "knobs and dials" it adjusts while learning patterns. Models with fewer parameters generally need less memory and computing power, so they are easier to run on smaller hardware.
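
To see why parameter count matters for phones and laptops, here is a rough back-of-the-envelope sketch of how much memory it takes just to store 4 billion parameters. The 16-, 8- and 4-bit precisions are illustrative assumptions (the article does not say how the released model is stored), and the estimate ignores the extra memory a running model needs for audio buffers and intermediate results.

```python
def weights_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed only to hold the weights, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 4e9  # 4 billion parameters, as reported for Voxtral Realtime

# Assumed storage precisions, for illustration only.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weights_memory_gb(params, bits):.1f} GB")
```

Under these assumptions the weights take roughly 8 GB at 16 bits per parameter and about 2 GB at 4 bits, which is within reach of many recent laptops and high-end phones.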

With a delay of about 480 milliseconds, Mistral reports a word error rate (roughly, the share of words transcribed incorrectly) of 1% to 2%. That is close to the accuracy of "offline" transcription, where the system can take more time to process the audio. Voxtral Realtime supports 13 languages, including Chinese, English, French, and Japanese.
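
Word error rate is usually computed by counting the word substitutions, deletions, and insertions needed to turn the system's transcript into a correct reference transcript, then dividing by the number of words in the reference. The sketch below shows that standard calculation; it is not Mistral's evaluation code, and it assumes simple whitespace splitting with no normalization of casing or punctuation, which real benchmarks typically apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    using a word-level edit (Levenshtein) distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dp[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumped over the lazy dog today"
print(word_error_rate(reference, hypothesis))  # one wrong word out of ten
```

Here one mistake in ten words gives a WER of 0.1 (10%). A rate of 1% to 2% corresponds to about one or two errors per 100 words.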

The model is available on Hugging Face under the Apache 2.0 license, which allows broad reuse with few restrictions. Mistral also offers an API (a paid way to send audio to Mistral’s service) priced at $0.006 per minute.
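
At $0.006 per minute, API costs are easy to estimate. The sketch below simply multiplies minutes by that rate; it assumes straightforward per-minute billing, so any rounding rules, minimum charges, or volume discounts Mistral may apply are not modeled. It uses `Decimal` to avoid floating-point rounding in money amounts.

```python
from decimal import Decimal

PRICE_PER_MINUTE_USD = Decimal("0.006")  # API price reported in the article


def api_cost_usd(minutes: int) -> Decimal:
    """Estimated cost in US dollars, rounded to the cent (assumes simple per-minute billing)."""
    return (Decimal(minutes) * PRICE_PER_MINUTE_USD).quantize(Decimal("0.01"))


print(api_cost_usd(60))      # one hour of audio
print(api_cost_usd(60_000))  # 1,000 hours of audio
```

Under this assumption, an hour of audio costs about $0.36 and 1,000 hours about $360, whereas running the open model locally has no per-minute fee, only the cost of the hardware and electricity.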

Why it matters

Running speech tools locally can improve privacy because your audio does not need to be uploaded to the cloud, which is like sending it to someone else’s computer. This could be useful for people and businesses that handle sensitive conversations, or who want voice features even when internet access is limited.

Source: TechCrunch AI