Pure C, CPU-only inference with Mistral Voxtral Realtime 4B speech to text model

github.com

267 points by Curiositry 16 hours ago


d4rkp4ttern - 5 hours ago

I use the open source Handy [1] app with Parakeet V3 for STT when talking to coding agents, and I’ve yet to see anything that beats this setup in terms of speed/accuracy. I get near-instant transcription, and the slight accuracy drop is immaterial when talking to AIs that can “read between the lines”.

I tried incorporating this Voxtral C implementation into Handy but got very slow transcriptions on my M1 Max MacBook 64GB. I’ll have to try the other implementations mentioned here.

[1] https://github.com/cjpais/Handy

mythz - 8 hours ago

Big fan of Salvatore's voxtral.c and flux2.c projects - hope they continue to get optimized, as it'd be great to have lean options without external deps. Unfortunately, when I tried it while adding Voice Input support to llms-py [1], it was too slow for real-world use (AMD 7800X3D with BLAS).

In the end Omarchy's new support for voxtype.io provided the nicest UX, followed by Whisper.cpp, and despite being slower, OpenAI's Whisper is still a solid local transcription option.

Also very impressed with both the performance and price of Mistral's new Voxtral Transcription API [2] - really fast/instant and really cheap ($0.003/min), IMO best option in CPU/disk-constrained environments.

[1] https://llmspy.org/docs/features/voice-input

[2] https://docs.mistral.ai/models/voxtral-mini-transcribe-26-02

Curiositry - 14 hours ago

This was a breeze to install on Linux. However, I haven't managed to get realtime transcription working yet, à la Whisper.cpp stream or Moonshine.

--from-mic only supports Mac. I'm able to capture audio with ffmpeg, but adapting the ffmpeg example to use mic capture hasn't worked yet:

ffmpeg -f pulse -channels 1 -i 1 -f s16le - 2>/dev/null | ./voxtral -d voxtral-model --stdin

It's possible my system is simply under spec for the default model.

I'd like to be able to use this with the voxtral-q4.gguf quantized model from here: https://huggingface.co/TrevorJS/voxtral-mini-realtime-gguf
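
One way to narrow down a mic pipeline like the ffmpeg one above is to check that the capture really produces raw PCM at the expected rate before handing it to voxtral. The sketch below is only that, a sketch: it assumes `--stdin` expects 16 kHz mono signed 16-bit little-endian samples (not confirmed anywhere in this thread, so check the project's README), it assumes a PulseAudio source named `default`, and `pcm_bytes` and `capture_pcm` are hypothetical helper names. It forces the output format with ffmpeg's `-ar` and `-ac` options instead of relying on the input-side `-channels` setting.

```shell
#!/bin/sh
# Hypothetical helpers for debugging mic capture; not part of voxtral.

# Bytes that N seconds of raw s16le PCM should occupy:
# rate * channels * 2 bytes per sample * seconds.
pcm_bytes() {
    rate=$1; channels=$2; seconds=$3
    echo $(( rate * channels * 2 * seconds ))
}

# Capture from a PulseAudio source (default: "default") and write
# 16 kHz mono s16le PCM to stdout. The 16 kHz mono target is an assumption.
capture_pcm() {
    ffmpeg -f pulse -i "${1:-default}" -ar 16000 -ac 1 -f s16le - 2>/dev/null
}

# Usage (commented out so that sourcing this file records nothing):
#   capture_pcm default | head -c "$(pcm_bytes 16000 1 5)" | wc -c
#   capture_pcm default | ./voxtral -d voxtral-model --stdin
```

If the first usage line takes about five seconds to print 160000, the capture side is delivering audio at the intended rate, and any remaining problem is more likely on the voxtral side (for example, a machine too slow for the default model).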

written-beyond - 9 hours ago

Funny, this and the Rust runtime implementation are neck and neck on the front page right now.

Cool project!

hrpnk - 8 hours ago

There is also an MLX implementation: https://github.com/awni/voxmlx

sgt - 10 hours ago

I'm very interested in speech to text, particularly for tricky dialects and domain-specific terminology, but I'm still confused about the best place to start if I want to train models on a huge database of voice samples I own.

Any ideas from the HN crowd currently involved in speech-to-text models?

9999_points - an hour ago

It seems so bizarre that we need a nearly 9 GB model to do something you could do over 20 years ago with ~200 MB.

alextray812 - 5 hours ago

From a cybersecurity perspective, this project is impressive not just for performance, but for transparency.

sylware - 6 hours ago

Finally, a plain and simple C lib to run open-weight LLMs?
