Show HN: Whispering – Open-source, local-first dictation you can trust

github.com

573 points by braden-w 3 days ago


Hey HN! Braden here, creator of Whispering, an open-source speech-to-text app.

I really like dictation. For years, I relied on transcription tools that were almost good, but they were all closed-source. Even many that claimed to be “local” or “on-device” were still black boxes that left me wondering where my audio really went.

So I built Whispering. It’s open-source, local-first, and most importantly, transparent with your data. Your data is stored locally on your device, and your audio goes directly from your machine to a local provider (Whisper C++, Speaches, etc.) or your chosen cloud provider (Groq, OpenAI, ElevenLabs, etc.). For me, the features were good enough that I left my paid tools behind (I used Superwhisper and Wispr Flow before).

Productivity apps should be open-source and transparent with your data, but they also need to match the UX of paid, closed-source alternatives. I hope Whispering is near that point. I use it for several hours a day, from coding to thinking out loud while carrying pizza boxes back from the office.

Here’s an overview: https://www.youtube.com/watch?v=1jYgBMrfVZs, and here’s how I personally am using it with Claude Code these days: https://www.youtube.com/watch?v=tpix588SeiQ.

There are plenty of transcription apps out there, but I hope Whispering adds some extra competition from the OSS ecosystem (one of my other OSS favorites is Handy https://github.com/cjpais/Handy). Whispering has a few tricks up its sleeve, like a voice-activated mode for hands-free operation (no button holding), and customizable AI transformations with any prompt/model.

Whispering used to be in my personal GH repo, but I recently moved it as part of a larger project called Epicenter (https://github.com/epicenter-so/epicenter), which I should explain a bit...

I’m basically obsessed with local-first open-source software. I think there should be an open-source, local-first version of every app, and I would like them all to work together. The idea of Epicenter is to store your data in a folder of plaintext and SQLite, and build a suite of interoperable, local-first tools on top of this shared memory. Everything is totally transparent, so you can trust it.

Whispering is the first app in this effort. It’s not there yet regarding memory, but it’s getting there. I’ll probably write more about the bigger picture soon, but mainly I just want to make software and let it speak for itself (no pun intended in this case!), so this is my Show HN for now.

I just finished college and was about to move back in with my parents and work on this instead of getting a job…and then I somehow got into YC. So my current plan is to cover my living expenses and use the YC funding to support maintainers, our dependencies, and people working on their own open-source, local-first projects. More on that soon.

Would love your feedback, ideas, and roasts. If you would like to support the project, star it on GitHub here (https://github.com/epicenter-so/epicenter) and join the Discord here (https://go.epicenter.so/discord). Everything’s MIT licensed, so fork it, break it, ship your own version, copy whatever you want!

pstroqaty - 2 days ago

If anyone's interested in a janky-but-works-great dictation setup on Linux, here's mine:

On key press, start recording microphone to /tmp/dictate.mp3:

  # Save up to 10 mins. Minimize buffering. Save pid
  ffmpeg -f pulse -i default -ar 16000 -ac 1 -t 600 -y -c:a libmp3lame -q:a 2 -flush_packets 1 -avioflags direct -loglevel quiet /tmp/dictate.mp3 &
  echo $! > /tmp/dictate.pid
On key release, stop recording, transcribe with whisper.cpp, trim whitespace and print to stdout:

  # Stop recording
  kill $(cat /tmp/dictate.pid)
  # Transcribe
  whisper-cli --language en --model $HOME/.local/share/whisper/ggml-large-v3-turbo-q8_0.bin --no-prints --no-timestamps /tmp/dictate.mp3 | tr -d '\n' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
I keep these in a dictate.sh script and bind to press/release on a single key. A programmable keyboard helps here. I use https://git.sr.ht/%7Egeb/dotool to turn the transcription into keystrokes. I've also tried ydotool and wtype, but they seem to swallow keystrokes.

  bindsym XF86Launch5 exec dictate.sh start
  bindsym --release XF86Launch5 exec echo "type $(dictate.sh stop)" | dotoolc
This gives a very functional push-to-talk setup.
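
Roughly, the dictate.sh glue looks like this (simplified sketch; error handling omitted):

  #!/bin/sh
  # dictate.sh start|stop -- push-to-talk glue around the commands above
  case "$1" in
    start)
      ffmpeg -f pulse -i default -ar 16000 -ac 1 -t 600 -y -c:a libmp3lame \
        -q:a 2 -flush_packets 1 -avioflags direct -loglevel quiet /tmp/dictate.mp3 &
      echo $! > /tmp/dictate.pid
      ;;
    stop)
      kill "$(cat /tmp/dictate.pid)"
      # wait for ffmpeg to flush and exit before transcribing
      while kill -0 "$(cat /tmp/dictate.pid)" 2>/dev/null; do sleep 0.05; done
      whisper-cli --language en \
        --model "$HOME/.local/share/whisper/ggml-large-v3-turbo-q8_0.bin" \
        --no-prints --no-timestamps /tmp/dictate.mp3 \
        | tr -d '\n' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
      ;;
  esac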

I'm very impressed with https://github.com/ggml-org/whisper.cpp. Transcription quality with large-v3-turbo-q8_0 is excellent IMO and a Vulkan build is very fast on my 6600XT. It takes about 1s for an average sentence to appear after I release the hotkey.
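
For anyone wanting to reproduce the Vulkan build, it's roughly the following (check the whisper.cpp README for the current flags; GGML_VULKAN is my recollection):

  git clone https://github.com/ggml-org/whisper.cpp
  cd whisper.cpp
  # enable the Vulkan backend (needs the Vulkan SDK/headers installed)
  cmake -B build -DGGML_VULKAN=1
  cmake --build build -j --config Release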

I'm keeping an eye on the NVIDIA models; hopefully they work on ggml soon too. E.g. https://github.com/ggml-org/whisper.cpp/issues/3118.

wkcheng - 3 days ago

Does this support using the Parakeet model locally? I'm a MacWhisper user and I find that Parakeet is way better and faster than Whisper for on-device transcription. I've been using push-to-transcribe with MacWhisper through Parakeet for a while now and it's quite magical.

chrisweekly - 2 days ago

> "I think there should be an open-source, local-first version of every app, and I would like them all to work together. The idea of Epicenter is to store your data in a folder of plaintext and SQLite, and build a suite of interoperable, local-first tools on top of this shared memory. Everything is totally transparent, so you can trust it."

Yes! This. I have almost no experience w/ STT, but if/when I explore the space, I'll start w/ Whispering -- because of Epicenter. Starred the repo, and will give some thought to other apps that might make sense to contribute there. Bravo, thanks for publishing and sharing these, and congrats on getting into YC! :)

braden-w - 3 days ago

For those checking out the repo this morning, I'm in the middle of a release that adds Whisper C++ support!

https://github.com/epicenter-so/epicenter/pull/655

After this pushes, we'll have far more extensive local transcription support. Just fixing a few more small things :)

marcodiego - 2 days ago

> I’m basically obsessed with local-first open-source software.

We all should be.

dumbmrblah - 3 days ago

I’ve been using Whispering for about a year now, and it has really changed how I interact with the computer. I make sure to buy mice or keyboards that have programmable hotkeys so that I can use the shortcuts for Whispering. I can’t go back to regular typing at this point; it just feels super inefficient. Thanks again for all your hard work!

Tmpod - 2 days ago

I've been interested in dictation for a while, but I don't want to be sending any audio to a remote API, it all has to be local. Having tried just a couple of models (namely the one used by the FUTO Keyboard), I'm kinda feeling like we're not quite there yet.

My biggest gripe is perhaps not being able to get decent content out of a thought stream; the models can't properly filter out the pauses and "uuuuhmms", much less handle on-the-fly corrections to what I've been saying, like going back and repeating something with a slight variation and whatnot.

This is a challenging problem I'd love to see tackled well by open models I can run on my computer or phone. Are there newer models more capable of this? Or is it not just a model thing, and am I missing a good app too?

In the meantime, I'll keep typing, even though it can be quite a bit less convenient, especially for note-taking on the go.

glial - 3 days ago

This is wonderful, thank you for sharing!

Do you have any sense of whether this type of model would work with children's speech? There are plenty of educational applications that would value a privacy-first, locally deployed model. But my understanding is that Whisper performs pretty poorly with younger speakers.

hephaes7us - 2 days ago

Thanks for sharing! Transcription suddenly became useful to me when LLMs started being able to generate somewhat useful code from natural language. (I don't think anybody wants to dictate code.) Now my workflow is similar to yours.

I have mixed feelings about OS-integration. I'm currently working on a project to use a foot-pedal for push-to-transcribe - it speaks USB-HID so it works anywhere without software, and it doesn't clobber my clipboard. That said, an app like yours really opens up some cool possibilities! For example, in a keyboard-emulation strategy like mine, I can't easily adjust the text prompt/hint for the transcription model.

With an application running on the host though, you can inject relevant context/prompts/hints (either for transcription, or during your post-transformations). These might be provided intentionally by the user, or, if they really trust your app, this context could even be scraped from what's currently on-screen (or which files are currently being worked on).

Another thing I've thought about doing is using a separate keybind (or button/pedal) that appends the transcription directly to a running notes file. I often want to make a note to reference later, but which I don't need immediately. It's a little extra friction to have to actually have my notes file open in a window somewhere.
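
With a shell wrapper like the push-to-talk script in the comment above, that second keybind would only be a couple of lines (sketch; the wrapper name and paths are hypothetical):

  # append a timestamped dictation to a running notes file
  note="$(dictate.sh stop)"
  printf '%s  %s\n' "$(date '+%Y-%m-%d %H:%M')" "$note" >> "$HOME/notes/inbox.md"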

Will keep an eye on epicenter, appreciate the ethos.

divan - 2 days ago

As many other people commented on similar projects, one of the issues of trying to use voice dictation instead of typing is the lack of real-time visual indication. When we write, we immediately see the text, which helps to keep the thought (especially in longer sentences/paragraphs). But with dictation, it either comes with a delay or only when dictation is over, and it doesn't feel as comfortable as writing. Tangentially, many people "think as they write" and dictation doesn't offer that experience.

I wonder if it changes with time for people who use dictation often.

michael-sumner - 2 days ago

How does this compare to VoiceInk, which is also open-source, has been around much longer, and supports all the features that you have? https://github.com/Beingpax/VoiceInk

tummler - 2 days ago

Related, just as a heads up. I've been using this for 100% local offline transcription for a while, works well: https://github.com/pluja/whishper

oulipo - 2 days ago

Really nice!

For macOS there is also the great VoiceInk, which is similar and open-source: https://github.com/Beingpax/VoiceInk/

mrgaro - 2 days ago

I'd love to find a tool that could recognise a few different speakers so that I could automatically dictate 1:1 sessions. In addition, I definitely would want to feed that to an LLM to clean up the notes (to remove all the "umm"s and similar nonsense) and to do context-aware spell checking.

The LLM part should be very much doable, but I'm not sure if speaker recognition exists in a sufficiently usable state.

PickledJesus - 2 days ago

Great software! I've been using it every day since the start of this year, initially out of frustration with ChatGPT and Claude not having proper voice support in their desktop versions, and now I use it everywhere.

When you are in an environment where you can dictate, it really is a game changer. Not only is dictating much faster than typing, even if you're a fast typist, but I also find you don't get stuck composing a message quite as much. It also makes my writing feel more like natural speech.

I have the record and cancel actions bound to side buttons on my mouse, and paste to a third; the auto-paste feature is frustrating in my opinion.

I do miss having a taskbar icon to see whether I'm recording or not. Sometimes I accidentally leave it running, and sometimes the audio cues break until I restart it.

Transformations are great, though despite an extreme amount of prompt engineering, I can't seem to stop the transformation model from occasionally responding to my message rather than just transforming it.

mrbig0 - 2 days ago

Among all the offline transcription apps I've tried, my favorite remains https://whispernotes.app. High accuracy, one-time purchase, and genuinely offline. I love its clean UI.

Honestly, I'm getting tired of subscription-based apps. If it's truly offline, shouldn't it support a one-time purchase model? The whole point of local-first is that you're not dependent on ongoing cloud services, so why structure pricing like you are?

That said, will definitely give Whispering a try - always happy to see more open source alternatives in this space, especially with the local whisper.cpp integration that just landed.

solarkraft - 3 days ago

Cool! I just started becoming interested in local transcription myself.

If you add Deepgram listen API compatibility, you can do live transcription via either Deepgram (duh) or OWhisper: https://news.ycombinator.com/item?id=44901853

(I haven’t gotten the Deepgram JS SDK working with it yet; currently awaiting a response from the maintainers.)

Aachen - 2 days ago

Wait, I'm confused. The text here says all data remains on device and emphasises how much you can trust that, that you're obsessed with local-first software, etc. Clicking on the demo video, step one is... configuring access tokens for external services? Are the services shown at 0:21 (Groq, OpenAI, Anthropic, Google, ElevenLabs) doing the actual transcription, listening to everything I say, with only the resulting text they return subject to "it all stays on your device"? Because that's not at all what I expected after reading this description.

0xbadcafebee - 2 days ago

Not a fan of high resource use or reliance on proprietary vendors/services. DeepSpeech/Vosk predate the current AI wave and still worked well on local devices, but they were a huge pain to set up and use. Anyone have better versions of those? It looks like one successor was Coqui STT, which then evolved into Coqui TTS, which still seems to be maintained. Kaldi is older but also still maintained.

edit: nvm, this overview explains the different options: https://www.gladia.io/blog/best-open-source-speech-to-text-m... and https://www.gladia.io/blog/thinking-of-using-open-source-whi...

g48ywsJk6w48 - 2 days ago

Thank you for sharing such a great product. Last week, after getting fed up with a lot of slow commercial products, I wrote my own similar app that works locally in the loop: at the push of a button it records everything I say, transcribes it, and puts the text into the app itself. It was also really important for me to have a second mode where I can say everything I want in my mother tongue and it gets translated into English automatically. Of course, it all works with formatting, with the placement of commas, quotes, etc. It is hard to believe that this hasn't been done in a native dictation app on macOS yet.

Johnny_Bonk - 3 days ago

Great work! I've been using Willow Voice, but I think I will migrate to this (much cheaper). They do have great UI/UX though: just hit a key to start recording and the text goes into whatever text input you want. I haven't installed Whispering yet, but will do so.

jnmandal - 2 days ago

Looks like a really cool project. Do you have any opinions on which transcription models are the best from a quality perspective? I have heard a lot of mixed opinions on this. Curious what you've found in your development process.

ayushrodrigues - 2 days ago

I've been interested in a tool like this for a while. I have tried Wispr Flow and Aqua Voice, but I wanted to use my own API key and store more context locally. How does the data get stored, and how can I access it?

mrs6969 - 3 days ago

Am I not getting it correctly? It says local is possible, but I can't find any information about how to run it without an API key.

I get the Whisper models, and then do what? How do I run it on a device without internet? There's no documentation about it...

jryio - 2 days ago

Does this functionality exist on iOS? I'm looking for an iOS app that wraps Parakeet or Whisper in a custom iOS keyboard.

That way I can switch to the dictation keyboard, press dictate, and have the transcription inserted in any application (first or third party).

MacWhisper is fantastic for macOS system dictation, but the same abilities don't exist on iOS yet. The native iOS dictation is quite good, but not as accurate with bespoke technical words/acronyms as whisper.cpp.

shinycode - a day ago

It already exists, with great execution:

https://github.com/kitlangton/Hex

It also translates into the proper language.

jagermo - 2 days ago

This earned an upvote for the fantastic readme / installation guide alone. Very well done.

Brajeshwar - 2 days ago

I’m beginning to like the idea in this space — local-first, with backup through your own tools. Recently, https://hyprnote.com was popular here on Hacker News, and it is pretty good. They do the same: it works local-first, but you can use your preferred tools too.

progx - 2 days ago

Do any additional scripts or other tools exist that can do the following:

Listen to voice continuously (without a hotkey) and react to commands, e.g. "run" compiles and runs a script, "code" switches back to the code editor.

Under Windows I use AutoHotkey v2, but I would like to replace it with simple voice commands.
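
Something as simple as this loop would already cover my two examples (untested Linux sketch, building on the whisper-cli setup upthread; $MODEL and the matched commands are placeholders):

  # listen in short chunks, match keywords, run commands
  while true; do
    ffmpeg -f pulse -i default -ar 16000 -ac 1 -t 2 -y -loglevel quiet /tmp/cmd.wav
    text=$(whisper-cli --model "$MODEL" --no-prints --no-timestamps /tmp/cmd.wav \
      | tr '[:upper:]' '[:lower:]')
    case "$text" in
      *run*)  ./build-and-run.sh ;;   # "run": compile and run the script
      *code*) wmctrl -a "code" ;;     # "code": focus the editor window
    esac
  done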

alnxdrawr - a day ago

I would be very interested in a version of this that allows recording from both the microphone and system audio at the same time. Then it could get plugged into WhisperX for diarization.

But even just having everything that's being said recorded would be outstanding.
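
On Linux with PulseAudio, the capture half is already doable with ffmpeg, something like this (sketch; the monitor source name is an example, find yours with "pactl list short sources"):

  # mix mic + system audio into one mono 16 kHz file
  ffmpeg -f pulse -i default \
    -f pulse -i alsa_output.pci-0000_00_1f.3.analog-stereo.monitor \
    -filter_complex amix=inputs=2:duration=longest \
    -ac 1 -ar 16000 /tmp/meeting.wav
  # then hand it to WhisperX for diarization (needs a Hugging Face token):
  # whisperx /tmp/meeting.wav --diarize --hf_token "$HF_TOKEN"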

pabs3 - 2 days ago

Are there any speech-to-text models that are fully OSS for everything from training data/code to model weights?

https://salsa.debian.org/deeplearning-team/ml-policy

newman314 - 3 days ago

Does Whispering support semantic correction? I was unable to find confirmation while doing a quick search.

hn1986 - 2 days ago

Excellent tool and easy to get started with.

On Win11, I installed ffmpeg using winget, but the app isn't detecting it: running ffmpeg -version works, yet the app doesn't pick it up.

One thing: how can we reduce the number of notifications received?

I like the system prompt option too.

satvikpendem - 2 days ago

Are all these just Whisper wrappers? I don't get it; the underlying model still isn't as good as paid custom models from companies. Is there an actual open-source/open-weights alternative to Whisper for speech-to-text? I know only of Parakeet.

emacsen - 2 days ago

Tried it with the AppImage on Linux, attempted to download a model, and got "Failed to download model. An error occurred.", but nothing that helps me track down the error :(

hn_throw2025 - 2 days ago

Thanks, looks like great work! Hope you continue to cater for those of us with Intel Macs who need the off-device capability…

danmeier - 2 days ago

Hot take: I think all these dictation tools are solving the wrong problem: they're optimizing for accurate transcription (and latency) when users actually need intelligent interpretation. For example: People don't speak in perfect emails. They speak in scattered thoughts and intentions that require contextual understanding.

dllthomas - 2 days ago

Can it tell voices apart?

hereme888 - 2 days ago

Earlier today I discovered Vibe: https://github.com/thewh1teagle/vibe

Local, using WhisperX. Precompiled binaries available.

I'm hoping to find and try a local-first app built on an nvidia/canary-like model (such as https://huggingface.co/nvidia/canary-qwen-2.5b), since it's almost twice as fast as Whisper with an even lower word error rate.

teiferer - a day ago

Reposting here; maybe you missed it where I asked first:

You mentioned that you got into YC... what is the road to profitability for your project(s) if everything is open source and local?

ideashower - 2 days ago

Is there speaker detection?

random3 - 3 days ago

Are there any non-Whisper-based voice models/tech/APIs?

jokethrowaway - 2 days ago

I recommend adding support for NVIDIA NeMo Parakeet.

It's uncanny how good and fast it is.

Jarwain - 2 days ago

Yes yes yes please so much yes.

I love the idea of epicenter. I love open source local-first software.

Something I've been hacking on for a minute would fit so well, if encryption wasn't a requirement for the profit model.

But, uh, yes: thank you for making my life easier, and I hope to return the favor soon.

blueboo - 2 days ago

I used Whispering routinely last year; the value plus the glitches and UX failures drove me to gladly pay for Superwhisper, whose rough iPhone keyboard drove me to Wispr Flow (I tried Otter too), whose poor transcriptions (oh, THAT'S why they're fast) drove me back to Superwhisper.

Still lots of quality headroom in this space. I'll def revisit Whispering.

okasaki - 2 days ago

This is a cool project, and I want to give it a go in my spare time.

However, what gives me pause is the sheer number of possibly compromised microphones around me at all times (phones, tablets, laptops, TVs, etc.), which makes spying much easier than if I use a keyboard.

satisfice - 3 days ago

Windows Defender says it is infected.

codybontecou - 3 days ago

Now we just need text-to-speech so we can truly interact with our computers hands-free.
