Yt-transcriber – Give a YouTube URL and get a transcription

github.com

172 points by Bluestein 2 days ago


paulirish - 2 days ago

Can also just fetch the subs already in YouTube rather than retranscribing. eg:

yt-dlp --write-auto-subs --skip-download "https://www.youtube.com/watch?v=7xTGNNLPyMI"
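
The downloaded subtitle file is WebVTT, so it still needs flattening before it reads like a transcript. A minimal sketch of that step, assuming VTT output and the rolling-caption style YouTube auto-subs usually have; the function names are made up for illustration, and `--sub-langs` / `--sub-format` are extra yt-dlp flags not in the command above:

```python
import re


def build_subs_command(url: str, lang: str = "en") -> list[str]:
    """Return a yt-dlp argv that fetches auto-generated subtitles only."""
    return [
        "yt-dlp",
        "--write-auto-subs",
        "--skip-download",
        "--sub-langs", lang,      # assumption: one language is enough
        "--sub-format", "vtt",    # assumption: WebVTT output
        url,
    ]


# Inline timing/styling tags such as <00:00:00.500> or <c>...</c>
_TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Collapse WebVTT captions into plain text, dropping timings and repeats."""
    lines: list[str] = []
    for raw in vtt.splitlines():
        line = _TAG.sub("", raw).strip()
        if not line or "-->" in line:
            continue  # blank separators and cue timing lines
        if line.startswith(("WEBVTT", "Kind:", "Language:", "NOTE")):
            continue  # file header and comment lines
        if lines and lines[-1] == line:
            continue  # rolling captions repeat the previous line
        lines.append(line)
    return " ".join(lines)
```

This only drops consecutive duplicate lines, so treat it as a starting point rather than a complete VTT parser.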

MysticOracle - 2 days ago

For (English only) speech-to-text, NVIDIA's Parakeet-V2 is significantly faster than Whisper and I found it to be more accurate.

https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2

For Apple Silicon (MLX) https://huggingface.co/senstella/parakeet-tdt-0.6b-v2-mlx

0points - 2 days ago

YouTube already offers AI transcriptions on their site. As another commenter points out, you can grab them with yt-dlp.

And unlike how your tool will be supported in the future, thousands of users make sure yt-dlp keeps working as Google keeps changing the site (currently 1459 contributors).

totallynotryan - 2 days ago

Hey all, I built a 100% free (no-signup) youtube summarizer: "https://youtube-summarizer-lime.vercel.app/". Accurate summaries in under 8 seconds.

eigenvalue - 2 days ago

I made a tool like this a while ago which was useful for transcribing a whole playlist automatically using whisper:

https://github.com/Dicklesworthstone/bulk_transcribe_youtube...

I ended up turning it into a beefed-up version that makes polished written documents from the raw transcript. You can try it at

https://youtubetranscriptoptimizer.com/

Leftium - 2 days ago

Two similar Show HN projects:

- This python one is more amenable to modding into your own custom tool: https://hw.leftium.com/#/item/44353447

- Another bash script: https://hw.leftium.com/#/item/41473379

---

They all seem to be built on top of:

- yt-dlp to download video

- whisper for transcription

- ffmpeg for audio/video extraction/processing
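
The three-tool stack above can be sketched roughly like this. It is not how any of the linked projects actually do it; the file names, the 16 kHz mono resampling, and the `medium` model choice are all assumptions, and it expects the `yt-dlp`, `ffmpeg`, and `whisper` (openai-whisper) CLIs to be on PATH:

```python
import subprocess
from pathlib import Path


def pipeline_commands(url: str, workdir: str = "work", model: str = "medium") -> list[list[str]]:
    """Build the yt-dlp -> ffmpeg -> whisper commands without running them."""
    work = Path(workdir)
    source = work / "source.m4a"  # hypothetical intermediate file
    wav = work / "audio.wav"      # hypothetical intermediate file
    return [
        # 1. Download and extract the audio track only.
        ["yt-dlp", "-x", "--audio-format", "m4a",
         "-o", str(work / "source.%(ext)s"), url],
        # 2. Resample to 16 kHz mono WAV.
        ["ffmpeg", "-y", "-i", str(source), "-ar", "16000", "-ac", "1", str(wav)],
        # 3. Transcribe to a plain-text file in the work directory.
        ["whisper", str(wav), "--model", model,
         "--output_format", "txt", "--output_dir", str(work)],
    ]


def run_pipeline(url: str, workdir: str = "work") -> None:
    """Run each stage in order, stopping at the first failure."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    for cmd in pipeline_commands(url, workdir):
        subprocess.run(cmd, check=True)
```

Step 2 is arguably redundant since whisper calls ffmpeg itself when loading audio, but it keeps the three roles visible.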

yunusabd - 2 days ago

I tried it on an M1 Pro MBP using Docker. It's quite slow (no MPS) and there are no timestamps in the resulting transcript. But the basics are there. Truncated output:

  Fetching video metadata...
  Downloading from YouTube...
  Generating transcript using medium model...

  === System Information ===
  CPU Cores: 10
  CPU Threads: 10
  Memory: 15.8GB
  PyTorch version: 2.7.1+cpu
  PyTorch CUDA available: False
  MPS available: False
  MPS built: False
  
  Falling back to CPU only
  Model stored in: /home/app/.cache/whisper
  Loading medium model into CPU...
  100%|| 1.42G/1.42G [02:05<00:00, 12.2MiB/s]
  Model loaded, transcribing...
  Model size: 1457.2MB
  Transcription completed in 468.70 seconds
  === Video Metadata ===
  Title: 厨师长教你:“酱油炒饭”的家常做法,里面满满的小技巧,包你学会炒饭的最香做法,粒粒分明!
  Channel: Chef Wang 美食作家王刚
  Upload Date: 20190918
  Duration: 5:41
  URL: https://www.youtube.com/watch?v=1Q-5eIBfBDQ
  === Transcript ===
  
  哈喽大家好我是王刚本期视频我跟大家分享...
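
On the missing timestamps: openai-whisper's `transcribe()` result includes a `segments` list with `start`, `end`, and `text` for each segment, so they could be added with something like the sketch below. Whether this tool keeps the segments around is an assumption on my part, and the helper names are made up. The last function just puts a number on "quite slow" using the figures from the log (468.70 s to transcribe a 5:41 video):

```python
def format_timestamp(seconds: float) -> str:
    """Render a second offset as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timestamped_lines(segments: list[dict]) -> list[str]:
    """Prefix each segment's text with its start time."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    ]


def real_time_factor(processing_seconds: float, duration_seconds: float) -> float:
    """Processing time divided by audio length; above 1.0 is slower than real time."""
    return processing_seconds / duration_seconds


# Figures from the log above: 5:41 of audio is 341 seconds.
rtf = real_time_factor(468.70, 5 * 60 + 41)
print(f"real-time factor: {rtf:.2f}")
```

So on CPU inside Docker the medium model ran at roughly 1.4x the video's length.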

isubkhankulov - 2 days ago

I’ve been using this free tool. It gives quality diarized transcripts https://contentflow.megalabs.co

cmaury - 2 days ago

Thanks for sharing. This is exactly the type of utility that vibecoding is for. It takes 5 seconds to ask GPT to write a script to do this tailored to your specific use case. It's way faster than trying to get someone else's repo up and running.

labrador - 2 days ago

Many channels I follow, such as Vlad Vexler, have taken measures so you can't download the transcript with yt-dlp. Furthermore, they don't provide a transcript option on their videos. I assume this is to prevent people from just reading AI summaries, which is annoying in Vexler's case because he talks slowly and meanders around. If I really want to hear his point but don't want to listen to that, then I download the video with yt-dlp and use Whisper to transcribe it.

dudeWithAMood - 2 days ago

I did something similar, piping the output of the youtube-transcript-api Python package to OpenAI's API: https://github.com/DavidZirinsky/tl-dw/
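
One detail if you go this route: youtube-transcript-api works with a video ID rather than a full URL, so the ID has to be pulled out first. A minimal sketch, not taken from the repo above; the set of hosts and path prefixes handled here is my own assumption and is certainly not exhaustive:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in {"youtube.com", "m.youtube.com", "music.youtube.com"}:
        # Standard links: /watch?v=<id>
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        # Path-style links: /embed/<id>, /shorts/<id>, /live/<id>
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in {"embed", "shorts", "live"}:
            return parts[1]
    return None
```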

toddmorey - 2 days ago

Always fascinated to read CLAUDE.md files that are appearing in more and more open source projects: https://github.com/pmarreck/yt-transcriber/blob/yolo/CLAUDE....

I'd be really curious to see some sort of benchmark / evaluation of these context resources against the same coding tasks. Right now, the instructions all sound so prescriptive and authoritative, yet it is really hard to evaluate their effectiveness.

mikeve - 2 days ago

Interesting project! I've been working on a project in this space myself (WaveMemo)

I must say, speaker diarization is surprisingly tricky to do. The most common approach seems to be to use pyannote, but the quality is not amazing...

lpeancovschi - 2 days ago

Youtube's T&C don't allow downloading youtube audio/video. How do other services get away with it?

arkaic - 2 days ago

On this note, is YouTube also the best transcriber of foreign languages or is there something better?

manishsharan - 2 days ago

Will this make Google mad at me and cancel/freeze all my Google services ?

senko - 2 days ago

I vibecoded something similar for myself, transcribes and summarizes the content into article format: https://github.com/senko/scribe

Uses yt-dlp, whisper, and an LLM (Gemini hardcoded because it handles long contexts well, but easy to switch) for the summarizer.

I dislike podcasts as a format (S/N level way too low for my taste), so I use this whenever I want to get a tldr of some episode.

I should check out the SOTA models and improve the summarization prompt, but I'm not in a hurry as this works pretty well for my needs already.
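
If you do switch to a model with a smaller context window, the transcript would need splitting before summarization. A rough sketch of that step, not what scribe does today; the function name and the character budget are placeholders, and it splits on whitespace, so it won't help with languages written without spaces:

```python
def chunk_transcript(text: str, max_chars: int = 8000) -> list[str]:
    """Split a transcript into whitespace-delimited chunks of at most max_chars.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for word in text.split():
        added = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and size + added > max_chars:
            chunks.append(" ".join(current))
            current, size = [word], len(word)
        else:
            current.append(word)
            size += added
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be summarized separately and the partial summaries merged in a final pass.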
