Our New SAM Audio Model Transforms Audio Editing

about.fb.com

162 points by ushakov 7 days ago


yunwal - a day ago

This is hilariously bad with music. Like I can type in the most basic thing like "string instruments" which should theoretically be super easy to isolate. You can generally one-shot this using spectral analysis libraries. And it just totally fails.
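
For context, the simplest form of the spectral approach mentioned here is a frequency-band mask: transform the signal, zero every bin outside the band of interest, and transform back. The sketch below is a toy under stated assumptions, not the commenter's method or any particular library's API. It uses a hand-written DFT from the standard library, and the sample rate, tone frequencies, and the helper name `band_mask` are all made up for illustration. Real instruments have overlapping harmonics, so a plain band mask would be far less clean on actual music.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning only the real part (input is assumed real)."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def band_mask(x, sample_rate, lo_hz, hi_hz):
    """Keep only the frequency content between lo_hz and hi_hz (hypothetical helper)."""
    n = len(x)
    spectrum = dft(x)
    kept = []
    for k, coeff in enumerate(spectrum):
        # Bins k and n-k carry the same frequency for real input,
        # so both are kept or dropped together.
        freq = min(k, n - k) * sample_rate / n
        kept.append(coeff if lo_hz <= freq <= hi_hz else 0)
    return idft(kept)


# Toy mixture: a 64 Hz tone plus a quieter 256 Hz tone (illustrative values).
SAMPLE_RATE = 1024
N = 256
low = [math.sin(2 * math.pi * 64 * t / SAMPLE_RATE) for t in range(N)]
high = [0.5 * math.sin(2 * math.pi * 256 * t / SAMPLE_RATE) for t in range(N)]
mix = [a + b for a, b in zip(low, high)]

# Isolate the low tone by keeping only the 32-128 Hz band.
isolated = band_mask(mix, SAMPLE_RATE, 32, 128)
max_err = max(abs(a - b) for a, b in zip(isolated, low))
```

Here both tones fall exactly on DFT bins, so the mask recovers the low tone almost perfectly. With real recordings the sources share frequency bands, which is why learned separators such as the ones discussed in this thread exist.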

ks2048 - a day ago

I recently discovered Audacity includes plug-ins for audio separation that work great (e.g. split into vocals track and instruments track). The model it uses also originated at Facebook (demucs).

keepamovin - 13 hours ago

FB has been a pioneer in voice and audio, somehow. A couple of years ago FB-Research had a little repo on GitHub that was the best noise-removal / voice-isolation out there. I wanted to use it in Wisprnote and politely emailed the authors. Never heard back (that's okay), but I was so impressed with the perceptual quality and "wind removal" (so hard).

yjftsjthsd-h - a day ago

> Visual prompting: Click on the person or object in the video that’s making a sound to isolate their audio.

How does that work? Correlating sound with movement?

websiteapi - 7 hours ago

I wonder if it works for speaker diarization out of the box. I've found that open source speaker diarization that doesn't require a lot of tweaking is basically non-existent.

Oras - 13 hours ago

To try: https://aidemos.meta.com/segment-anything/editor/segment-aud...

GitHub: https://github.com/facebookresearch/sam-audio

I quite like adding effects such as making the isolated speech studio-quality or broadcast-ready.

AkshatJ27 - 16 hours ago

You can try it out in the playground: https://aidemos.meta.com/segment-anything/gallery/ There seem to be many more fun little demos by Meta here, like automatic video masking, making 3D models from 2D images, etc.

throwaw12 - 18 hours ago

This is super cool. Of course, it is possible to separate instrument sounds using specialized tools, but I can't wait to see how people use this model for a bunch of other use cases where it's not trivial to use those specialized tools:

* remove background noise of tech products, but keep the nature

* isolate the voice of a single person and feed into STT model to improve accuracy

* isolate the sounds of events in games, and many more

samuell - 16 hours ago

I tried using this to extract some speech from an audio track with heavy wind noise (filmed out on a windy seashore without a mic windscreen), and the result was unfortunately less intelligible than the original.

I got much better results, though still not perfect, with the voice isolator in ElevenLabs.

teeray - a day ago

I wonder if this would be nice for hearing aid users for reducing the background restaurant babble that overwhelms the people you want to hear.

ajcp - a day ago

Given TikTok's insane creator adoption rate, is Meta developing these models to build out a content creation platform to compete?

ac2u - a day ago

I wonder if the segmentation would work with a video of a ventriloquist and a dummy?

7734128 - 18 hours ago

Finally a way to perhaps remove laugh tracks in the near future.

theflyestpilot - 13 hours ago

sample anything model?

m3kw9 - a day ago

Can I create a continuous “who farted” detector? Would be great at parties