Consistency diffusion language models: Up to 14x faster, no quality loss

together.ai

157 points by zagwdt 10 hours ago


MASNeo - 7 hours ago

I wish there were more of this research into speeding things up rather than building ever-larger models

yjftsjthsd-h - 8 hours ago

Is anyone building diffusion language models that are actually practical to run today on the machine under my desk? There are loads of more "traditional" .gguf options (well, quants) that are practical even on shockingly weak hardware, and I've been seeing things that give me hope that diffusion is the next step forward, but so far it's all been early research prototypes.

LarsDu88 - 5 hours ago

A lot of this post-training recipe feels reminiscent of DINO training (teacher/student, use of stop gradients). I wonder if the more recent leJEPA SigREG regularization research might be relevant here for simpler post-training.

simonw - 5 hours ago

I'd love to know what's going on with the Gemini Diffusion model - they had a preview last May and it was crazy fast but I've not heard anything since then.

bjt12345 - 6 hours ago

I do wonder why diffusion models aren't used alongside constrained decoding for programming - surely it makes more sense than using an auto-regressive model.

LarsDu88 - 8 hours ago

Google is working on a similar line of research. Wonder why they haven't rolled out a GPT-4o-scale version of this yet

WiSaGaN - 5 hours ago

I think diffusion makes much more sense than auto-regressive (AR) generation specifically for code generation, as compared to chatbot use.

nl - 6 hours ago

Releasing this on the same day as Taalas's 16,000-tokens-per-second acceleration of the roughly comparable Llama 8B model must hurt!

I wonder how far down they can scale a diffusion LM? I've been playing with in-browser models, and the speed is painful.

https://taalas.com/products/

hanifbbz - 6 hours ago

Is this available as open source anywhere to try?

cubefox - an hour ago

This doesn't mention the main drawback of diffusion language models, the reason nobody is using them: they score significantly lower on benchmarks than autoregressive models of similar size.

refulgentis - 9 hours ago

If this means there's a 2x-7x speedup available to a scaled diffusion model like Inception Mercury, that'll be a game changer. It feels 10x faster already…

LoganDark - 6 hours ago

Can't wait for the day I can actually try a diffusion model on my own machine (128GB M4 Max) rather than as a hosted service. So far I haven't seen a single piece of software that supports it.