Strengths and limitations of diffusion language models

seangoedecke.com

70 points by rbanffy 2 days ago


billconan - 2 days ago

I'm curious: in image generation, flow matching is said to be better than diffusion, so why do these language models still start from diffusion instead of jumping directly to flow matching?

mountainriver - 2 days ago

There was also a big discussion on this here: https://news.ycombinator.com/item?id=44057820

There is quite a bit of evidence diffusion models work better at reasoning because they don't suffer from early token bias.

https://github.com/HKUNLP/diffusion-vs-ar https://arxiv.org/html/2410.14157v3

accrual - 2 days ago

Great overview. I wonder if we'll start to see more text diffusion models from other players, or maybe even a mixture of diffusion and transformer models alternating roles behind a single UI, depending on the context and request.

cubefox - 2 days ago

That's a nice explanation. I wonder whether autoregressive and diffusion language models could be combined such that the model only denoises the (most recent) end of a sequence of text, like a paragraph, while the rest is unchangeable and allows for key-value caching.
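
A rough toy sketch of that idea, in Python, assuming a block-wise scheme: finished text is frozen and only the newest block is iteratively denoised. Everything here is hypothetical (`MASK`, `toy_denoise_step`, `generate`); there is no real model, and the "cache" is just the list of frozen tokens standing in for cached keys and values.

```python
import random

MASK = "<mask>"

def toy_denoise_step(prefix_cache, block, rng):
    """Hypothetical stand-in for one model call: unmask one position in the block.

    A real model would attend to cached keys/values for the frozen prefix;
    here the prefix is only used to number the tokens.
    """
    masked = [i for i, tok in enumerate(block) if tok == MASK]
    if not masked:
        return block
    i = rng.choice(masked)  # positions are filled in arbitrary order
    new_block = list(block)
    new_block[i] = f"tok{len(prefix_cache) + i}"
    return new_block

def generate(num_blocks, block_size, seed=0):
    rng = random.Random(seed)
    frozen = []        # finished tokens; never rewritten
    cache_updates = 0  # how many times the prefix cache is extended
    for _ in range(num_blocks):
        block = [MASK] * block_size
        # Denoise only the active block; the prefix stays fixed throughout.
        while MASK in block:
            block = toy_denoise_step(frozen, block, rng)
        frozen.extend(block)  # block is final: append it to the cache once
        cache_updates += 1
    return frozen, cache_updates

tokens, updates = generate(num_blocks=3, block_size=4)
print(tokens, updates)
```

In this sketch the prefix is extended once per block rather than recomputed at every denoising step, which is the property that would make key-value caching possible.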