TREAD: Token Routing for Efficient Architecture-Agnostic Diffusion Training

arxiv.org

37 points by fzliu 3 days ago


platers - 3 days ago

I'm struggling to understand where the gains are coming from. What is the intuition for why DiT training was so inefficient?

earthnail - 3 days ago

Wow, Ommer’s students never fail to impress. 37x faster for a generic architecture, i.e., no domain-specific tricks. Insane.

arjvik - 3 days ago

Isn't this just Mixture-of-Depths but for DiTs?

If so, what DiT-specific changes needed to be made?

lucidrains - 3 days ago

very nice, will have to try it out! this is the same research group that Robin Rombach (of Stable Diffusion fame) came from