I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch

github.com

423 points by yousef_g a day ago


liuliu - 21 hours ago

If you are interested in this, the Flux reference implementation is very minimalistic: https://github.com/black-forest-labs/flux/tree/main/src/flux

The minRF project makes it easy to get started training small diffusion models with rectified flow: https://github.com/cloneofsimo/minRF

Also, the reference implementation of SD 3.5 is actually minimalistic too: https://github.com/Stability-AI/sd3-ref
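For readers who have not met rectified flow before, here is a minimal sketch of the training objective, not code taken from minRF, Flux, or sd3-ref. It assumes the convention where t = 0 is clean data and t = 1 is pure noise (some codebases flip this), and `TinyVelocityNet` and `rectified_flow_loss` are hypothetical names used only for illustration.

```python
import torch
import torch.nn as nn


class TinyVelocityNet(nn.Module):
    """Toy stand-in for a real denoiser; predicts a velocity from (x_t, t)."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Append the timestep as one extra input feature per sample.
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))


def rectified_flow_loss(model: nn.Module, data: torch.Tensor) -> torch.Tensor:
    """Regress the constant velocity of the straight path from data to noise."""
    noise = torch.randn_like(data)
    t = torch.rand(data.shape[0], device=data.device)      # one t per sample
    x_t = (1 - t[:, None]) * data + t[:, None] * noise     # point on the line
    target = noise - data                                  # d(x_t)/dt
    return ((model(x_t, t) - target) ** 2).mean()
```

A training step would then be the usual `loss = rectified_flow_loss(model, batch); loss.backward(); optimizer.step()`. Real implementations add text conditioning and non-uniform timestep sampling on top of this.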

reedlaw - a day ago

I'm not sure what this means. If it means the Stable Diffusion 3.5 model, why is it fetching that here: https://github.com/yousef-rafat/miniDiffusion/blob/main/enco...

The training dataset is very small, only including fashion-related pictures: https://github.com/yousef-rafat/miniDiffusion/tree/main/data...

Dwedit - 15 hours ago

Does using pure PyTorch improve performance on non-NVIDIA cards in any way? Or is PyTorch so highly optimized for CUDA that no other GPU vendors have a chance?

albert_e - a day ago

Sounds like a great resource for learners.

Just wondering aloud --

Is there a tutorial/explainer by any chance that a beginner could use to follow along and learn how this is done?

godelski - 14 hours ago

        self.q = nn.Linear(embed_size, embed_size, bias=False)
        self.k = nn.Linear(embed_size, embed_size, bias=False)
        self.v = nn.Linear(embed_size, embed_size, bias=False)

Try

        self.qkv = nn.Linear(embed_size, 3 * embed_size, bias=False)

    def forward(...):
        ...
        qkv = self.qkv(x)
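
A fuller sketch of that suggestion, assuming plain single-head self-attention. `FusedQKVAttention` and `fuse_linear_weights` are illustrative names, not identifiers from the repo; the second function shows how three existing bias-free layers could be folded into one so that already trained weights keep working.

```python
import torch
import torch.nn as nn


class FusedQKVAttention(nn.Module):
    """Single-head self-attention with one fused q/k/v projection."""

    def __init__(self, embed_size: int):
        super().__init__()
        # One matmul produces q, k and v side by side instead of three.
        self.qkv = nn.Linear(embed_size, 3 * embed_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, embed_size)
        qkv = self.qkv(x)                    # (batch, tokens, 3 * embed_size)
        q, k, v = qkv.chunk(3, dim=-1)       # each (batch, tokens, embed_size)
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        return torch.softmax(scores, dim=-1) @ v


def fuse_linear_weights(q: nn.Linear, k: nn.Linear, v: nn.Linear) -> nn.Linear:
    """Build a fused layer that reproduces three existing bias-free layers."""
    fused = nn.Linear(q.in_features, 3 * q.out_features, bias=False)
    with torch.no_grad():
        # nn.Linear stores weight as (out_features, in_features),
        # so the three matrices are stacked along dim 0.
        fused.weight.copy_(torch.cat([q.weight, k.weight, v.weight], dim=0))
    return fused
```

The result is mathematically the same as the three separate layers; the gain, if any, comes from launching one larger matmul instead of three smaller ones.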

NoelJacob - 21 hours ago

So, that's Stable Diffusion without license constraints, is it?

ineedasername - 16 hours ago

When I think of SD 3.5 (or any version) I think of the portion that results from training, i.e., the weights. The code seems less important? I mean as far as output quality is concerned, or performance. But I'm honestly not sure, and not trying to judge these efforts on that basis.

refulgentis - 18 hours ago

I'm embarrassed to ask: can someone elaborate on, say, what we have now that we didn't have before the repo existed?

I have studiously avoided making models, though I've been adjacent to their output for years now... I think the root of my confusion is I kinda assumed there were already PyTorch-based scripts for inference / training. (I assumed _at least_ inference scripts were released with models, and kinda figured fine-tuning / training ones were too.)

So then I'm not sure if I'm just looking at a clean room / dirty room rewrite of those. Or maybe everyone is using "PyTorch" but it's usually calling into CUDA/C/some proprietary thingy that is much harder to grok than a pure PyTorch impl?

Anyways, these aren't great guesses, so I'll stop myself here. :)

squircle - a day ago

Although I'm leaning heavily away from being passionate about software development, this is a cool project, and it's freaking awesome how anyone can now reinvent the wheel from first principles.

b0a04gl - a day ago

Does the DiT here actually capture cross-token attention the same way as full SD 3.5, or is it simplified for clarity?

caycep - 21 hours ago

How usable is the original academic source available from the CompVis group at Ludwig Maximilian University?

vergessenmir - 21 hours ago

Are there any notable properties of this implementation? Are some parts slower, faster, etc.?

eapriv - 20 hours ago

I find it hilarious that “from scratch” now somehow means “in PyTorch”.

nothrowaways - 11 hours ago

Pure PyTorch?

CamperBob2 - a day ago

"Add a Hugging Face Token in get_checkpoints.py before running the script."

Can you be a bit more specific here? It's not clear what such a token is, what it takes to get one, or where it would be placed in get_checkpoints.py.
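
For context, a Hugging Face token is a personal access token created in your account settings on huggingface.co; gated models also require accepting the model's license on its page first. How get_checkpoints.py consumes the token is not shown in this thread, so the following is only a sketch of one common pattern: read the token from the `HF_TOKEN` environment variable instead of pasting it into the source. `resolve_hf_token` is a hypothetical helper name.

```python
import os


def resolve_hf_token(explicit_token=None):
    """Return a token from the argument or the HF_TOKEN environment variable."""
    token = (explicit_token or os.environ.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "No Hugging Face token found; pass one in or set HF_TOKEN."
        )
    return token


# Hypothetical use inside a download script (needs `pip install huggingface_hub`):
#   from huggingface_hub import hf_hub_download
#   path = hf_hub_download(repo_id="<model repo>", filename="<file>",
#                          token=resolve_hf_token())
```

Keeping the token in the environment also avoids committing it to a fork of the repo by accident.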

hkon - 18 hours ago

Now do it in Minecraft.

theturtle - a day ago

Cool. Can it still make images of Anne Hathaway leading a herd of blue giraffes on the Moon?