Flux 2 Klein pure C inference

github.com

364 points by antirez 19 hours ago


antirez - 18 hours ago

Something that may be interesting for the readers of this thread: this project was possible only once I started to tell Opus that it needed to keep a file with all the implementation notes, and to accumulate there all the things we discovered during development. The file also contained clear instructions that it must be kept updated, and re-read ASAP after context compaction. This kinda enabled Opus to do such a big coding task in a reasonable amount of time without losing track. Check the file IMPLEMENTATION_NOTES.md in the GitHub repo for more info.

neomantra - 18 hours ago

Thanks for sharing this — I appreciate your motivation in the README.

One suggestion, which I have been trying to do myself, is to include a PROMPTS.md file. Since your purpose is sharing and educating, it helps others see what approaches an experienced developer is using, even if you are just figuring it out.

One can use a Claude hook to maintain this deterministically. I instruct in AGENTS.md that they can read but not write it. It’s also been helpful for jumping between LLMs, to give them some background on what you’ve been doing.
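
Something like this minimal sketch in .claude/settings.json could work; I'm assuming Claude Code's UserPromptSubmit hook event and the prompt field in its stdin JSON payload, so verify the exact shape against the current docs:

    {
      "hooks": {
        "UserPromptSubmit": [
          {
            "hooks": [
              { "type": "command", "command": "jq -r '.prompt' >> PROMPTS.md" }
            ]
          }
        ]
      }
    }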

adefa - 18 hours ago

I ran a similar experiment last month and ported Qwen 3 Omni to llama.cpp. I was able to get GGUF conversion, quantization, and all input and output modalities working in less than a week. I submitted the work as a PR to the codebase and, understandably, it was rejected.

https://github.com/ggml-org/llama.cpp/pull/18404

https://huggingface.co/TrevorJS/Qwen3-Omni-30B-A3B-GGUF

d_watt - 18 hours ago

Regarding the meta experiment of using LLMs to transpile to a different language, how did you feel about the outcome / process, and would you do the same process again in the future?

I've had some moments recently with my own projects, working through some bottlenecks, where I took a whole section of a project, told Claude "rewrite in Rust", and got massive speedups from a zero-shot rewrite (most recently some video recovery programs), but I then had an output product I wouldn't feel comfortable vouching for outside of my homelab setup.

kristianp - 11 hours ago

Note that the original FLUX.2 [klein] model [1] and its Python code were only released about three days ago (inexact without knowing the times and time zones involved). Discussed at [2].

[1] https://bfl.ai/blog/flux2-klein-towards-interactive-visual-i...

[2] https://news.ycombinator.com/item?id=46653721

jabedude - 14 hours ago

Salvatore, how did you pick up the requisite background knowledge on this subject? IIRC this is your first OSS project in the ML domain; just curious whether, and how much, Claude helped provide you with domain expertise while building this engine.

imranq - 9 hours ago

Just because it is in C doesn't mean you will get C-like performance. Just look at the benchmarks: it is 8x slower than just using PyTorch... While I get that it's cool to use LLMs to generate code at this level, getting highly optimized, high-performance code is very much out of the domain of current frontier LLMs.
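
To illustrate, the kind of straightforward C an LLM tends to write looks like the sketch below (my own illustration, not code from the repo). Without cache blocking, SIMD, or threading, it is typically an order of magnitude or more slower than the tuned GEMM kernels PyTorch dispatches to:

    #include <stddef.h>

    /* Naive single-threaded matmul: C[M,N] = A[M,K] * B[K,N].
       Correct, but with no cache blocking, vectorization, or threads
       it runs at a small fraction of what an optimized GEMM gets
       from the same hardware. */
    void matmul_naive(const float *A, const float *B, float *C,
                      size_t M, size_t N, size_t K) {
        for (size_t i = 0; i < M; i++) {
            for (size_t j = 0; j < N; j++) {
                float acc = 0.0f;
                for (size_t k = 0; k < K; k++)
                    acc += A[i * K + k] * B[k * N + j];
                C[i * N + j] = acc;
            }
        }
    }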

throwaway2027 - 18 hours ago

If I asked Claude to do the same, could I also just put the MIT license on it with my name? https://github.com/black-forest-labs/flux2 uses the Apache License, apparently. I know it doesn't matter that much, and as long as it's permissive and openly available people don't care; it's just pedantry, but still.

khimaros - 12 hours ago

https://github.com/leejet/stable-diffusion.cpp

csto12 - 18 hours ago

As someone who doesn't code in C and does more analytics work (SQL): is the code generated here "production grade"? One of the major criticisms I hear about LLMs is that they tend to generate code you wouldn't want to maintain. Is that the case here?

mmastrac - 14 hours ago

Is it just my connection or is the huggingface downloader completely broken? It was saturating my internet connection without making any progress whatsoever.

EDIT: https://github.com/bodaay/HuggingFaceModelDownloader seems to be making progress.

fulafel - 8 hours ago

Interesting that OpenBLAS and MPS are reportedly nearly the same speed although the README sounds like only MPS uses the GPU.
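
If the port funnels nearly all of its FLOPs through one GEMM entry point, that would explain it: the backend only changes who executes the matmul, while the rest of the C pipeline is identical. A hypothetical sketch of that kind of dispatch (cblas_sgemm is the standard CBLAS call; gemm_mps is a made-up stand-in, not the repo's actual code):

    #include <cblas.h>

    /* Hypothetical GPU path; name and signature are invented. */
    void gemm_mps(const float *A, const float *B, float *C, int M, int N, int K);

    /* One GEMM entry point, row-major float32: C[M,N] = A[M,K] * B[K,N].
       If most wall time is spent inside this call, OpenBLAS and an
       MPS-backed path can land surprisingly close. */
    void gemm(const float *A, const float *B, float *C, int M, int N, int K) {
    #ifdef USE_MPS
        gemm_mps(A, B, C, M, N, K);
    #else
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    M, N, K, 1.0f, A, K, B, N, 0.0f, C, N);
    #endif
    }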

yunnpp - 16 hours ago

> I believe that inference systems not using the Python stack (which I do not appreciate) are a way to free open models usage and make AI more accessible.

What you're saying here is that you do not appreciate systems not using the Python stack, which I think is the opposite of what you wanted to say.

abecedarius - 14 hours ago

A suggestion born of experience: besides printing the seed for an image, add it to the image file as metadata. Otherwise, if you're me, you'll lose it.
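
For PNG output this can be as small as a tEXt chunk. A rough sketch (mine, untested against the repo) that rewrites a PNG with a seed tEXt chunk inserted before IEND, assuming a well-formed file whose last 12 bytes are the IEND chunk, and using zlib for the CRC:

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>   /* for crc32() */

    static void put_u32be(FILE *f, unsigned long v) {
        unsigned char b[4] = { (unsigned char)(v >> 24), (unsigned char)(v >> 16),
                               (unsigned char)(v >> 8),  (unsigned char)v };
        fwrite(b, 1, 4, f);
    }

    /* Copy a PNG, inserting a tEXt chunk "seed\0<value>" before IEND.
       A PNG chunk is: 4-byte big-endian data length, 4-byte type, data,
       then a 4-byte CRC over type+data. In a well-formed PNG the final
       12 bytes are the (empty) IEND chunk. */
    int png_add_seed(const char *in_path, const char *out_path, const char *seed) {
        FILE *in = fopen(in_path, "rb"), *out = fopen(out_path, "wb");
        if (!in || !out) return -1;

        fseek(in, 0, SEEK_END);
        long size = ftell(in);
        fseek(in, 0, SEEK_SET);

        for (long i = 0; i < size - 12; i++)   /* everything before IEND */
            fputc(fgetc(in), out);

        unsigned char chunk[4 + 5 + 64];       /* type + "seed\0" + value */
        memcpy(chunk, "tEXt", 4);
        memcpy(chunk + 4, "seed", 5);          /* keyword plus NUL separator */
        size_t vlen = strlen(seed);
        if (vlen > 64) vlen = 64;
        memcpy(chunk + 9, seed, vlen);
        size_t n = 9 + vlen;

        put_u32be(out, (unsigned long)(n - 4));      /* data length */
        fwrite(chunk, 1, n, out);                    /* type + data */
        put_u32be(out, crc32(0L, chunk, (uInt)n));   /* CRC         */

        for (long i = size - 12; i < size; i++)      /* original IEND */
            fputc(fgetc(in), out);

        fclose(in); fclose(out);
        return 0;
    }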

bakkoting - 13 hours ago

Neat! I wonder how slow this would be running in wasm. In my dream world it would also use WebGPU but that's a much bigger lift.

falloutx - 15 hours ago

I don't understand: it just generates a picture using a model? Isn't that trivial? What's the advantage of doing it in C? Is the model running in C? The README is overly verbose, and it seems like a project that does just one task and cost the author $80.

reactordev - 19 hours ago

This is both awesome and scary. Yes, now we can embed image gen in things like game engines and photoshop or build our own apps. On the other hand, we can include image gen in anything…

kurtis_reed - 9 hours ago

Confusing to have the project name be the same as the file name.

filipstrand - 11 hours ago

Really cool project! Impressive to see this being done in pure C.

I'm the maintainer of MFLUX (https://github.com/filipstrand/mflux) which does a similar thing, but at a higher level using the MLX framework optimised for Apple Silicon. I just merged Flux 2 Klein support as well and was happy to see this discussion :)

I started out doing this type of work roughly 1.5 years ago when FLUX.1 was released and have been doing it off and on ever since with newer models, trying to use more and more AI over time.

At one point, I vibe-coded a debugger to help the coding agent along. It worked OK but as models have gotten stronger, this doesn't really seem to be needed anymore. My latest version simply has a SKILL.md that outlines my overall porting strategy (https://github.com/filipstrand/mflux/blob/main/.cursor/skill...). Somewhat surprisingly, this actually works now with Cursor + Codex-5.2, with little human intervention.

> Even if the code was generated using AI, my help in steering towards the right design, implementation choices, and correctness has been vital during the development.

This definitely resonates! Curious to hear more about what worked/didn't for you. A couple of things I've found useful:

- Porting the pipeline backwards: This is the way I did it personally before using any coding models. The typical image generation flow is the following:

1. Text encodings (+ random noise latent)
2. Transformer loop
3. VAE decoding

I found that starting with the VAE first (feeding it pre-loaded tensors from the reference implementation, extracted at specific locations) was the quickest way to get to an actual generated image. Once the VAE is done and verified, only then proceed backwards up the chain and handle the Transformer, etc. I still prefer to do it this way, and I like to manually intervene between steps 3, 2, and 1, but maybe this won't actually be needed soon?

- Also, with the VAE, if you care about implementing the encoding functionality (e.g. to be used with img2img features), the round-trip test is a very quick way to verify correctness (see the sketch after this list):

image_in -> encode -> decode -> image_out : compare(image_in, image_out)

- Investing in a good foundation for weight handling, especially when doing repeat work across similar models. Earlier coding models would easily get confused about weight assignment, naming conventions etc. A lot of time could be wasted because weight assignment failed (sometimes silently) early on.
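
To make the round-trip bullet concrete, here is a minimal sketch in C; vae_encode and vae_decode are hypothetical stand-ins for whatever entry points a port exposes, and the tolerance is arbitrary since the VAE is lossy:

    #include <math.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical signatures for the port's VAE entry points. */
    int vae_encode(const float *image, float *latent);
    int vae_decode(const float *latent, float *image);

    /* Encode, decode, then compare RMSE against a loose tolerance.
       The VAE is lossy, so exact equality is impossible; a blown-up
       RMSE usually means wrong weights, layout, or normalization. */
    int vae_roundtrip_ok(const float *img_in, float *latent, float *img_out,
                         size_t n_values, float tol) {
        if (vae_encode(img_in, latent) != 0) return 0;
        if (vae_decode(latent, img_out) != 0) return 0;

        double se = 0.0;
        for (size_t i = 0; i < n_values; i++) {
            double d = (double)img_in[i] - (double)img_out[i];
            se += d * d;
        }
        double rmse = sqrt(se / (double)n_values);
        printf("VAE round-trip RMSE: %f\n", rmse);
        return rmse < tol;
    }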

treksis - 16 hours ago

How fast is this compared to the Python-based implementations?

holografix - 16 hours ago

No cuBLAS?

ChrisArchitect - 16 hours ago

Related:

FLUX.2 [Klein]: Towards Interactive Visual Intelligence

https://news.ycombinator.com/item?id=46653721

llmidiot - 16 hours ago

I supported Redis against Valkey because I felt software should not be appropriated like that.

Now that the Redis author supports broad copyright violations and has turned into an LLM influencer, I regret having ever supported Redis. I have watched many open source authors, who have positioned themselves as rebels and open source populists, go fully corporate. This is the latest instance.

re - 15 hours ago

> I wanted to see if, with the assistance of modern AI, I could reproduce this work in a more concise way, from scratch, in a weekend.

I don't think it counts as recreating a project "from scratch" if the model that you're using was trained against it. Claude Opus 4.5 is aware of the stable-diffusion.cpp project and can answer some questions about it and its code-base (with mixed accuracy) with web search turned off.