Qwen-Image-2.0: Professional infographics, exquisite photorealism
qwen.ai | 238 points by meetpateltech 8 hours ago
I've seen many comments describing the "horse riding man" example as extremely bizarre (which it actually is), so I'd like to provide some background context. The "horse riding man" is a Chinese internet meme originating from an entertainment awards ceremony at which the renowned host Tsai Kang-yong wore an elaborate outfit featuring a horse riding on his back[1]. At the time, he was embroiled in a rumor about an unpublicized homosexual partner, whose name sounded like "Ma Qi Ren", which coincidentally translates to "horse riding man" in Mandarin. The incident spread widely across the Chinese internet and turned into a meme. So their use of "horse riding man" as an example isn't entirely nonsensical, though the image per se is undeniably bizarre and carries an unsettling vibe.
[1] The photo of the outfit: https://share.google/mHJbchlsTNJ771yBa
Interesting background! Prompts like this also test the latent space of the image generator - it’s usually the other way round, so if you see a man on top of a horse, you’ve got a less sophisticated embedding feeding the model. In this case, though, that’s quite an image to put out to the interwebs. I looked to see what gender the horse was.
EDIT: After reading the prompt translation, this was more just like a “year of the horse is going to nail white engineers in glorious rendered detail” sort of prompt. I don’t know how SD1.5 would have rendered it, and I think I’ll skip finding out
On the topic of modern Chinese culture: is there the same hostility toward AI-generated imagery in China as there seems to be in America?
For example I think there would be a lot of businesses in the US that would be too afraid of backlash to use AI generated imagery for an itinerary like the one at https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/Qwe...
Since China has a population of 1.4 billion people with vastly differing levels of cognition, I find it difficult to claim I can summarize "modern Chinese culture". But within my range of observation, no. Chinese people not only have no hostility toward AI but actively pursue and revere it with fervor. They widely perceive AI as an advanced force, a new opportunity for everyone, a new avenue for making money, and a new chance to surpass others. At most, some consumers might associate businesses using AI-generated content with a budget-conscious brand image, but that's not hostility.
>Since China has a population of 1.4 billion people with vastly differing levels of cognition, I find it difficult to claim I can summarize "modern Chinese culture"
Ha! An American would have no such qualms.
There's definitely some hostility: https://mp.weixin.qq.com/s/A5shO-6nZIXZvJUEzrx03Q
There's also the "horse riding astronaut" challenge in image generation: https://garymarcus.substack.com/p/horse-rides-astronaut-redu...
While I don't doubt this was one influence, there was also an infamous problem with Dall-E 2, which was perfectly able to generate an astronaut riding a horse but completely unable to generate a horse riding an astronaut.
This problem is infamous because it persisted (unlike other early problems, like creating the wrong number of fingers) for much more capable models, and the Qwen Image people are certainly very aware of this difficult test. Even Imagen 4 Ultra, which might be the most advanced pure diffusion model without editing loop, fails at it.
And obviously an astronaut is similar to a man, which connects this benchmark to the Chinese meme.
Why not simply ask for a man, or even a Han man given the ethnicity of Tsai Kang-yong? Why a white man, and why a man wearing medieval clothing? Give your head a wobble.
Yep, it’s the only image on the entire page with a non-Chinese person in it. Given the prompt, the message is clear.
The message is "We watched Lord of the Rings and Game of Thrones and liked the medieval aesthetic enough to emulate it."
Couple of thoughts:
1. I’d wager that given their previous release history, this will be open‑weight within 3-4 weeks.
2. It looks like they’re following suit with other models like Z-Image Turbo (6B parameters) and Flux.2 Klein (9B parameters), aiming to release models that can run on much more modest GPUs. For reference, the original Qwen-Image is a 20B-parameter model.
3. This is a unified model (both image generation and editing), so there’s no need to keep separate Qwen-Image and Qwen-Edit models around.
4. The original Qwen-Image scored the highest among local models for image editing in my GenAI Showdown (6 out of 12 points), and it also ranked very highly for image generation (4 out of 12 points).
Generative Comparisons of Local Models:
https://genai-showdown.specr.net/?models=fd,hd,kd,qi,f2d,zt
Editing Comparison of Local Models:
https://genai-showdown.specr.net/image-editing?models=kxd,og...
I'll probably be waiting until the local version drops before adding Qwen-Image-2 to the site.
For the more technical…
Qwen 2512 (December edition of Qwen Image)
* 19B parameters: a 40GB file at FP16, which fit on a 3090 at FP8. Anything less than that and you were in GGUF format at Q6 to Q4 quantizations… slow, but still good quality.
* used Qwen 2.5 VL. So a large model and a very good vision model.
* And iirc, their own VAE, which had known and obvious issues with high-frequency artifacts. Some people would take the image and pass it through another VAE, like the WAN video model's, or upscale-downscale to remove them.
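Those file-size and quantization numbers can be sanity-checked with quick back-of-the-envelope arithmetic: weights-only size in decimal GB is params × bits / 8 / 1e9 (real checkpoint files add some overhead, which is why 19B at FP16 shows up as a ~40GB file):

```python
def weights_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

params = 19e9  # Qwen-Image 2512
for name, bits in [("FP16", 16), ("FP8", 8), ("Q6", 6), ("Q4", 4)]:
    print(f"{name}: {weights_size_gb(params, bits):.1f} GB")
```

FP16 comes out to 38 GB (hence the ~40GB file), FP8 to 19 GB (why it fits on a 24GB 3090), and Q4 to about 9.5 GB.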
Qwen 2 now is
* a 7B-param model, sitting right between Klein 9B (non-commercial), Z-Image 7B (Apache), and Klein 4B (Apache); this one's license is unknown. Direct competition, and it will fit on many more GPUs even at FP16.
* upgrades to Qwen 3 VL, I assume this is better than the already great 2.5 VL.
* Unknown on the new VAE. Flux2’s new 128 channel VAE is excellent, but it hasn’t been out long enough for even a frontier Chinese model to pick up.
Overall, you’re right that this is on trend to bring models onto lower-end hardware.
Qwen was already excellent and now they rolled Image and Edit together for an “Omni” model.
Z-Image was the model to beat a couple weeks ago… and now it looks like both Klein and Qwen will beat it! It’s been disappointing to see how Z-Image just refuses to adhere to multiple new training concepts. Maybe they tried to pack it too tightly.
Open weights for this will be amazing. THREE direct competitors all vying to be “SDXL2” at the same time.
The Qwen convention was confusing! You had Image, 2509, Edit, 2511 (Edit), 2512 (Image) and then the Lora compatibility was unspecified. It’s smart to just 2.0 this mess.
Agreed! A lot of people were also using ZiT as a refiner downstream to help with some of the more problematic visual aspects of the original Qwen-Image.
I'm really looking forward to running the unified model through its paces.
Note that Qwen Image 1.0 (2512) wasted ~8B weights on timestep embedding. Both Z-Image / FLUX.2 series corrected that.
It's crazy to think there was a fleeting sliver of time during which Midjourney felt like the pinnacle of image generation.
The pace of commoditization in image generation is wild. Every 3-4 months the SOTA shifts, and last quarter's breakthrough becomes a commodity API.
What's interesting is that the bottleneck is no longer the model — it's the person directing it. Knowing what to ask for and recognizing when the output is good enough matters more than which model you use. Same pattern we're seeing in code generation.
SOTA shifts, yes. But the average person doing the work has been very happy with SDXL based models. And that was released two years ago.
The fight right now outside of API SOTA is who will replace SDXL to be the “community preference”
It’s now a three way between Flux2 Klein, Z-Image, and now Qwen2.
Whatever happened to Midjourney?
No external funding raised. They're not on the VC path, so no need to chase insane growth. They still have around 500M USD in ARR.
In my (very personal) opinion, they're part of a very small group of organizations that sell inference under a sane and successful business model.
They have image and video models that are nowhere near SOTA on prompt adherence or image editing but pretty good on the artistic side. They lean in on features like reference images so objects or characters have a consistent look, biasing the model towards your style preferences, or using moodboards to generate a consistent style.
A lot of people started realizing that it didn’t really matter how pretty the resulting image was if it completely failed to adhere to the prompt.
Even something like Flux.1 Dev which can be run entirely locally and was released back in August of 2024 has significantly better prompt understanding.
Not much, while everything happened at OpenAI/Google/Chinese companies. And that's the problem.
How is it a problem? There simply doesn't seem to be a moat or secret sauce. Who cares which of these models is SOTA? In two months there will be a new model.
There seems to be a moat like infrastructure/gpus and talent. The best models right now come from companies with considerable resources/funding.
Right, but that's a short term moat. If they pause on their incredible levels of spending for even 6 months, someone else will take over having spent only a tiny fraction of what they did. They might get taken over anyway.
> someone else will take over having spent only a tiny fraction of what they did
How. By magic? You fell for 'Deepseek V3 is as good as SOTA'?
By reverse engineering, sheer stupidity from the competition, corporate espionage, ‘stealing’ engineers and sometimes a stroke of genius, the same as it’s always been
They still have a niche. Their style references feature is their key differentiator now, but I find I can usually just drop some images of a MJ style into Gemini and get it to give me a text prompt that works just as well as MJ srefs.
I recently tried out LMStudio on Linux for local models. So easy to use!
What Linux tools are you guys using for image generation models like Qwen's diffusion models, since LMStudio only supports text gen.
Practically anybody actually creating with this class of models (mostly diffusion based) is using ComfyUI. The community takes care of quantization, repackaging into GGUF (most popular), and even speed optimizing (lightning LoRAs, layer skips). It's quite extensive.
Everything keeps changing so quickly, I basically have my own Python HTTP server with a unified JSON interface, then that can be routed to any of the impls/*.py files for the actual generation, then I have of those per implementation/architecture basically. Mostly using `diffusers` for the inference, which isn't the fastest, but tends to have the new model architectures much sooner than everyone else.
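The setup described above can be sketched in a few lines; this is a minimal illustration of that kind of dispatch layer, not their actual code — the request fields, the `BACKENDS` registry, and the stub backend are all made up for illustration (the real per-architecture entries would live in `impls/*.py` and call `diffusers` internally):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Registry mapping model names to generation backends. A stub stands in
# for what would be a diffusers pipeline in each impls/*.py module.
BACKENDS = {
    "demo": lambda req: {"model": "demo", "prompt": req["prompt"], "image": None},
}

def dispatch(request: dict) -> dict:
    """Route a unified JSON request to the backend named in it."""
    backend = BACKENDS[request["model"]]
    return backend(request)

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, dispatch it, and echo the result back.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        result = dispatch(json.loads(body))
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To actually serve:
# HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

The nice property is that swapping in a new architecture only means adding one entry to the registry; the JSON interface the clients see never changes.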
I encourage everyone to at least try ComfyUI. It's come a long way in terms of user-friendliness particularly with all of the built-in Templates you can use.
If you're on an AMD platform Lemonade (https://lemonade-server.ai/) added image generation in version 9.2 (https://github.com/lemonade-sdk/lemonade/releases/tag/v9.2.0).
Stability matrix, it's a manager for models and uis and loras etc, very nice
LMStudio is a low barrier to entry for LLMs, for sure. The lowest. Good software!
Other people gave you the right answer, ComfyUI. I’ll give you the more important why and how…
A lot of people go to great lengths to avoid Comfy because of its intimidating barrier to entry. It’s not that bad. Learn it once and be done. You won’t have to keep learning the UI of the week endlessly.
The how, go to civitai. Find an image you like, drag and drop it into comfy. If it has a workflow attached, it will show you. Install any missing nodes they used. Click the loaders to point to your models instead of their models. Hit run and get the same or a similar image. You don’t need to know what any of the things do yet.
If for some reason that just does not work for you… SwarmUI is a front end to Comfy. You can change things and it will show you on the Comfy side what they’re doing. It’s a gateway drug to learning Comfy.
EDIT: the most important thing no one will tell you outright… DO NOT FOR ANY REASON try to skip the venv or miniconda virtual environment when using Comfy! You must make a new, clean setup. You will never get the right Python, torch, diffusers, and driver versions on your system install.
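For reference, a clean-venv ComfyUI setup on Linux usually looks something like this; treat the torch index URL as a placeholder for whichever wheel matches your CUDA version:

```shell
# Clone ComfyUI and give it its own isolated Python environment.
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python3 -m venv .venv          # never reuse the system install
source .venv/bin/activate
# Install torch first (pick the wheel index matching your CUDA version),
# then ComfyUI's own requirements into the same venv.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
python main.py                 # serves on http://127.0.0.1:8188 by default
```

Everything (torch, diffusers, custom-node dependencies) stays inside `.venv`, so a broken update means deleting one directory, not repairing your system Python.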
ComfyUI is the best for stable diffusion
FWIW you can use non-sd models in ComfyUI too, the ecosystem is pretty huge and supports most of the "mainstream" models, not only the stable diffusion ones, even video models and more too.
I have my own MIT licensed framework/UI: https://github.com/runvnc/mindroot. With Nano Banana via runvnc/googleimageedit
Ollama is working on adding image generation but its not here yet. We really do need something that can run a variety of models for images.
Yeah, I'm guessing they were bound to leave behind the whole "Get up and running with large language models" mission sooner or later. That was their initial focus, but after 2-3 years investors start pushing you to think about expansion and earning back the money.
Sad state of affairs, and it seems they're enshittifying quicker than expected, but it was always a question of when, not if.
Koboldcpp has built in support for image models. Model search and download, one executable to run, UI, OpenAI API endpoint, llama.cpp endpoint, highly configurable. If you want to get up and running instantly, just pick a kcppt file and open that and it will download everything you need and load it for you.
Engine:
* https://github.com/LostRuins/koboldcpp/releases/latest/
Kcppt files:
The Chinese vertical typography is sadly a bit off. If punctuation marks are used at all, they should be the characters specifically designed for vertical text, like ︒(U+FE12 PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP).
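For anyone handling this in a text pipeline, the fix is a straight character substitution into Unicode's Vertical Forms block (U+FE10..U+FE1F); a minimal sketch covering the three most common marks:

```python
# Map horizontal CJK punctuation to its vertical presentation form
# (Unicode block "Vertical Forms").
VERTICAL_FORMS = {
    "\uFF0C": "\uFE10",  # fullwidth comma        -> PRESENTATION FORM FOR VERTICAL COMMA
    "\u3001": "\uFE11",  # ideographic comma      -> ... VERTICAL IDEOGRAPHIC COMMA
    "\u3002": "\uFE12",  # ideographic full stop  -> ... VERTICAL IDEOGRAPHIC FULL STOP
}

def to_vertical(text: str) -> str:
    """Substitute punctuation suitable for vertical CJK layout."""
    return "".join(VERTICAL_FORMS.get(ch, ch) for ch in text)

print(to_vertical("\u4f60\u597d\u3002"))  # 你好。 -> 你好︒
```

A real layout engine would also handle rotation of brackets and Latin runs, but for punctuation the presentation forms above are the ones an image model should be reproducing.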
I use gen-AI to produce images daily, but honestly the infographics are 99% terrible.
LinkedIn is filled with them now.
To be fair it hasn't made LinkedIn any worse than it already was.
To be fair, it is hard to make LinkedIn any worse.
I was gonna make a joke about "Wish granted, now Microsoft owns it" but then I remembered that they already do. Reality sometimes makes better jokes than what we can come up with.
Infographics are only as good as the author allows, though. Few people can make, or even describe, a good infographic, so that's what we see in the results too.
Correct.
Much like the pointless ASCII diagrams in GitHub readmes (big rectangle with bullet points flows to another...), the diagrams are cognitive slurry.
See Gas Town for non-Qwen examples of how bad it can get:
https://news.ycombinator.com/item?id=46746045
(Not commenting on the other results of this model outside of diagramming.)
> cognitive slurry
Thank you for this phrase. I don't think that bad diagrams are limited to the AI in any way and this perfectly describes all "this didn't make things any clearer" cases.
I found the horse revenge-porn image at the end quite disturbing.
It's the year of the horse in their zodiac. The (translated) prompt is wild:
""" A desolate grassland stretches into the distance, its ground dry and cracked. Fine dust is kicked up by vigorous activity, forming a faint grayish-brown mist in the low sky. Mid-ground, eye-level composition: A muscular, robust adult brown horse stands proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man. Its hind legs are taut, its neck held high, its mane flying against the wind, its nostrils flared, and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male, 30-40 years old, his face covered in dust and sweat, his short, messy dark brown hair plastered to his forehead, his thick beard slightly damp; he wears a badly worn, grey-green medieval-style robe, the fabric torn and stained with mud in several places, a thick hemp rope tied around his waist, and scratched ankle-high leather boots; his body is in a push-up position—his palms are pressed hard against the cracked, dry earth, his knuckles white, the veins in his arms bulging, his legs stretched straight back and taut, his toes digging into the ground, his entire torso trembling slightly from the weight. The background is a range of undulating grey-blue mountains, their outlines stark, their peaks hidden beneath a low-hanging, leaden-grey, cloudy sky. The thick clouds diffuse a soft, diffused light, which pours down naturally from the left front at a 45-degree angle, casting clear and voluminous shadows on the horse's belly, the back of the man's hands, and the cracked ground. The overall color scheme is strictly controlled within the earth tones: the horsehair is warm brown, the robe is a gradient of gray-green-brown, the soil is a mixture of ochre, dry yellow earth, and charcoal gray, the dust is light brownish-gray, and the sky is a transition from matte lead gray to cool gray with a faint glow at the bottom of the clouds. 
The image has a realistic, high-definition photographic quality, with extremely fine textures—you can see the sweat on the horse's neck, the wear and tear on the robe's warp and weft threads, the skin pores and stubble, the edges of the cracked soil, and the dust particles. The atmosphere is tense, primitive, and full of suffocating tension from a struggle of biological forces. """
I think they call it "horse riding a human", which could have gone in two very different directions, and the direction the model seems to have taken was the less bad of the two.
At first I thought it was a clever prompt, because you see which direction the model takes it, and whether it "corrects" it to the more common "human riding a horse", similar to the full-wine-glass test.
But if you translate the actual prompt the term riding doesn't even appear. The prompt describes the exact thing you see in excruciating detail.
"... A muscular, robust adult brown horse standing proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man ... and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male, 30-40 years old, his face covered in dust and sweat ... his body is in a push-up position—his palms are pressed hard against the cracked, dry earth, his knuckles white, the veins in his arms bulging, his legs stretched straight back and taut, his toes digging into the ground, his entire torso trembling slightly from the weight ..."
> But if you translate the actual prompt the term riding doesn't even appear. The prompt describes the exact thing you see in excruciating detail.
Yeah, as they go through their workflow earlier in the blog post, that prompt they share there seems to be generated by a different input, then that prompt is passed to the actual model. So the workflow is something like "User prompt input -> Expand input with LLMs -> Send expanded prompt to image model".
So I think "human riding a horse" is the user prompt, which gets expanded to what they share in the post, which is what the model actually uses. This is also how they've presented all their previous image models, by passing user input through a LLM for "expansion" first.
Seems poorly thought out not to make it 100% clear what the actual humanly-written prompt is though, not sure why they wouldn't share that upfront.
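The two-stage workflow described above (user prompt → LLM expansion → image model) can be sketched like this; `expand_prompt`, the instruction text, and both stub backends are stand-ins for whatever LLM and diffusion models are actually called:

```python
def expand_prompt(user_prompt: str, expander) -> str:
    """Stage 1: have an LLM rewrite a terse user prompt into the long,
    detailed scene description the image model is trained on."""
    instruction = (
        "Expand this image prompt into a detailed scene description, "
        "covering composition, subjects, lighting, and color palette:\n"
    )
    return expander(instruction + user_prompt)

def text_to_image(user_prompt: str, expander, image_model):
    """Stage 2: feed the expanded prompt, not the original, to the model."""
    detailed = expand_prompt(user_prompt, expander)
    return image_model(detailed)

# Stub backends, just to show the data flow end to end.
def stub_llm(prompt):
    return prompt + " [expanded: desolate grassland, eye-level composition...]"

def stub_diffusion(prompt):
    return {"prompt_used": prompt}

result = text_to_image("horse riding a man", stub_llm, stub_diffusion)
```

This is why the prompt shown in the blog post is so much longer than anything a human would type: only the expanded stage-1 output reaches the diffusion model.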
unfortunately no open weights it seems.
To be fair, didn't they release an open-weights image model only about a month ago? I think the last one was in December 2025.
Exactly - they did the same thing with the original version of Qwen-Image. It was API only for a while before being made available for local hosting.
My response to the horse image: https://i.postimg.cc/hG8nJ4cv/IMG-5289-copy.jpg
When I tried Qwen-Image-2512 I could not even get it to spell correctly. And often the letters would be garbled anyways.
> Qwen-Image-2.0 not only accurately models the “riding” action but also meticulously renders the horse’s musculature and hair > https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/Qwe...
What the actual fuck
For reference, below is the prompt translated (with my highlighting of the part that matters). They did very much ask for this version of "horse riding a man", not the "horse sitting upright on a crawling human" version
---
A desolate grassland stretches into the distance, its ground dry and cracked. Fine dust is kicked up by vigorous activity, forming a faint grayish-brown mist in the low sky.
Mid-ground, eye-level composition: A muscular, robust adult brown horse stands proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man. Its hind legs are taut, its neck held high, its mane flying against the wind, its nostrils flared, and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male, 30-40 years old, his face covered in dust and sweat, his short, messy dark brown hair plastered to his forehead, his thick beard slightly damp; he wears a badly worn, grey-green medieval-style robe, the fabric torn and stained with mud in several places, a thick hemp rope tied around his waist, and scratched ankle-high leather boots; his body is in a push-up position—his palms are pressed hard against the cracked, dry earth, his knuckles white, the veins in his arms bulging, his legs stretched straight back and taut, his toes digging into the ground, his entire torso trembling slightly from the weight.
The background is a range of undulating grey-blue mountains, their outlines stark, their peaks hidden beneath a low-hanging, leaden-grey, cloudy sky. The thick clouds diffuse a soft, diffused light, which pours down naturally from the left front at a 45-degree angle, casting clear and voluminous shadows on the horse's belly, the back of the man's hands, and the cracked ground.
The overall color scheme is strictly controlled within the earth tones: the horsehair is warm brown, the robe is a gradient of gray-green-brown, the soil is a mixture of ochre, dry yellow earth, and charcoal gray, the dust is light brownish-gray, and the sky is a transition from matte lead gray to cool gray with a faint glow at the bottom of the clouds.
The image has a realistic, high-definition photographic quality, with extremely fine textures—you can see the sweat on the horse's neck, the wear and tear on the robe's warp and weft threads, the skin pores and stubble, the edges of the cracked soil, and the dust particles. The atmosphere is tense, primitive, and full of suffocating tension from a struggle of biological forces.
The significance of the hemp rope is that it is a symbol of mourning and the loss of one's deceased relative.
I like how sometimes I get angry at a LLM for not understanding what I meant, but then I realize that I just forgot to mention it in the context. It's fun to see the same thing happen in humans reading websites too, where they don't understand the context yet react with strong feelings anyways.
interesting riding application picture
"Guy being humped by a horse" wouldn't have been my first choice for demoing the capabilities of the model, but each to their own I guess.
It looks like a marketing move. It's a good quality, detailed picture. It's going to get shared a lot. I would assume they knew exactly what they were doing. Nothing like a bit of controversy for extra clicks.
Because every ML researcher is a viral social media expert.
(I don’t even know if I’m being sarcastic)
This is not some random ML researcher doing fun things at home. Qwen is backed by Alibaba cloud. They likely have whole departments of marketing people available.
Why is the only image featuring non-Asian men the one under the horse?
they explicitly called for that in the prompt
Exactly why did they choose this prompt with a white person and not an Asian person, as in all the other examples?
But why? That image actually puzzled me. Does it have some background context? Some historical legend or something of the like?
It is Lunar New Year season right now, 2026 is year of the horse, there is celebratory horse imagery everywhere in many Asian countries right now, so this image could be interpreted as East trampling West. I have no way to know the intention of the person at Qwen who wrote this, but you can form your own conclusions from the prompt:
A muscular, robust adult brown horse stands proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man. Its hind legs are taut, its neck held high, its mane flying against the wind, its nostrils flared, and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male...
Is the problem the position/horse or that Qwen mostly shows asian people?
Do western AI models mostly default to white people?
Well, what if some Western models showcased white people in all the good-looking images and the only embarrassing image featured Asian people? Wouldn't that be considered racism?
> and the only embarrassing image
Embarrassing image? I'm white, why would I be embarrassed over that image? It's a computer generated image with no real people in it, how could it be embarrassing for alive humans?
I assume you feel the same about Nazi propaganda or racist caricatures of black people, since they're not real, just drawings.
Yeah, why would I feel embarrassed over either of those things? I get angry when I see nazi propaganda, feel hopeless sometimes when I see racist caricatures, but never "embarrassed", that wouldn't make much sense. What would I be embarrassed about exactly?
Indeed, if one's own race is not being denigrated, one would not feel embarrassed, though one may be embarrassed that racist material was created by one's own people. If one's own race is being denigrated, then one may indeed feel embarrassment, and perhaps also anger and hopelessness. As for why embarrassment exactly: if the purpose is to degrade by pointing at some reason the author holds your people in contempt, and you are indeed powerless to stop it, shame and embarrassment are often what is felt.
In another post you talked about people getting mad at the image without context. What context are we missing, exactly? I do not feel ill-informed or angry, but I could indeed be missing something; can you explain the context? If you were to say it's because the LLM added more context, that could be plausible, but why the medieval clothing and the hemp rope? I know how sensitive Western companies have been about their models producing negative racial stereotypes, going as far as avoiding and modifying certain training data. Would you accept an LLM producing negative stereotypes, or tending to put one particular racial group into a submissive situation more than others?
I really do find it unlikely that an LLM would take the prompt "a human male being ridden by a horse", add all those other details, and go straight for a darker, somber tone and a dynamic of domination and submission rather than a more humorous description.
> although one may be embarrassed that racist material was created by their people
Why? I don't see that. Are black people embarrassed if a black person commits a crime, yet not embarrassed if a white person commits a crime? That sounds very contrived to me and not at all how things work in reality.
> If ones own race is being denigrated then one may indeed feel embarrassment
I also don't understand this. Why would every white person feel any sort of embarrassment over images denigrating white people? Feel hate, anger or lots of other emotions, that'd make sense. But I still don't understand why "embarrassment" or shame is even on the table, embarrassment over what exactly? That there are racists?
Your posts in this thread have seemed to be in bad faith and have made rather blatant non-sequiturs. The post by 'goga-piven' said that the picture was embarrassing, not that one should actually feel shame and embarrassment. I believe his meaning of "embarrassing image" is that the image is meant to embarrass and humiliate a people, or just portray them contemptibly.
My comment was to try to highlight that this is the point of various racist depictions, and that if one is powerless, this can indeed become an embarrassing shame. Maybe you do not see it that way, but in any kind of bondage that a group of people is subject to, shame and embarrassment will follow, along with many other feelings. I was not saying a white person should be embarrassed, and I don't think 'goga-piven' was either; rather, such images could be manifestations of contempt or other hostile emotions on the author's part.
>Why? I don't see that. Are black people embarrassed if a black person commits a crime, yet not embarrassed if a white person commits a crime? That sounds very contrived to me and not at all how things work in reality.
I did not make a point about black people being embarrassed at black people committing a crime; I was thinking more of the kind of collective guilt some German people speak of for Nazism. I made no prescriptive claims about the shame or embarrassment, only that these are ways people do behave.
> I also don't understand this. Why would every white person feel any sort of embarrassment over images denigrating white people? Feel hate, anger or lots of other emotions, that'd make sense. But I still don't understand why "embarrassment" or shame is even on the table, embarrassment over what exactly? That there are racists?
You have subtly changed your position here to one where it's not an absurdity to feel an emotional response to an image that denigrates your people.
Of course, this was not the most pressing issue; the more important one is the intent of the image, and you seemed to ignore that part entirely even though it is the main question. You made claims of missing context in other threads, and I made some preemptive counter-arguments. Do tell me a more plausible context, if the one I provided is incorrect.
> Do western AI models mostly default to white people?
No, they mostly default to black people even in historical contexts where they are completely out of place, actually. [1]
"Google paused its AI image-generator after Gemini depicted America's founding fathers and Nazi soldiers as Black. The images went viral, embarrassing Google."
[1] https://www.npr.org/2024/03/18/1239107313/google-races-to-fi...
> they mostly default to black people
You're referring to a case of one version of one model. That's not "mostly" or "default to".
Out of curiosity I just tried this prompt:
> Generate a photo of the founding fathers of a future, non-existing country. Five people in total.
with Nano Banana Pro (the SOTA). I tried the same prompt 5 times and every time black people are the majority. So yeah, I think the parent comment is not that far off.
Luck? One black person and three South Asians in total for me.
But for an out of context imaginary future... why would you choose non-black people? There's about the same reason to go with any random look.
So, the answer to the question "Do western AI models mostly default to white people?" is clearly a resounding: no, they don't.
No. But neither black people. Or anyone specifically. So we got to a nice balance it seems.
I mean it's still far off, because they said "historical context", i.e., the actual past, but your prompt is about a hypothetical future.
(I suspect you tried a prompt about the original founding fathers, and found it didn't make that mistake any more.)
[flagged]
Yes, but you said "they mostly default to black people". Nice try at moving the goal posts.
Anyway, you're tagged as "argued Musk salute wasn't nazi", so your ability to parse history is a little damaged.
Exactly where were my goal posts moved? I stated yet again "mostly default to black people". You are the one trying to move the goal posts because you just answered this message:
"I just tried this prompt:
> Generate a photo of the founding fathers of a future, non-existing country. Five people in total.
I tried the same prompt 5 times and every time black people are the majority"
Do you understand the concept of "mostly defaulting" to something, and how that is directly related to a group of "people [being] the majority" in the examples provided?
> Anyway, you're tagged as "argued Musk salute wasn't nazi", so your ability to parse history is a little damaged.
I don't really care what communists think since you aren't rational people. If you have any actual statement to make and for me to deconstruct again while pointing out your inability to follow through with basic logic or facts, please let me know.