ChatGPT Images 2.0

725 points by wahnfrieden 13 hours ago

System card: https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...

So during my Nano Banana Pro experiments I wrote a very fun prompt that tests the ability for these image generation models to follow heuristics, but still requires domain knowledge and/or use of the search tool:

    Create a 8x8 contiguous grid of the Pokémon whose National Pokédex numbers correspond to the first 64 prime numbers. Include a black border between the subimages.

    You MUST obey ALL the FOLLOWING rules for these subimages:
    - Add a label anchored to the top left corner of the subimage with the Pokémon's National Pokédex number.
      - NEVER include a `#` in the label
      - This text is left-justified, white color, and Menlo font typeface
      - The label fill color is black
    - If the Pokémon's National Pokédex number is 1 digit, display the Pokémon in a 8-bit style
    - If the Pokémon's National Pokédex number is 2 digits, display the Pokémon in a charcoal drawing style
    - If the Pokémon's National Pokédex number is 3 digits, display the Pokémon in a Ukiyo-e style

The NBP result is here, which got the numbers, corresponding Pokemon, and styles correct, with the main point of contention being that the style application is lazy and that the images may be plagiarized: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...

Running that same prompt through gpt-2-image high gave an...interesting contrast: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...

It did more inventive styles for the images that appear to be original, but:

- The style logic is by row, not raw numbers and are therefore wrong

- Several of the Pokemon are flat-out wrong

- Number font is wrong

- Bottom isn't square for some reason

Odd results.

dvt - 6 hours ago

This is an amazing test and it's kinda' funny how terrible gpt-2-image is. I'd take "plagiarized" images (e.g. Google search & copy-paste) any day over how awful the OpenAI result is. Doesn't even seem like they have a sanity checker/post-processing "did I follow the instructions correctly?" step, because the digit-style constraint violation should be easily caught. It's also expensive as shit to just get an image that's essentially unusable.
- the_arun - 4 hours ago
  
  This is from Gemini - https://lens.usercontent.google.com/banana?agsi=CmdnbG9iYWw6...
  - fblp - 3 hours ago
    
    Did it correctly follow the instructions? Don't know my pokemon well enough.
    
    minimaxir - 3 hours ago
    
    Essentially yes (bottom got distorted), but Gemini uses Nano Banana Pro or Nano Banana 2 so it's not a surprising result. The image I linked uses the raw API.
- anshumankmr - 5 hours ago
  
  that is interesting cause I feel gpt-image-1 did have that feature.
  (source: https://chatgpt.com/share/69e83569-b334-8320-9fbf-01404d18df...)
  - weird-eye-issue - 3 hours ago
    
    You are comparing ChatGPT to a raw image model. These are two completely different things. ChatGPT takes your input, modifies the prompt and then passes it to the image model and then will maybe read the image and provide output. The image model like through the API just takes the prompt verbatim and generates an image.
    
    minimaxir - 3 hours ago
    
    Nano Banana Pro and ChatGPT Images 2.0 also tweak the prompt because they can think.
    
    weird-eye-issue - 3 hours ago
    
    Yes exactly, "ChatGPT Images 2.0" is in ChatGPT. That is not a model.
- hyperadvanced - 5 hours ago
  
  I wouldn’t say it’s terrible. I wouldn’t say it’s a huge step forward in terms of quality compared to what I’ve seen before from AI
vincentbuilds - an hour ago

banana Pro gets the logic and punts on the art; gpt-2-image gets the art and punts on the logic. Feels like instruction-following and creativity sit on opposite ends of the same slider.
rrr_oh_man - 7 hours ago

Why would you consider this a good prompt?
- minimaxir - 7 hours ago
  
  Because both Nano Banana Pro and ChatGPT Images 2.0 have touted strong reasoning capabilities, and this particular prompt has more objective, easy-to-validate criteria as opposed to the subjective nature of images.
  I have more subjective prompts to test reasoning but they're your-mileage-may-vary (however, gpt-2-image has surprisingly been doing much better on more objective criteria in my test cases)
- o10449366 - 6 hours ago
  
  [flagged]
  - minimaxir - 6 hours ago
    
    "Quirky and obscure" has the functional benefit of ensuring the source question is not in the training data/outside the median user prompt, and therefore making the model less likely to cheat.
    We have enough people complaining about Simon Willison's pelican test.
    
    o10449366 - 36 minutes ago
    
    When you program, do you consider using your prior knowledge of programming cheating?
  - Bjartr - 5 hours ago
    
    What would make the prompt a better actual evaluation in your judgement?
    
    leptons - an hour ago
    
    Not focusing on pokemon for a start. Maybe use something more people can recognize and evaluate. I have zero knowledge of pokemon, I see it as a niche thing for ultra-nerdy people, and not something everyone is familiar with. Nothing about that test can be evaluated by anyone but a pokemon expert. Sorry, but pokemon isn't as mainstream as some people might think it is.
  - tailscaler2026 - 5 hours ago
    
    still #opentowork huh
    
    beepbooptheory - 3 hours ago
    
    Where does one even use that hashtag?
    
    minimaxir - 2 hours ago
    
    It's a LinkedIn joke.
  - codemog - 5 hours ago
    
    Ah yes, also known as C++ enjoyers.
Razengan - 3 hours ago

Even a few months ago, ChatGPT/Sora's image generation performed better than Gemini/Nano Banana for certain weird prompts:
Try things like: "A white capybara with black spots, on a tricycle, with 7 tentacles instead of legs, each tentacle is a different color of the rainbow" (paraphrased, not the literal exact prompt I used)
Gemini just globbed a whole mass of tentacles without any regards to the count
heroku - 3 hours ago

[dead]
m3kw9 - 2 hours ago

Prob a very unscientific way to test an image model. This would me likely because they have the reasoning turned down and let its instant output takeover
- minimaxir - 2 hours ago
  
  There's no good scientific way to test a closed-source model with both nondeterministic and subjective output.
  This example image was generated using the API on high, not the low reasoning version. (it is slow and takes 2 minutes lol)
- crustaceansoup - 2 hours ago
  
  If the results are quantifiable/objective and repeatable it's scientific, how is it not scientific?
  The reasoning amount is part of the evaluation isn't it?
- TeMPOraL - 40 minutes ago
  
  This is the best kind of science there is: direct, empirical test.

codebolt - 2 minutes ago

Anyone test it out for generating 2D art for games? Getting nano banana to generate consistent sprite sheets was seemingly impossible last time i tried a few months ago.

parasti - an hour ago

A great technical achievement, for sure, but this is kind of the moment where it enters uncanny valley to me. The promo reel on the website makes it feel like humans doing incredible things (background music intentionally evokes that emotion), but it's a slideshow of computer generatated images attempting to replicate the amazing things that humans do. It's just crazy to look at those images and have to consciously remind myself - nobody made this, this photographed place and people do not exist, no human participated in this photo, no human traced the lines of this comic, no human designer laid out the text in this image. This is a really clever amalgamation machine of human-based inputs. Uncanny valley.

simonw - 12 hours ago

I've been trying out the new model like this:

  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run https://tools.simonwillison.net/python/openai_image.py \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio"

Code here: https://github.com/simonw/tools/blob/main/python/openai_imag...

Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...

simonw - 11 hours ago
I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...
```
  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
    --quality high --size 3840x2160
```
https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!
I think that image cost 40 cents.
- prmoustache - 20 minutes ago
  
  Funny how it can look convincing from far away but once you zoom in you find out most characters have a mix of leprosy and skin cancer.
- makira - 11 hours ago
  Fed into a clear Claude Code max effort session with : "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio.". It sliced the image into small sections and gave:
  "Found the raccoon holding a ham radio in waldo2.png (3840×2160).
  - Raccoon center: roughly (460, 1680) - Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585) - Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780 It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest. "
  Which is correct!
  - cwillu - 11 hours ago
    
    I had one problem: finding the raccoon. Now I have two: finding the red-and-white striped souvenir umbrella, and finding the raccoon.
    
    makira - 11 hours ago
    
    simonw posted 2 different images: make sure to look at the second one.
    
    cwillu - 11 hours ago
    
    Yeah, I noticed that just now, but too late to delete the comment :p
    
    jaggederest - 9 hours ago
    
    You had a meta problem, and three, in total: find the raccoon, find the umbrella, find the right link in the comments.
  - M3L0NM4N - 8 hours ago
    
    We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. Found it literally immediately, as if that's where my eyes naturally gravitated to first. Hopefully that's just luck and not an indictment of the image-creating ability, as if there is some element missing from this "Where's Waldo" image, that would normally make Waldo hard to find.
    
    nerdsniper - 6 hours ago
    
    There seemed to be more space around the raccoon than most other subjects. Zoomed out it appears as almost a “halo” highlighting the raccoon.
- wewtyflakes - 8 hours ago
  
  A startling number of people either have no arms, one arm, a half of an arm, or a shrunken arm; how odd!
  - rattlesnakedave - 5 hours ago
    
    To be fair, the average person has fewer than two arms.
    
    cozzyd - 2 hours ago
    
    Most people have an ARM in their pockets, nowadays. And possibly on their wrist.
    
    floodfx - 5 hours ago
    
    Haha. Underrated comment!
  - cozzyd - 5 hours ago
    
    This is why they're congregating around the first aid and the lost and found
  - globular-toast - 2 hours ago
    
    Finding the raccoon was instant. Finding all the weird AI artifacts is more fun. It's quite fascinating really. As usual it looks impressive at a glance but completely falls apart on closer inspection. I also didn't find any jokes, unless maybe the bridge to nowhere or finger posts pointing both ways counts?
- davebren - 11 hours ago
  
  The faces...that's nice that it turned a kid's book into an abomination
  - Filligree - 6 hours ago
    
    By image generation standards this is a ridiculously good result. No surprise that people instantly find the new limits, but they are new limits.
    
    globular-toast - an hour ago
    
    But it's also straight up plagiarism and still ridiculously bad on so many levels.
    
    davebren - 6 hours ago
    
    It could already copy the art styles from its training data, what is the advancement here?
  - vaulstein - 3 hours ago
    
    It's interesting that the raccoon is well defined because it was a part of the request. But none of the other Fauna are.
  - keithnz - 4 hours ago
    
    it's interesting, zoomed out it kind of looks ok, zoomed in.... oh my.
- jdironman - 4 hours ago
  
  The real NFTs where the images we generated along the way
- louiereederson - 11 hours ago
  
  The people in this image remind me of early this person does not exist, in the best way
  - dfee - 9 hours ago
    
    fair point, also "this raccoon does not exist"
- mirekrusin - 6 hours ago
  
  Can it generate non halloween version though?
  This lower-is-better danse macabre, nightmares inducing ratio feels like interesting proxy for models capability.
- gpt5 - 8 hours ago
  
  I tried it on the ChatGPT web UI and it also worked, although the ham radio looks like a handbag to me.
  https://postimg.cc/wyxgCgNY
  - luxpir - an hour ago
    
    Nice, enjoyed the image as someone who has been to the events. But also easy raccoon placement :)
  - djmips - 2 hours ago
    
    mmmm yummy OSLS?
- - 7 hours ago
  
  [deleted]
- ireadmevs - 11 hours ago
  
  I found it on the 2nd image! On the 1st one not yet...
- dzhiurgis - 3 hours ago
  
  Cost me < 1 cents - https://elsrc.com/elsrc/waldo/wojak.jpg
  And this medium quality, high resolution https://elsrc.com/elsrc/waldo/10_wojaks.jpg was 13cents
  p.s. aaaand that's soft launch my SaaS above, you can replace wojak.jpg with anything you want and it will paint that. It's basically appending to prompt defined by elsrc's dashboard. Hopefully a more sane way to manage genai content. Be gentle to my server, hn!
- Barbing - 4 hours ago
  
  >I think that image cost 40 cents.
  Kinda made me sad assuming the author didn't license anything to OpenAI.
  I recognize it could revert (99% of?) progress if all the labs moved to consent-based training sets exclusively, but I can't think of any other fair way.
  $.40 does not represent the appropriate value to me considering the desirability of the IP and its earning potential in print and elsewhere. If the world has to wait until it’s fair, what of value will be lost? (I suppose this is where the big wrinkle of foreign open weight models comes in.)
  - rafram - 3 hours ago
    
    License what? The concept of a hidden object search? The only stylistic similarity here is the viewing angle. Where’s Waldo comics are flat, brightly colored line drawings that look nothing like this at all.
    
    Barbing - 2 hours ago
    
    Well, I recognized the style from even the new physical books on sale today, but I don’t know art well enough to use a term like flat.
    I am not an art expert but I’m perhaps a reasonable consumer and there is possibility of confusion if someone sells AI Where’s Waldo knockoff books at the dollar store, maybe until I take a closer look.
  - - 3 hours ago
    
    [deleted]
makira - 12 hours ago

> though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure
I see an opportunity for a new AI test!
- vunderba - 11 hours ago
  
  There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.
  It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo Highlander style), while also holding up to scrutiny when you examine any individual, ordinary figure.
- simonw - 12 hours ago
  
  I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it.
  - - 11 hours ago
    
    [deleted]
halamadrid - 3 hours ago

Really hard to look at these images given how not human like the humans are. A few are ok, but a lot are disfigured or missing parts and its hard to find a raccoon in here.
pants2 - 11 hours ago

The second 4K image definitely has a raccoon on the left there! Nice.
nerdsniper - 6 hours ago

That is a devilishly difficult prompt for current diffusion tasks. Kudos.
vova_hn2 - 6 hours ago

Thanks for the image, I will see their faces in my nightmares.
- vunderba - 6 hours ago
  
  This happens all too frequently when you ask a GenAI model to create an image with a large crowd especially a “Where’s Waldo?” style scenes, where by definition you’re going to be examining individual faces very closely.
- hackable_sand - 4 hours ago
  
  What about the faces of the people ChatGPT killed?
marricks - 7 hours ago

Like... this has things that AI will seemingly always be terrible at?
At some point the level of detail is utter garbo and always will be. An artist who was thoughtful could have some mistakes but someone who put that much time into a drawing wouldn't have:
- Nightmarish screaming faces on most people
- A sign that points seemingly both directions, or the incorrect one for a lake and a first AID tent that doesn't exist
- A dog in bottom left and near lake which looks like some sort of fuzzy monstrosity...
It looks SO impressive before you try to take in any detail. The hand selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...
We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier where's Waldos??
- p1esk - 6 hours ago
  
  AI will seemingly always be ...
  You do realize that the whole image generation field is barely 10 years old?
  I remember how I was able to generate mnist digits for the first time about 10 years ago - that seemed almost like magic!
ritzaco - 12 hours ago

haha took me a while to notice that one of the buildings is labelled 'Ham radio'
ElFitz - 11 hours ago

Damn. There’s a fun game app to make here ^^
- dymk - 7 hours ago
  
  Is there? The moment you look closely at the puzzle (which is... the whole point of Where's Waldo), you notice all the deformities and errors.
  - ElFitz - 26 minutes ago
    
    Yes, it’s not there yet. But nothing unsolvable. First thing that comes to mind would be generating smaller portion at the same resolution, then expand through tiling (although one might need to use another service & model for this), like we used to do with Stable Diffusion years ago.
    Another option would be generating these large images, splitting them into grids, and using inpainting on each "tile" to improve the details. Basically the reverse of the first one.
    Both significantly increase costs, but for the second one having what Images 2.0 can produce as an input could help significantly improve the overall coherence.
arealaccount - 12 hours ago

I see the raccoon
- 11 hours ago

[deleted]
- 11 hours ago

[deleted]
tptacek - 12 hours ago

5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."
(I don't think it's right).
- ritzaco - 12 hours ago
  
  I tried
  > please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist
  and got this. I'm not sure I know what a ham radio looks like though.
  https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...
  - jackpirate - 12 hours ago
    
    Also, the racoon it circled isn't in the original.
    
    Aurornis - 11 hours ago
    
    I love how perfectly this captures the difficulties of using generative AI for detection tasks.
    
    jetbalsa - 6 hours ago
    
    Oh god yes, I've been trying to make a LLM Assisted Magic the Gathering card scanner... its been a hell of a time trying to get it to just OCR card names well....
    
    what - 4 hours ago
    
    Why would you use an LLM for OCR?
    
    angiolillo - 11 hours ago
    
    Indeed. I suppose one way to ensure you can find Waldo in any image is to add it yourself.
  - simonw - 10 hours ago
    
    That's excellent. I added it to my post: https://simonwillison.net/2026/Apr/21/gpt-image-2/#update-as...
  - davecahill - 5 hours ago
    
    hilarious - i tried and got the same thing.
    there was a very large bear in the first image; when asked to circle the raccoon it just turned the bear into a giant raccoon and circled it.

vunderba - 11 hours ago

OpenAI’s gpt-image-1.5 and Google’s NB2 have been pretty much neck and neck on my comparison site which focuses heavily on prompt adherence, with both hovering around a 70% success rate on the prompts for generative and editing capabilities. With the caveat being that Gemini has always had the edge in terms of visual fidelity.

That being said, gpt-image-1.5 was a big leap in visual quality for OpenAI and eliminated most of the classic issues of its predecessor, including things like the “piss filter.”

I’ll update this comment once I’ve finished running gpt-image-2 through both the generative and editing comparison charts on GenAI Showdown.

Since the advent of NB, I’ve had to ratchet up the difficulty of the prompts especially in the text-to-image section. The best models now score around 70%, successfully completing 11 out of 15 prompts.

For reference, here’s a comparison of ByteDance, Google, and OpenAI on editing performance:

https://genai-showdown.specr.net/image-editing?models=nbp3,s...

And here’s the same comparison for generative performance:

https://genai-showdown.specr.net/?models=s4,nbp3,g15

UPDATES:

gpt-image-2 has already managed to overcome one of the so‑called “model killers” on the test suite: the nine-pointed star.

Results are in for the generative (text to image) capabilities: Gpt-image-2 scored 12 out of 15 on the text-to-image benchmark, edging out the previous best models by a single point. It still fails on the following prompts:

- A photo of a brightly colored coral snake but with the bands of color red, blue, green, purple, and yellow repeated in that exact order.

- A twenty-sided die (D20) with the first twenty prime numbers (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71) on the faces.

- A flat earth-like planet which resembles a flat disc is overpopulated with people. The people are densely packed together such that they are spilling over the edges of the planet. Cheap "coastal" real estate property available.

All Models:

https://genai-showdown.specr.net

Just Gpt-Image-1.5, Gpt-Image-2, Nano-Banana 2, and Seedream 4.0

https://genai-showdown.specr.net?models=s4,nbp3,g15,g2

m_kos - 4 hours ago

Very useful website. Would you have insight into what models are best at editing existing images?
I often have to make very specific edits while keeping the rest of the image intact and haven't yet found a good model. These are typically abstract images for experiments.
I asked gpt-image-2 to recolor specific scales of your Seedream 4 snake and change the shape of others. It did very poorly.
- vunderba - 3 hours ago
  
  OpenAI actually has really good adherence, but occasionally tends to introduce its own almost equivalent of "tone mapping", making hyper-localized edits frustrating.
  I don’t know how much work it is for you, but one thing a lot of people do, myself included, is take the original image, make a change to it using something like NB, then paste that as the topmost layer in something like Krita/Pixelmator. After that, we’ll mask and feather in only the parts we actually want to change. It doesn’t always work if it changes the overall color balance or filters out certain hues, it can be a real pain but it does the job in some cases.
  The Flux models (like Kontext) are actually surprisingly good at making very minimal changes to the rest of the image, but unfortunately their understanding of complex prompts is much weaker than the closed, proprietary models.
  I will say that I’ve found Gemini 3.0 (NB Pro) does a relatively decent job of avoiding unnecessary changes - sometimes exceeding the more recent NB2, and it scored quite well on comparative image-editing benchmarks.
  https://genai-showdown.specr.net/image-editing
  - m_kos - 3 hours ago
    
    Thanks. I will try this! I need to read up on how to work with vision models for both generation and understanding.
VladVladikoff - 4 hours ago

Why does Gemini 3.1 get a pass for the same reasons they got image 2 gets a fail on the flat earth one? Gemini has all sorts of random body parts and limbs etc.
- vunderba - 4 hours ago
  
  That's a mistake~ None of the models successfully passed the Flat Earth composition test. I've updated the passing criteria to be more explicit as well. Thanks for catching that!
CamperBob2 - 5 hours ago

It'd be interesting if you could add HunyuanImage-3 to the competition. It's better than Z-Image at almost everything I've thrown at it.
It can be (slowly) run at home, but needs 96GB RTX 6000-level hardware so it is not very popular.
- vunderba - 4 hours ago
  
  I’ll have to give it another try. Its predecessor, Hunyuan Image 2.0, scored pretty poorly when I tested it last year: 2 out of 15, so it'll be interesting to see how much it has improved.
  Here's ZiT, Gpt-Image-2, and Hunyuan Image 2 for reference:
  https://genai-showdown.specr.net/?models=hy2,g2,zt
  Note: It won't show up in some of the newer image comparisons (Angelic Forge, Flat Earth, etc) because it's been deprecated for a while but in the tests where it was used (Yarrctic Circle, Not the Bees, etc.) it's pretty rough.
  - CamperBob2 - 4 hours ago
    
    It does quite a bit better than 2.0, I think. Or at least it may be stylistically different enough to justify a rematch against the others.
    Ring toss: https://i.imgur.com/Zs6UNKj.png (arguably a pass)
    9-pointed star: https://i.imgur.com/SpcSsSv.png (star is well-formed but only has 6 points)
    Mermaid: https://i.imgur.com/R6MbMPX.png (fail, and I can't get Imgur to host it for some reason even though it's SFW)
    Octopus: https://i.imgur.com/JTVH7xy.png (good try, almost a pass, but socks don't cover the ends of all the tentacles)
    Above are one-shot attempts with seed 42.
    
    vunderba - 3 hours ago
    
    > https://i.imgur.com/6NXpI2q.png
    You're killing me Smalls. This one is a 404. I'm really curious what it actually showed.
    That ring toss is definitely leagues better than its predecessor. I’m not going to fault it too much for the star though, that one is an absolute slate wiper. The only locally hostable model that ever managed it for me was the original Flux, and I’m still not entirely convinced it wasn’t a fluke. Despite getting twice as many attempts, Flux 2, a much larger model, couldn’t even pull it off.
    
    CamperBob2 - 3 hours ago
    
    Yeah, I suspect you'd see some solid passing scores if you ran it as many times as some of the others.
    For the mermaid, https://i.imgur.com/R6MbMPX.png sometimes seems to work but not consistently. It is probably triggering a porn filter of some kind. I need to find another free image host, as imgur has definitely jumped the shark.
    The image shows a mermaid of evident Asian extraction lying on a beach, face down. There is a dolphin lying on top of her, positioned at a 90-degree angle. It doesn't show any interaction at all, so a definite fail.
    
    vunderba - 2 hours ago
    
    I still use Imgur from time to time just because it’s convenient, but I’ve been meaning to build an Imgur-style extension for my site for a while, something that would let me drag and drop media for quick sharing but it being Astro-based (static site generation) makes it tricky.
what - 4 hours ago

Where can I see the actual prompts and follow ups you fed each model?
- vunderba - 4 hours ago
  
  So the prompts are tuned and adjusted on a per-model basis. If you look at the number of attempts, each receives a specific prompt variation depending on the model. This honestly isn't as much of an issue these days because SOTA models natural language parsing (particularly the multimodal ones) has eliminated a lot of the byzantine syntax requirements of the SD/SDXL days.
  The template prompt seen in each comparison gets adjusted through a guided LLM which has fine-tuned system prompts to rewrite prompts. The goal is to foster greater diversity while preserving intent, so the image model has a better chance of getting the image right.
  Getting to your suggestion for posting all the raw prompts, that's actually a great idea. Too bad I didn't think about it until you suggested it. And if you multiply it out - there's 15 distinct test cases against 22 models at this point, each with an average of about 8 attempts so we’re talking about thousands of prompts many of which are scattered across my hard drive. I might try to do this as a future follow-up.
  - what - 3 hours ago
    
    Shouldn’t every model get the same prompt? Seems a bit weird, especially when you can’t see the prompts that were used.
    
    vunderba - 2 hours ago
    
    The goal isn’t the prompt itself. The test is whether a prompt can be expressed in such a way that we still arrive at the author's intent, and of course to do so in a way that isn't unnatural.
    The prompts despite their variation are still expressed in natural language.
    The idea is that if you can rephrase the prompt and still get the desired outcome, then the model demonstrates a kind of understanding; however more variation attempts also get correspondingly penalized: this is treated more as a failure of steering, not of raw capability.
    An example might help - take the Alexander the Great on a Hippity-Hop test case.
    The starter prompt is this: "A historical oil painting of Alexander the Great riding a hippity-hop toy into battle."
    If a model fails this a couple of times (multiple seeds), we might use a synonym for a hippity-hop, it was also known as a space hopper.
    Still failing? We might try to describe the basic physical appearance of a hippity-hop.
    Thus, something like GPT-Image-2 scored much higher on the compliance component of the test, requiring only a single attempt, compared with Z-Image Turbo, which required 14 attempts.

ea016 - 12 hours ago

Price comparison:

GPT Image 2

  Low     : 1024×1024 $0.006 | 1024×1536 $0.005 | 1536×1024 $0.005

  Medium  : 1024×1024 $0.053 | 1024×1536 $0.041 | 1536×1024 $0.041

  High    : 1024×1024 $0.211 | 1024×1536 $0.165 | 1536×1024 $0.165

GPT Image 1

  Low     : 1024×1024 $0.011 | 1024×1536 $0.016 | 1536×1024 $0.016

  Medium  : 1024×1024 $0.042 | 1024×1536 $0.063 | 1536×1024 $0.063

  High    : 1024×1024 $0.167 | 1024×1536 $0.25  | 1536×1024 $0.25

Melatonic - 11 hours ago

Weird that they restrict the resolution so much. Does it fall apart with more detail (when zoomed in) or does the cost just skyrocket?
- dsrtslnd23 - 5 minutes ago
  
  actually gpt-image-2 is VERY flexible with the resolution. You can use arbitrary resolution within the max pixel budget.
- vunderba - 11 hours ago
  
  It's usually based on what they've been trained on. There aren't very many models that'll do higher resolutions outside of Seedream but adherency is worse.
  - _the_inflator - 9 hours ago
    
    Processing power, not training. The larger the scene in 2ď the more you need to compute. The resolution itself is not flexible. Imagine painting a white canvas. It is still a pixel per pixel algo which costs LLM GPU power while being the easiest thing to do without it.
    You can create larger images by creating separate parts you recombine. But they may not perfectly match their borders.
    It is a Landau thing not a trading thing. The idea of LLM is to work on the unknown.
    
    vunderba - 8 hours ago
    
    It depends on the model. Diffusion models, which are among the more popular approaches, are typically trained at a specific image resolution.
    For example, SDXL was trained on 1MP images, which is why if you try to generate images much larger than 1024×1024 without using techniques like high-res fixes or image-to-image on specific regions, you quickly end up with Cthulhu nightmare fuel.
  - nomel - 9 hours ago
    
    Need a model trained on closeup/macro shots of everything, to use for upscaling, then run that, as a kernel, over the whole image.
- ModernMech - 3 hours ago
  
  Generate a lower resolution image and upscale to the resolution you need.
- - 10 hours ago
  
  [deleted]
- al_borland - 9 hours ago
  
  [dead]
ComputerGuru - 4 hours ago

It can generate 3840x2160
lxgr - 9 hours ago

Interesting, I wonder why larger outputs are more expensive than smaller square ones on v2, while it’s the other way around in v1.

neom - 8 hours ago

Here is my regular "hard prompt" I use for testing image gen models:

"A macro close-up photograph of an old watchmaker's hands carefully replacing a tiny gear inside a vintage pocket watch. The watch mechanism is partially submerged in a shallow dish of clear water, causing visible refraction and light caustics across the brass gears. A single drop of water is falling from a pair of steel tweezers, captured mid-splash on the water's surface. Reflect the watchmaker's face, slightly distorted, in the curved glass of the watch face. Sharp focus throughout, natural window lighting from the left, shot on 100mm macro lens."

google drive with the 2 images: https://drive.google.com/drive/folders/1-QAftXiGMnnkLJ2Je-ZH...

Ran a bunch both on the .com and via the api, none of them are nearly as good as Nano Banana.

(My file share host used to be so good and now it's SO BAD, I've re-hosted with them for now I'll update to google drive link shortly)

jcattle - 21 minutes ago

I mean, your prompt is basically this skit: https://www.youtube.com/watch?v=BKorP55Aqvg ("The Expert" 7 red lines: all strictly perpendicular, some with green ink some with transparent ink)
I couldn't imagine the image you were describing. I've listed some of the red lines with green ink I've noticed in your prompt:
Macro Close Up - Sharp throughout
Focus on tiny gear - But also on tweezers, old watchmakers hand, water drop?
Work on the mechanism of the watch (on the back of the watch) - but show the curved glass of the watch face which is on the front
This is the biggest. Even if the mechanism is accessible from the front, you'd have to remove the glass to get to it. It just doesn't make sense and that reflects in the images you get generated. There's all the elements, but they will never make sense because the prompt doesn't make sense.
rrr_oh_man - 7 hours ago

Why would you consider this a good prompt?
- brynnbee - 5 hours ago
  
  My observations have been that image generation is especially challenged when asked to do things that are unusual. The fewer instances of something happening it has to train on, the worse it tends to be. Watch repair done in water fits that well - is there a single image on the internet of someone repairing a watch that is partially submerged in water? It also tends to be bad at reflections and consistency of two objects that should be the same.
  - - 22 minutes ago
    
    [deleted]
- - 6 hours ago
  
  [deleted]
the_lucifer - 7 hours ago

Looks like your image host has rate limited viewing the shared images, wanted to give you a heads up
- neom - 7 hours ago
  
  Thanks, I need to get off Zight, they used to be such an nice option for fast file share but they've really suffered some of the worst enshittification I've seen yet.
pb7 - 7 hours ago

Links are broken.
- waynesonfire - 7 hours ago
  
  So.. sign up. "Get Sight for free". Ads everywhere bro.

schneehertz - 6 hours ago

Generating a 4096x4096 image with gemini-3.1-flash-image-preview consumes 2,520 tokens, which is equivalent to $0.151 per image.

Generating a 3840x2160 image with gpt-image-2 consumes 13,342 tokens, which is equivalent to $0.4 per image.

This model is more than twice as expensive as Gemini.

strangescript - 6 hours ago

this is apples to oranges, the flash version version a full version
this thing is like 5x better than flash at fine grain detail
- ac29 - 6 hours ago
  
  Google's naming might be misleading, currently 3.1 flash image outperforms the available pro version (3.0 pro) on most benchmarks: https://deepmind.google/models/model-cards/gemini-3-1-flash-...
- - 6 hours ago
  
  [deleted]
- altcognito - 6 hours ago
  
  .40 cents for high quality output is insanely cheap
  it is only going to get cheaper
  - eclipticplane - 6 hours ago
    
    > .40 cents
    Warning: Verizon math ahead.
    
    tfehring - 5 hours ago
    
    In case anyone is unfamiliar with one of the most infuriating phone calls of all time: https://www.youtube.com/watch?v=MShv_74FNWU

madrox - 8 hours ago

This seems like a great time to mention C2PA, a specification for positively affirming image sources. OpenAI participates in this, and if I load an image I had AI generate in a C2PA Viewer it shows ChatGPT as the source.

Bad actors can strip sources out so it's a normal image (that's why it's positive affirmation), but eventually we should start flagging images with no source attribution as dangerous the way we flag non-https.

Learn more at https://c2pa.org

debazel - 5 hours ago

> but eventually we should start flagging images with no source attribution as dangerous the way we flag non-https.
Yes, lets make all images proprietary and locked behind big tech signatures. No more open source image editors or open hardware.
- henry-j - 4 hours ago
  
  C2PA is actually an open protocol, à la SMTP. the whole spec is at https://spec.c2pa.org/, available for anyone to implement.
woadwarrior01 - 8 hours ago

Yeah, OpenAI has been attaching C2PA manifests to all their generated images from the very beginning. Also, based on a small evaluation that I ran, modern ML based AI generated image detectors like OmniAID[1] seem to do quite well at detecting GPT-Image-2 generated images. I use both in an on-device AI generated image detector that I built.
[1]: https://arxiv.org/abs/2511.08423
mdasen - 7 hours ago

> Bad actors can strip sources out
I think the issue is that it's not just bad actors. It's every social platform that strips out metadata. If I post an image on Instagram, Facebook, or anywhere else, they're going to strip the metadata for my privacy. Sometimes the exif data has geo coordinates. Other times it's less private data like the file name, file create/access/modification times, and the kind of device it was taken on (like iPhone 16 Pro Max).
Usually, they strip out everything and that's likely to include C2PA unless they start whitelisting that to be kept or even using it to flag images on their site as AI.
But for now, it's not just bad actors stripping out metadata. It's most sites that images are posted on.
- henry-j - 4 hours ago
  
  There’s actually a part of the NY state budget right now (TEDE part X, for my law nerds) that’d require social media companies to preserve non-PII provenance metadata and surface it to the user, if the uploaded image has it.
  linkedin already does this--- see https://www.linkedin.com/help/linkedin/answer/a6282984, and X’s “made with ai” feature preserves the metadata but doesn’t fully surface it (https://www.theverge.com/ai-artificial-intelligence/882974/x...)
- madrox - 7 hours ago
  
  You're implying social platforms aren't bad actors ;)
  In seriousness, social platforms attributing images properly is a whole frontier we haven't even begun to explore, but we need to get there.
paradoxyl - 42 minutes ago

What a dystopian, pro-tyranny ask. Horrifying.

skybrian - 8 hours ago

This time it passed the piano keyboard test:

https://chatgpt.com/s/m_69e7ffafbb048191b96f2c93758e3e40

But it screwed up when attempting to label middle C:

https://chatgpt.com/s/m_69e8008ef62c8191993932efc8979e1e

Edit: it did fix it when asked.

vunderba - 8 hours ago

When NB 2 came out I actually had to increase the difficulty of the piano test - reversing the colors of all the accidentals and the naturals, and it still managed it perfectly.
https://mordenstar.com/other/nb-pro-2-tests

swalsh - 9 hours ago

Been using the model for a few hours now. I'm actually reall impressed with it. This is the first time i've found value in an image model for stuff I actually do. I've been using it to build powerpoint slides, and mockups. It's CRAZY good at that.

johnwheeler - 6 hours ago

Yeah, it's funny. I would expect to see more enthusiasm versus just basic run-of-the-mill, "oh, there it is". Leave it to the HN crowd. This is incredible. I don't even like OpenAI.

justani - 2 hours ago

I have a few cases where nano banana fails all the time, even gpt image 2 is failing.

A 3 * 3 cube made out of small cubes, with a small 2 * 2 cube removed from it - https://chatgpt.com/share/69e85df6-5840-83e8-b0e9-3701e92332...

Create a dot grid containing a rectangle covering 4 dots horizontally and 3 dots vertically - https://chatgpt.com/share/69e85e4b-252c-83e8-b25f-416984cf30...

One where Nano banana fails but gpt image 2 worked: create a grid from 1 to 100 and in that grid put a snake, with it's head at 75 and tail at 31 - https://chatgpt.com/share/69e85e8b-2a1c-83e8-a857-d4226ba976...

teruakohatu - an hour ago

> A 3 * 3 cube made out of small cubes, with a small 2 * 2 cube removed from it - https://chatgpt.com/share/69e85df6-5840-83e8-b0e9-3701e92332...
It is a little ambiguous (what exactly is a "3x3 cube") but I tried a bunch of variations and I simply could not get any Gemini models to produce the right output.

porphyra - 9 hours ago

The improvement in Chinese text rendering is remarkable and impressive! I still found some typos in the Chinese sample pic about Wuxi though. For example the 笼 in 小笼包 was written incorrectly. And the "极小中文也清晰可读" section contains even more typos although it's still legible. Still, truly amazing progress. Vastly better than any previous image generation model by a large margin.

Lucasoato - 7 hours ago

Is this even better than Chinese models? I suppose they focus much more on that aspect, simply because their training data might include many more examples of Chinese text.

amunozo - 11 hours ago

This is not as exciting as previous models were, but it is incredibly good. I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.

aulin - 2 hours ago

Well that was probably the most important general skill even before this.
- 9 hours ago

[deleted]
echelon - 9 hours ago

> I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.
Without question.
AI will be indistinguishable from having a team. Communicating clearly has always and will always mattered.
This, however, is even stronger. Because you can program and use logic in your communications.
We're going to collectively develop absolutely wild command over instruction as a society. That's the skill to have.
- adamhartenz - 6 hours ago
  
  How can AI be the amazing thing you say it is, but also too stupid to understand unless you get really good at communicating. Wouldn't better AI just mean it understands your ramblings better?
  - jstanley - 17 minutes ago
    
    It can't grab out information that isn't there. If your ramblings are ambiguous then it has to make a guess.
  - pickleRick243 - 5 hours ago
    
    It's fine if the "rambling" is logically coherent. So the communication ability isn't really about expressing your thoughts eloquently, but just effectively and clearly. Run on sentences and train of thought is fine as long as you are saying something meaningful. But no AI will be able to read your mind and know exactly what you mean by "make really cool looking website, not lame please, also nice colors, not boring". Declarative programming through natural language will become incredibly powerful.
  - raincole - 5 hours ago
    
    Many humans are great at their expertise but bad at communicating. How?
- yreg - 8 hours ago
  
  On the other hand LLMs are getting very good at understanding poorly constructed instructions as well.
  So being able to express oneself clearly in a structured way may not be such an edge.
  - amunozo - 2 hours ago
    
    Yes, I agree, but as one of the other comments say, they are not able to read your mind. So even if the structure and style is not clear, you must be able to express what you want.

6thbit - 12 hours ago

System card link with safety details https://deploymentsafety.openai.com/chatgpt-images-2-0

direct pdf https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...

dang - 7 hours ago

Link added to toptext. Thanks!

louiereederson - 12 hours ago

The image of the messy desktop with the ASCII art is so impressive - the text renders, the date is consistent, it actually generated ASCII art in "ChatGPT", etc. I was skeptical that it was cherry-picked but was able to generate something very similar and then edit particular parts on the desktop (i.e. fixing content in the browser window and making the ASCII dog "more dog like"). It's honestly astounding, to me at least.

baalimago - 39 minutes ago

"Benchmarks" aside, do anyone actually use these image models for anything?

medlazik - 16 minutes ago

Look around? It's everywhere. Try talking to a graphic designer looking for a job theses days. Companies didn't wait for these tools to be good to start using them.
croisillon - 30 minutes ago

MAGA to show how terrible Europe is ;)

dktp - 11 hours ago

One interesting thing I found comparing OpenAI and Gemini image editing is - Gemini rejects anything involving a well known person. Anything. OpenAI is happy to edit and change every time I tried

I have a sideproject where I want to display standup comedies. I thought I could edit standup comedy posters with some AI to fit my design. Gemini straight up refuses to change any image of any standup comedy poster involving a well know human. OpenAI does not care and is happy to edit away

Melatonic - 11 hours ago

How does it determine they are well known and not just similar looking?
- yreg - 8 hours ago
  
  Gemini often rejects photos of random people (even ones it generated itself) because it thinks they look too similar to some well known person.
- dktp - 11 hours ago
  
  I don't know tbh. I've tried it on 10-20 various level of famous standups and Gemini refuses every time
  Just for testing, I just tried this https://i.ytimg.com/vi/_KJdP4FLGTo/sddefault.jpg ("Redesign this image in a brutalist graphic design style"). Gemini refuses (api as well as UI), OpenAI does it
  - arjie - 10 hours ago
    
    It's not super deterministic but it didn't fail once on my attempts. See: https://imgur.com/a/james-acaster-cold-lasagne-1R7fpzQ
    
    dktp - 10 hours ago
    
    Very interesting. It fails every single time for me. I'm in Germany, maybe Google is stricter here?
    See https://imgur.com/a/77BRDQv
    
    arjie - 10 hours ago
    
    That makes sense to me. I just Googled around like a fool and got here https://en.wikipedia.org/wiki/Personality_rights#Germany
    It seems like they're trying to follow local law. What a nightmare to have to manage all jurisdictions around such a product. Surprised it didn't kill image generation entirely.
    
    jliptzin - 8 hours ago
    
    Yea, especially when they know all that work will be completely pointless in a few years when open source / local models will be just as good and won't have any legal limitations, so people will be generating fake images of famous people like crazy with nothing stopping them
  - Melatonic - 10 hours ago
    
    What if you change the prompt to tell it specifically its not a famous person? Or try it without text?
- BoorishBears - 5 hours ago
  
  There are models specifically for detecting well known people https://docs.aws.amazon.com/rekognition/latest/dg/celebritie...
vunderba - 8 hours ago

Are you using Google Gemini directly? I've found the Vertex API seems to be significantly less strict.

____tom____ - 11 hours ago

No mention of modifying existing images, which is more important than anything they mentioned.

I think we all know the feeling of getting an image that is ok, but needs a few modifications, and being absolutely unable to get the changes made.

It either keeps coming up with the same image, or gives you a completely new take on the image with fresh problems.

Anyone know if modification of existing images is any better?

Anything better that OpenAI?

frmersdog - 7 hours ago

Image editing program -> different versions of the image, each with some but not all of the elements you want, on each layer -> mask out the parts you don't need/apply mask, fill with black, soft brush with white the parts you want back in. Copy flattened/merged, drop it back into the image model, keep asking for the changes. As long as each generation adds in an element you want, you can build a collage of your final image.
user34283 - 20 minutes ago

It's the first thing I tried, because Nano Banana 2 deteriorates the output with each turn, becoming unusable with just a few edits.
ChatGPT Images 2.0 made it unusable at the first turn. At least in the ChatGPT app editing a reference image absolutely destroyed the image quality. It perfectly extracted an illustration from the background, but in the process basically turned it from a crisp digital illustration into a blurry, low quality mess.
tomjen3 - 11 hours ago

There was an Edit button in one of the images in the livestream

jcattle - 11 minutes ago

Can we talk about how jarring the announcement video is?

AI generated voice over, likely AI generated script (You see, this model isn't just generating images, it's thinking!). From what it looks like only the editing has some human touch to it?

It does this Apple style announcement which everyone is doing, but through the use of AI, at least for me, it falls right into the uncanny valley.

sanex - 5 hours ago

Having the launch website just scrollable generated images is so slick. I love this.

gverrilla - 2 hours ago

You can click the images too, to see the prompt that got them gen'ed.

throwaway2027 - 12 hours ago

I know people like to dunk on ChatGPT and Gemini and say Claude is or used to be better, but you can still use worse models when you're out of usage AND make use of Nano Banana and and ChatGPT Image generation with separate limits for your subscription. I think it could make it a more package as a whole for some people (non-programmers). I do like having the option and am excited for which improvements they've done to ChatGPT Image generation because in the past it had this yellow piss filter and 1.5 it sort of fixed it but made things really generic with Nano Banana beating it (altough Gemini also had a too aggressively tuned racial bias which they fixed), it seems the images ChatGPT generates have gotten better.

SV_BubbleTime - 8 hours ago

I still see that piss filter on their samples. It isn’t as bad, but someone there really loves it.
- - 8 hours ago
  
  [deleted]

overgard - 8 hours ago

Pretty mixed feelings on this. From the page at least, the images are very good. I'd find it hard to know that they're AI. Which I think is a problem. If we had a functioning congress, I wonder if we might end up with legislation that these things need to be watermarked or otherwise made identifiable as AI generated..

I also don't like that these things are trained on specific artist's styles without really crediting those artists (or even getting their consent). I think there's a big difference between an individual artist learning from a style or paying it homage, vs a machine just consuming it so it can create endless art in that style.

kansface - 7 hours ago

> If we had a functioning congress, I wonder if we might end up with legislation that these things need to be watermarked or otherwise made identifiable as AI generated..
Not a lawyer, but that reads as compelled speech to me. Materially misrepresenting an image would be libel, today, right?
- overgard - 5 hours ago
  
  Well, considering that AI generated content can't be copyrighted (afaik at least), I think we're in very different legal territory when it comes to AI creating things. While it's true that deepfakes could be considered libel.. good luck prosecuting that if you can't even figure out where the image came from.
  The problem is it's all too easy to generate - you can't really do much about an individual piece of slop because there's so much of it. I think we need a way to filter this stuff, societally.
bryanhogan - 7 hours ago

Trying to watermark or otherwise label them as AI generated is a lost fight, we should assume every image and video we see online may be AI generated.
- rootusrootus - 6 hours ago
  
  This helps the segment of society that is interested in applying critical thinking to what they see. I am not sure that is anything like a majority or even a significant plurality. It seems like just about every image or video gets accused of being AI these days, but predictably the accusations depend on the ideology of the accuser.
apsurd - 7 hours ago

You might be onto something. I find every image unsettling. they're very good no doubt, but maybe it disturbs me because all of it is a complete copy of what someone else created. I know, I know, there is no pure invention. That's not what i mean. Humans borrow from other humans all the time. There's a humanity in that! A machine fully repurposing a human contribution as some kind of new creation, iono i'm old, it's weird and i don't like it.
Maybe i'm just bloviating also.

joegibbs - 10 hours ago

The quality of the text is really impressive and I can’t seem to see any artefacts at all. The fake desktop is particularly good: Nano Banana would definitely slip up with at least a few bits of the background.

daemonologist - 4 hours ago

There are a couple of AI-esque misspellings - in the More Myth than Menace wolves image, on the right in the "at a glance" section, it reads "wolves aarely approach people," and in the Typography image the text in the top right is "Type connncts us all."
But yeah the quality is remarkable, and rather scary.
wek - 5 hours ago

I use Nano Banana all the time and this seems like a step up

bensyverson - 12 hours ago

I caught the last minute of this—was it just ChatGPT Images 2.0?

punty - 12 hours ago

It appears so!
minimaxir - 12 hours ago

yes

nickandbro - 8 hours ago

200+ points in Arena.ai , that's incredible. They are cleaning house with this model

moralestapia - 8 hours ago

point delta (from 2nd) not total
- nickandbro - 8 hours ago
  
  https://www.youtube.com/watch?v=Adsaiyr7Nv8

squidsoup - 8 hours ago

Are camera manufacturers working on signed images? That seems like the only way our trust in any digital media doesn't collapse entirely.

- 5 hours ago

[deleted]
randyrand - 5 hours ago

Signed images don’t get you much. You can just hardwire the image sensor to a computer and sign raw pixels.
- Barbing - an hour ago
  
  Is the situation brighter for a company who owns the hardware and the software, for Apple?
  Taking a picture of an AI generated image aside, theoretically could Apple attest to origin of photos taken in the native camera app and uploaded to iCloud?
  Fascinating, by the way, thank you!
Nition - 5 hours ago

Ultimately even with that tech, you can still take a photo of an AI generated scene. Maybe coupled with geolocation data in the signature or something it might work.
- Barbing - an hour ago
  
  Any thoughts on attempted multiple camera/360 camera solutions? Can make it cost prohibitive to generate exceptional fakes… for a little while
  Kind of like showing the proctor around your room with your webcam before starting the exam.
  —
  I think legacy media stands a chance at coming back as long as they maintain a reputation of deeply verifying images, not being fooled.
- petesergeant - 3 hours ago
  
  I see signing chains as the way to go here. Your camera signs an image, you sign the signed image, your client or editor signs the image you signed etc etc. Might finally have a use for blockchain.

naseemali925 - 2 hours ago

Its amazingly good at creating UI mockups. Been trying this to create UI mockups for ideas.

hahahacorn - 11 hours ago

One of the images in the blog (https://images.ctfassets.net/kftzwdyauwt9/4d5dizAOajLfAXkGZ7...) is a carbon copy of an image from an article posted Mar 27, 2026 with credits given to an individual: https://www.cornellsun.com/article/2026/03/cornell-accepts-5...

Was this an oversight? Or did their new image generation model generate an image that was essentially a copy of an existing image?

arjie - 11 hours ago
That has to be the wrong stock image included or something, bloody hell.
```
     magick image-l.webp image-r.jpg -compose difference -composite -auto-level -threshold 30% diff.png
```
It's practically all dark except for a few spots. It's the same image just different size compression whatever. I can't find it in any stock image search, though. Surely it could not have memorized the whole image at that fidelity. Maybe I just didn't search well enough.
- Melatonic - 10 hours ago
  
  Or the image was generated with AI in the first place and a test for Images 2.0
  - IsTom - 9 hours ago
    
    Well, it's on web archive. So unless they got their hands on it almost a month early or escaped their light cone it wasn't.
  - arjie - 10 hours ago
    
    Haha! That would really take the cake. If it is, congratulations to them! I could never have known.
recitedropper - 11 hours ago

This is hilarious. Seems like kind of a random image for a model to memorize, but it could be.
There is definitely enough empirical validation that shows image models retain lots of original copies in their weights, despite how much AI boosters think otherwise. That said, it is often images that end up in the training set many times, and I would think it strange for this image to do that.
Regardless, great find.
- Nition - 5 hours ago
  
  I feel it's too much of a perfect match to be generated from the model's memory. It's pixel perfect. Gotta be a mistake.
minimaxir - 11 hours ago

Given the recency of that image, it is unlikely it is in the training data and therefore I would go with oversight.
- ajam1507 - 5 hours ago
  
  The image is likely older than the article given this picture from over a year ago.
  https://www.instagram.com/p/DGQ01bzTwyo/
  - afro88 - 8 minutes ago
    
    That's not the same picture

thelucent - 11 hours ago

It seems to still have this gpt image color that you can just feel. The slight sepia and softness.

honzaik - 10 hours ago

I was just wondering about that. Did they embrace it as a “signature look”? it cant be accidental, right?
- GaryBluto - 8 hours ago
  
  It's definitely not accidental but I'm not completely sure whether or not it is simply a "tell" or watermark or an attempt to foster brand association.
- dymk - 7 hours ago
  
  It's the Stranger Things nostalgia filter. Almost all the sample pictures they had looked like they were vaguely from the 90s-00s era.

Oras - 9 hours ago

My test for image models is asking it to create an image showing chess openings. Both this model and Banana pro are so bad at it.

While the image looks nice, the actual details are always wrong, such as showing pawns in wrong locations, missing pawns, .. etc.

Try it yourself with this prompt: Create a poster to show opening game for Queen's Gambit to teach kids to play chess.

lxgr - 8 hours ago

It almost nailed it for me (two squares have both white and black color). All pieces and the position look correct.
tempaccount5050 - 8 hours ago

What move? Who's turn is it? Declined or accepted? Garbage in, garbage out.
- bogtap82 - 8 hours ago
  
  In some cases I would agree with this, but image model releases including this one are beginning to incorporate and market the thinking step. It is not a reach at this point to expect the model to take liberties in order to deliver a faithful and accurate representation of your request. A model could still be accurate while navigating your lack of specificity.
- timacles - 7 hours ago
  
  Kasparov vs Karpov ‘87 Olympiad. Move 6
- dudul - 8 hours ago
  
  What do you mean? Parent clearly describes the Queen's Gambit. 1.d4 d5 2.c4 There is no room for ambiguity here.
  - kuboble - 5 hours ago
    
    King Indian Defense would be a better prompt as Queen's Gambit can now refer to e.g. some scene from Netflix series.

kibibu - 10 hours ago

Genuine question: what positive use cases are sufficient to accept the harm from image generators?

One that i can think of:

- replacing photography of people who may be unable to consent or for whom it may be traumatic to revisit photographs and suitable models may not be available, e.g. dementia patients, babies, examples of medical conditions.

Most other vaguely positive use cases boil down to "look what image generators can do", with very little "here's how image generators are necessary for society.

On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.

bulletsvshumans - 8 hours ago

Democratizing visual communication is arguably useful, for instance helping people to create diagrams that illustrate a concept they wish to convey. This is contingent on the tech working sufficiently well that the visuals are more effective at communication than the text that went into producing them though.
- tdb7893 - 7 hours ago
  
  It's always felt like way overhyping to call something "democratization" when it's something I could do as a middle schooler in 2005. It takes some skill to do very well but it's not like basic diagram creation isn't something people already could do for basically free (I create figures for my job all the time now and chatGPT is more expensive than tools I use for design).
  Commissioning high quality diagrams from a designer is expensive and I guess it's much cheaper now to essentially commission something but idk, "democratization" still feels weird for just undercutting humans on price.
  - NewsaHackO - 6 hours ago
    
    You are making a mistake a lot of people make when talking about genAI helping others do work. I get that to you it is very easy to do, but there are other groups of people that are not able to do it. What you are saying is like a hobbyist carpenter saying that making a bedside table would take him one weekend to do, so he doesn't think it is okay for tables to be made via assembly line instead of hiring a carpenter to do it.
    
    tdb7893 - 4 hours ago
    
    I think you're missing my point, which is pretty narrow here. "Democratization" is fairly grand term implying that the general public now have access to something freeing they didn't before (it generally invokes some idea of liberation, as the term often is used to note a transition from an authoritarian to a democratic government). I don't think there has ever been a particularly high barrier to making good diagrams, in my experience it's an easy to learn skill both in time and money, so it feels like it's cheapening the term "democratization". Maybe I'm being a bit sensitive though because of how the world is right now with people sometimes literally fighting for democracy. Normally I am pretty lax with semantics but I've had some people really rub me the wrong way when overhyping AI.
  - pesus - 7 hours ago
    
    Yeah, it's not "democratization", people were just too lazy to do it before. It only takes some basic effort and a little bit of time to be able to create decent versions of those things.
- lossyalgo - 7 hours ago
  
  My workplace does this for EVERYTHING. And they are always immediately obviously AI slop, both because we all know they wouldn't ever pay an actual artist to create graphics, but also because the people creating the graphics have no sense of style and let it generate the most generic shit possible with zero creativity.
  It's definitely not helpful. It's just annoying and disgusting and a waste of resources IMO. But hey at least Powerpoint presentations have AI slop instead of stuff taken from Google Images!?
- galleywest200 - 7 hours ago
  
  Can these people not just create a diagram with their own hands? Literally a pencil and paper.
  I am at the point where I would prefer a poorly human drawn diagram with terrible handwriting over AI slop.
  - twobitshifter - 6 hours ago
    
    If you scroll far enough down the linked page, you’ll see they’re knocking off poor handwriting too!
  - zbrozek - 7 hours ago
    
    I do that. My slide decks these days are hand scribbled.
  - rafael-lua - 7 hours ago
    
    It is not the making of the diagram that is the problem, but often the fact I have no idea how to put it visually. AI is awesome at this.
    Now, does that justify the harm? Not for me, but this issue is way out of my league.
    
    davebren - 7 hours ago
    
    The point of a diagram is that you have something in your head to turn into the diagram. There's no point if you can't do it yourself and the image generator is coming up with it for you.
    
    rafael-lua - 6 hours ago
    
    I disagree. Diagrams are a type of visual communication, and not everyone is good at translating things to visual. I open an excalidraw with clear concepts in my head, but nothing comes out of it. I try C4 or flow diagrams, and I spend an excessive amount of time refactoring them to end up mediocre anyway. Not just me, I know MANY developers that are amazing at explaining things but are mind-blocked when drawing simple circles and arrows.
    Helping us navigate things we aren't good at has been one of the main selling points of AI.
    
    breezybottom - 3 hours ago
    
    It's not translation if it's completely AI generated to begin with. Instead of addressing your mental deficits (which sound severe), you're offloading it and making the problem worse.
    
    davebren - 6 hours ago
    
    Learn how to draw simple circles and arrows, this is the epitome of learned helplessness.
- kibibu - 4 hours ago
  
  I'm not convinced that "arguably useful" is sufficient to offset its much more heavily-used application as a casually-available disinformation engine.
  I mean, the cat's out of the bag; but the cat stinks.
chromacity - 9 hours ago

How else do you expect me to illustrate my LLM-generated blog posts about AI?
- 2ndorderthought - 8 hours ago
  
  Oh my. You still make those? Ever since model chupacobra 2.46 we have AI agents making those for us. At one point I was on the fence about totally outsourcing it to agents but it's way more efficient. Now I have 50 posts a day under different names.
spijdar - 9 hours ago

The same question could be poised of art in general. I know that response would (and probably should) ruffle peoples' figurative feathers, but I think it's worth considering. A lot of art isn't "necessary for society".
The question still stands, "are the benefits worth the cost to society", but it bears remembering we do a lot of things for fun which aren't "necessary for society".
- TomGarden - 9 hours ago
  
  I used to think like what you describe, but I've fallen on the side of "art is just more emotionally resonant human communication". And most of the time human communication with more effort and thought behind it. AI art falls short on both being human and, on average, having more effort or thought behind it than your general interaction at the supermarket.
  I will say, it can be emotionally resonant though - but it's a borrowed property from the perception of human communication and effort that made the art the models were trained on.
- Jtarii - 8 hours ago
  
  If you want to say the complete destruction of truth is worth it because some people are having "fun" then idk.
  - joegibbs - 8 hours ago
    
    You shouldn't have believed photos since Stalin had Yezhov airbrushed out of them. The only thing that makes a photo more trustworthy than a painting is that it "looks" more real, and passes itself off as true. But there have always been photographic fakes, manipulation and curation of the photos to push a message. AI will finally end this and people will realise that the image of the thing is not the thing itself.
    
    Jtarii - 7 hours ago
    
    You are vastly, vastly underselling what is being lost. You can no longer look at a piece of art without first asking "is this even real", that is a collosal loss to the experience of being human. You can't just appreciate anything anymore without questioning it.
    >You shouldn't have believed photos since Stalin had Yezhov airbrushed out of them.
    It isn't just about propaganda photos, it is about -litearlly everything-, even things people have no incentive to fake, like cat videos, or someone doing a backflip or a video of a sunset.
    
    - 4 hours ago
    
    [deleted]
  - SpicyLemonZest - 8 hours ago
    
    I was worried about the complete destruction of truth, but it seems that's not the result of commoditized image generation. False AI-generated images have been widespread for years, and as far as I've seen, society has adapted very well to the understanding that images can't prove anything without detailed provenance. I'd argue that this has been helped, actually, by random people on the Internet routinely generating plausible images of events that obviously didn't happen.
    
    Jtarii - 7 hours ago
    
    >society has adapted very well to the understanding that images can't prove anything without detailed provenance
    Donald Trump is the president of the United States.
    
    SpicyLemonZest - 7 hours ago
    
    I don't understand the response. Do you think that Donald Trump would not be president of the United States if powerful image models hadn't been invented? Or perhaps you're referring to the AI-generated media he's often posted since being elected; when he showed a video of getting in a fighter jet to dump poo on protesters, do you think many people believed that was a real thing he actually did?
    
    Jtarii - 7 hours ago
    
    I'm more reacting to the premise that society is positively adapting to the post truth world. Which it clearly is not. Half the population of the US is already living in a fake news mirror universe where everything is inverted. More convincing fake news is not going to help.
    And this is just straight out of Putin's playbook, if everything is fake then people just stop beliving in the concept of truth altogether.
    
    SpicyLemonZest - 6 hours ago
    
    I think it's neither going to help nor hurt. My experience is that today, even people "living in a fake news mirror universe" understand that an image does not prove anything unless you can explain where you got it from and why anyone should believe it's authentic.
- tills13 - 9 hours ago
  
  The difference between "art in general" and this is scale and speed. Sure, I'll grant you that people are going to engage in deception with or without this but the barrier to entry with this is literally on the floor. Do you have a $5 prepaid VISA? You can generate whatever narrative you want in 30 seconds. Replace the $5 Prepaid VISA with the pocketbook of a three letter agency and it starts getting crazy.
  - Barbing - 8 hours ago
    
    >starts getting crazy
    Got pretty wild w/the Iranian propaganda that reportedly _resonated with Americans_ (didn't verify that claim)
    Slopaganda - https://www.newyorker.com/culture/infinite-scroll/the-team-b...
- - 9 hours ago
  
  [deleted]
- nothinkjustai - 8 hours ago
  
  Art is for the producer, and if they feel it’s necessary for them to produce it than it’s necessary for them, and what is necessary for the individual extends to the society they’re in.
atleastoptimal - 8 hours ago

The problem is I'd prefer access to near-photorealistic image gen to be commodified vs something that is restricted, as then only those willing to skirt the law or can leverage criminal networks have access to it.
primax - 7 hours ago

Every technological advance in this space has caused harm to someone.
The advent of digital systems harmed artists with developed manual artistic skills.
The availability of cheap paper harmed paper mills hand-crafting paper.
The creation of paper harmed papyrus craftsmen.
The invention of papyrus really probably pissed off those who scraped the hair off thin leather to create vellum.
My point is that in line with Jevon's paradox there is always a wave of destruction that occurs with technological transformation, but we almost always end up with more jobs created by the technology in the middle and long term.
NathanielK - 8 hours ago

Ok, but the models only know what to draw because we fed them images of dementia patients and babies.
Maybe image generators can be a loophole for consent legally, but it seems even grosser morally.
- 8 hours ago

[deleted]
ticulatedspline - 8 hours ago

Is the argument any different replacing the word "image generators" with "photoshop" ?
- Uncorrelated - 7 hours ago
  
  Scale matters. Using Photoshop took vastly more time and skill to pull off realistic images, limiting how many could be made. With image generation there's no practical limit. Some of it will be used for relatively innocuous purposes like making joke images for friends or menus for restaurants. But the floodgates are open for more socially negative uses.
  If you're the only one in the world with an internal combustion engine, the environmental impact doesn't matter at all. When they're as common as they are now, we should start thinking about large-scale effects.
- davebren - 7 hours ago
  
  It turns out that effort matters
tantalor - 8 hours ago

Prototyping. Suppose you have a hard time expressing your vision in words or executing it visually.
1. Generate 100s or 1000s of low-fidelity candidates, find something that matches your vision, iterate.
2. Hand that generated image off to a human and say, "This is what I'm thinking of, now how do we make it real?"
Important: do not skip the last step.
- apsurd - 8 hours ago
  
  You audit thousands of genAI prototype candidates?
ndriscoll - 9 hours ago

Not much beyond food, water, and shelter is "necessary" for society, but it's nice to have nice things.
I'm teaching my 4 year old to read. She likes PAW Patrol, but we've kind of exhausted the simple readers, and she likes novelty. So yesterday I had an LLM create a simple reader at her level with her favorite characters, and then turned each text block into a coloring page for her. We printed it off, she and her younger sister colored it, and we stapled it into her own book.
I could come up with 10 3 word sentences myself of course, but I'm not really able to draw well enough to make a coloring book out of it (in fact she's nearly as good as me), and it also helps me think about a grander idea to turn this into something a little more powerful that can track progress (e.g. which phonemes or sight words are mastered and which to introduce/focus on) and automatically generate things in a more principled way, add my kids into the stories with illustrations that look like them, etc.
Models will obviously become the foundation of personalized education in the future, and in that context, of course pictures (and video) will be necessary!
- drivebyhooting - 9 hours ago
  
  Repetition rather than novelty is good for learning.
  - ndriscoll - 8 hours ago
    
    Sure, and she gets that, but at some point she completely memorizes the stories. She also asks if we can get new books at the store, but they don't make 'em that fast.
    
    s4i - 2 hours ago
    
    Isn’t that also a valuable life lesson that some topics/resources are scarce and at some point you need to do something else?
- mcmcmc - 8 hours ago
  
  So the use case is just IP theft so you can get more Paw Patrol?
  AI aside, if you’ve truly exhausted all the simple readers, maybe she should move on to more advanced books instead of repeating more of the same and gamifying it, which seems a great way to destroy a child’s natural curiosity.
  - ndriscoll - 8 hours ago
    
    Sure, I don't view "IP" as valid, don't entertain the idea that it is possible to "steal" it, and absolutely don't care that someone out there might be sad imagining me making a coloring book for my kids. In fact I'd go so far as to say that holding the position that there's something wrong with tailoring teaching to a child's interests and avoiding that for fear of copyright concerns of all things actually makes you morally bad.
    You overestimate how many there are. There's like 10 stories at that level. I do also read ones with paragraphs to her, but she can't do those herself because she's 4.
    
    breezybottom - 4 hours ago
    
    Ah the old sovereign citizen reverse uno. It's actually evil NOT to use the art theft machine to dumb down your children.
  - bsenftner - 7 hours ago
    
    That is not IP theft, that's private use. If (s)he tries to sell those coloring books, that's then theft. You're free to do anything you want with IP in privacy, it's only when selling or exhibiting to the public IP law is triggered. Knock yourself out with protected IP in private.
    
    breezybottom - 4 hours ago
    
    You're thinking of fair use, and that's the worst interpretation of it I've seen.
LZ_Khan - 8 hours ago

Saving money for businesses trying to promote their products?
JumpCrisscross - 8 hours ago

> Genuine question: what positive use cases are sufficient to accept the harm from image generators?
Diagrams and maps. So much text-based communication begs for a diagram or a map.
_pdp_ - 8 hours ago

There are many use-cases outside of spam and slop.
For example, take a picture of your garden. Ask chatgpt to give you ideas how to improve it and a step by visual guide.
Anything that can be expressed visually is effectively target for this technology - this covers pretty much everything.
- never_inline - an hour ago
  
  That's a multimodal model with text output, I think GP is asking about image generators.
- kibibu - 3 hours ago
  
  Are those sufficiently valuable that the death of photographic evidence is worth it?
infecto - 9 hours ago

Could the same argument not be applied to practically everything and have drastically different perspectives from people?
stackedinserter - 8 hours ago

I have plenty for you:
- package design
- pictures for manuals and guides
- navigation and signs
- booklets, tickets and flyers
- logos of all sorts
- websites
- illustrations for books
And many. many others. Not every image is art and very few illustrators are artists.
- Jtarii - 8 hours ago
  
  So the benefits are that something that was already being mass produced with no issue is slightly easier to mass produce?
  It's not a particularly compelling argument.
  - kakapo5672 - 7 hours ago
    
    No, the benefits are that something can be mass produced magnitudes faster and easier, which in turn also creates more latitude for creativity and new spaces.
    It's a true state-change, which makes the argument pretty compelling IMO.
    
    breezybottom - 4 hours ago
    
    Weird how it's the least creative people who use it then.
- SyneRyder - 8 hours ago
  
  No idea why you were down voted, I think that's exactly how this will get used.
  I'm already imagining this is how the local live indie band night I sometimes go to will generate poster images each week for the bands that are playing, whether to put up at the venue or post to social media. And the bands might be using it to design images to put on their t-shirts and other merch. I already know some indie bands using this stuff for their album covers.
  - pesus - 7 hours ago
    
    He's getting downvoted because none of these supposed "benefits" outweigh the costs.
  - apsurd - 7 hours ago
    
    Downvotes because nobody actually wants this. Those image uses serve a purpose to an external audience. The audience doesn't want this shit.
    Now of course I'm being dramatically absolute. I'm sure I already consume these things without knowing it. These things serve a function. Offloading to AI is the implementer admitting they can't be bothered to care whether it serves the function.
- - 8 hours ago
  
  [deleted]
- pesus - 8 hours ago
  
  How do these justify the costs to society?
  - Legend2440 - 8 hours ago
    
    The 'costs to society' are massively overblown, and some of them (automating jobs) are actually benefits to society.
    
    pesus - 7 hours ago
    
    Nothing says benefiting society like increasing unemployment, destroying what little trust was left in society, and allowing for CSAM and racist propaganda to be generated en masse. At least some corporations will save a few bucks.
    
    Jtarii - 8 hours ago
    
    The girls that have to deal with their classmates generating nudes of them for the rest of time are glad to hear that their concerns are "overblown".
    
    GaryBluto - 8 hours ago
    
    Nobody tell those girls about Photoshop, or scissors and glue.
    
    Jtarii - 8 hours ago
    
    [flagged]
    
    GaryBluto - 7 hours ago
    
    There's some rich irony in accusing somebody who disagrees with you of acting in "bad faith" because you disagree with them.
    
    Jtarii - 7 hours ago
    
    Ok.
lanthissa - 8 hours ago

people pay them to use it, they find that positive
Barbing - 2 hours ago

[dead]
JimsonYang - 7 hours ago

I a 5’5” male can make myself look taller on dating apps
Short kings on tinder no more!
/s

jumploops - 3 hours ago

Looks like analog clocks work well enough now, however it still struggles with left-handed people.

Overall, quite impressed with its continuity and agentic (i.e. research) features.

samiwami - 12 hours ago

do they have anything similar to SynthID, or are they just pretending that problem doesn't exist?

I know this is probably mega cherry-picked to look more impressive, but some of the images are terrifyingly realistic. They seem to have put a lot of effort into the lighting.

alextheparrot - 11 hours ago

> Integrating an imperceptible, robust, and content-specific watermark
From the system card someone linked elsewhere in the discussion
- ai-tamer - 10 hours ago
  
  Zhao et al. 2023 showed any imperceptible watermark is provably removable by generative regeneration: pass the image through an img2img or VAE, the model reconstructs it visually identical but starts from a different latent. Watermark gone. SynthID and similar schemes do hold up well against normal sharing: recompression, crops, color tweaks, Twitter's pipeline. That covers most users. But the asymmetry is stuck — normally a GPU and a bit of motivation should be enough to strip it. Right? Got a tool to share? ;-)
ajam1507 - 6 hours ago

> do they have anything similar to SynthID, or are they just pretending that problem doesn't exist?
At least they aren't pretending that a solution exists.
losvedir - 6 hours ago

I feel like asking the image generators to mark AI images is the wrong way to go about it. It's like trying to maintain a blocklist. It seems better to me to have the major camera manufacturers or cell phones cryptographically sign their images as real.
- 93po - 5 hours ago
  
  I feel like this idea comes up often and in my opinion it doesn't solve anything. Take a picture of an AI image and you've made this approach useless. Which then goes to the argument of "well you'll see it's a picture of a picture" to which I will say there are plenty of ways to make this not appear so, and the ultimate form of this argument is that you can eventually project light directly into the photosensors, or otherwise hack the input between the photosensors and the rest of whatever digital magic that turns light into a JPG on your phone.
  - daemonologist - 4 hours ago
    
    SynthID survives basic transforms including screenshots/photos, although it can of course be defeated. Even still it helps with the laziest fakes, which there seem to be a lot of - I've seen several quite widespread misinformative images over the past couple months that failed a synthID check.
    Anyways I think approaching the problem from both directions is probably good.
swingboy - 8 hours ago

Maybe a stupid question, but does the SynthID still exist if you screenshot and crop your generated image? What if you screenshot, rotate _just_ a bit, and crop? Or apply some other effect to the image like adjusting the coloring a little bit, adding some blur, etc.
- alextheparrot - 8 hours ago
  
  The paper they published last year goes over some of these transformations: https://arxiv.org/pdf/2510.09263
Legend2440 - 12 hours ago

I think we are just going to have to accept that realistic images can be easily fabricated now.
Seeing is not believing anymore, and I don't think SynthID or anything like it can restore that trust in images.
- Barbing - an hour ago
  
  It's going to mess up accountability.
  Some politician will be recorded doing something & he'll have his people release a thousand photos/videos of him doing crimes. And they'll say, look, it's a smear campaign.
  This is just one stupid example, but people will have better schemes.
  Also global coordinated releases of fake content and hypertargeted possibly abusive content. Virtual kidnappings will take off, automated & scaled.
  - userbinator - an hour ago
    
    Some politician will be recorded doing something & he'll have his people release a thousand photos/videos of him doing crimes. And they'll say, look, it's a smear campaign.
    And his enemies will do the same, hopefully resulting in less blind trust for everyone in the population, which can only be a good thing.
- pstuart - 8 hours ago
  
  Hopefully the arms race will balance out with improved AI image detection, but I can see how that will never be guaranteed to be reliable.

lossyalgo - 8 hours ago

Someone remind me again why this is a good idea to be able to create perfect fake images?

RigelKentaurus - 11 hours ago

If every single image on their blog was generated by Images 2.0 (I've no reason to believe that's not the case), then wow, I'm seriously impressed. The fidelity to text, the photorealism, the ability to show the same character in a variety of situations (e.g. the manga art) -- it's all great!

platinumrad - 8 hours ago

Why do all of the cartoons still look like that? Genuinely asking.

modeless - 9 hours ago

Can it generate transparent PNGs yet?

alasano - 9 hours ago

Previous gpt image models could (when generating, not editing) but gpt-image-2 can't.
Noticed it earlier while updating my playground to support it
https://github.com/alasano/gpt-image-playground
- lxgr - 8 hours ago
  
  Works for me, but really weirdly on iOS: Copying to clipboard somehow seems to break transparency; saving to the iOS gallery does not. (And I’ve made sure to not accidentally depend on iOS’s background segmentation.)
vunderba - 7 hours ago

OpenAI’s API docs are frustratingly unclear on this. From my experience, you can definitely generate true transparent PNG files through the ChatGPT interface, including with the new GPT-Image-2 model, but I haven’t found any definitive way to do the same thing via the API.

mvkel - 5 hours ago

I wonder if this confirms version 1 of some kind of "world model."

It has an unprecedented ability to generate the real thing (for example, a working barcode for a real book)

vunderba - 6 hours ago

I decided to run gpt-image-2 on some of the custom comics I’ve come up with over the years to see how well it would do, since some of them are pretty unusual. Overall, I was quite impressed with how faithful it adhered to the prompts given that multi-panel stuff has to maintain a sense of continuity.

Was surprised to see it be able to render a decent comic illustrating an unemployed Pac-Man forced to find work as a glorified pie chart in a boardroom of ghosts.

https://mordenstar.com/other/gpt-2-comics

green_wheel - 3 hours ago

Well artists, you guys had a good run thank you for your service.

etothet - 9 hours ago

I would love to see prompt examples that created the images on the announcement page.

DauntingPear7 - 8 hours ago

You can by changing the view before the gallery

thevinter - 12 hours ago

Every time a new image gen comes out I keep saying that it won't get better just to be surprised again and again. Some of the examples are incredible (and incredibly scary. I feel like this is truly the point where understanding if something is AI becomes impossible)

lehmacdj - 11 hours ago

So do you think there will be a better image model in a year?
- throw310822 - 11 hours ago
  
  I'll bite: no I don't think so. If the examples are not cherry-picked and by "image model" we mean just the ability to generate pictures, this looks like parity with human excellence, there isn't much space for further improvement. The images don't just look real, they look tasteful- the model is not just generating a credible image, it's generating one that shows the talent of a good photographer/ designer/ artist.
- Vachyas - 11 hours ago
  
  I'm honestly unsure what could be improved at this point.
  Consistency? So it fails less often?
  Based on the released images, (especially the one "screenshot" of the Mac desktop) I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (ex. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96 so this image is probably fake")
  - thevinter - 11 hours ago
    
    There is definitely room for improvement: https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...
    Especially when it comes to detailed outputs or non-standard prompts.
    I do believe it will get even better - not sure it will happen within a year but I wouldn't be incredibly surprised if it did.
    
    Vachyas - 8 hours ago
    
    That's a good example, actually.
    If you asked me what I expected, since this one has "thinking", it'd be that it would've thought to do something like generate the image without Waldo first, then insert Waldo somewhere into that image as an "edit"
    
    vunderba - 11 hours ago
    
    Yep. “Where’s Waldo” has been a classic challenge for generative models for a while because it requires understanding the entire concept (there’s only one Waldo), while also holding up to scrutiny when you examine any individual, ordinary figure.
    I experimented with the concept of procedural generation of Waldo-style scavenger images with Flux models with rather disappointing results. (unsurprisingly).
    
    throw310822 - 11 hours ago
    
    I wonder if at this point you could just ask the agent to iteratively refine the image in smaller portions.
  - RobinL - 11 hours ago
    
    I'm been impressed when testing this model today, but it still can't consistently adhere to the following prompt: make me an image of a pizza split into 10 equal slices with space in between the them, to help teach fractions to a child.
    It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right
  - jinushaun - 10 hours ago
    
    Cost? Speed?
  - vunderba - 5 hours ago
    
    > I'm honestly unsure what could be improved at this point.
    That's because you're focusing a little bit too much on visual fidelity. It's still relatively trivial to create a moderately complex prompt and have it fail miserably.
    Even SOTA models only scored a 12 out of 15 on my benchmarks, and that was without me deliberately trying to "flex" to break the model.
    Here's one I just came up with:
    A Mercator projection of earth where the land/oceans are inverted. (aka land = ocean, and oceans = land)

- 7 hours ago

[deleted]

james2doyle - 7 hours ago

In the next round of ChatGPT advertisements, if they don’t use AI generated images, then that means they don’t believe in their own product right?

muyuu - 9 hours ago

I wonder if this will be decent at creating sprite frame animations. So far I've had very poor results and I've had to do the unthinkable and toil it out manually.

vunderba - 7 hours ago

I created this little demo of an animated sprite sheet using generative AI. It's not great, but it is passable.
https://mordenstar.com/other/hobbes-animation/
- muyuu - 6 hours ago
  
  Looks good to me. Would be nice to see the process. I'm having trouble with parts of the stride when the far leg is ahead. Doing 8-directional isometric right now.
freedomben - 9 hours ago

I had exactly the same thought! I've got a game I've been wanting to build for over a decade that I recently started working on. The art is going to be very challenging however, because I lack a lot of those skills. I am really hoping the AI tools can help with that.
Is anyone doing this already who can share information on what the best models are?
- gizmodo59 - 8 hours ago
  
  Use the imagegen skill in codex and ask it to create sprites. It works really well.
  - muyuu - 7 hours ago
    
    I didn't have great success last i tried, but i will give it another shot this week. Presumably they incorporated improvements to the skill?
  - freedomben - 7 hours ago
    
    Thank you!
ZeWaka - 8 hours ago

It's still bad.

dazhbog - 8 hours ago

Yay, let's burn the planet computing more slopium..

kanodiaayush - 9 hours ago

It stands out to me that this page itself is wonderful to go through (the telling of the product through model generated images).

minimaxir - 12 hours ago

Model card for the API endpoint gpt-image-2 (which may or may not reflect the output from ChatGPT Images 2): https://developers.openai.com/api/docs/models/gpt-image-2

API Pricing is mostly unchanged from gpt-image-1.5, the output price is slightly lower: https://developers.openai.com/api/docs/pricing

...buuuuuuuuut the price per image has changed. For a high quality image generation the 1024x1024 price has increased? That doesn't make sense that a 1024x1024 is cheaper than a 1024x1536, so assuming a typo: https://developers.openai.com/api/docs/guides/image-generati...

The submitted page is annoyingly uninformative, but from the livestream it proports the same exact features as Gemini's Nano Banana Pro. I'll run it through my tests once I figure out how to access it.

strongpigeon - 11 hours ago

> That doesn't make sense that a 1024x1024 is cheaper than a 1024x1536, [...]
I think you meant more expensive, right? Because it would make sense for it to be cheaper as there are less pixels.

JimsonYang - 7 hours ago

> you can make your own mangas

No you can’t.

You still have the studio ghibili look from the video. The issue of generating manga was the quality of characters, there’s multiple software to place your frame.

But I am hopeful. If I put in a single frame, can it carry over that style for the next images? It would be game changing if a chat could have its own art style

franze - 7 hours ago

the tragedy of image generating ai is that it is used to massively create what already exists instead of creating something truly unique - we need ai artists - and yeah, they will not be appreciated

franze - 7 hours ago

so yeah a smart move of openai would be to sponsor artists - provokant ones, junior ones, with nothing to lose - but that cell in the spreadsheet will be too small to register and will prop. never happen

fizlebit - 2 hours ago

Scrolling through those images it just feels like intellectual theft on a massive scale. The only place I think you're going to get genuinely new ideas is from humans. Whether those humans use AI or not I don't care, but the repetitive slop of AI copying the creative output of humans I don't find that interesting. Call me a curmudgeon. I guess humans also create a lot of derivative slop even without AI assistance. If this leads somehow to nicer looking user interfaces and architecture maybe that is good thing. There are a lot of ugly websites, buildings and products.

dakiol - 9 hours ago

> On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.

Yeah, agree. I think it's the first time I'm asking myself: Ok, so this new cool tech, what is it good for? Like, in terms of art, it's discarded (art is about humans), in terms of assets: sure, but people is getting tired of AI-generated images (and even if we cannot tell if an image is AI-generated, we can know if companies are using AI to generate images in general, so the appealing is decreasing). Ads? C'mon that's depressing.

What else? In general, I think people are starting to realize that things generated without effort are not worth spending time with (e.g., no one is going to read your 30-pages draft generated by AI; no one is going to review your 500 files changes PR generated by AI; no one is going to be impressed by the images you generate by AI; same goes for music and everything). I think we are gonna see a Renaissance of "human-generated" sooner rather than later. I see it already at work (colleagues writing in slack "I swear the next message is not AI generated" and the like)

lucaslazarus - 9 hours ago

> I think it's the first time I'm asking myself: Ok, so this new cool tech, what is it good for?
I feel like this is something people in the industry should be thinking about a lot, all the time. Too many social ills today are downstream of the 2000s culture of mainstream absolute technoöptimism.
Vide. Kranzberg's first law--“Technology is neither good nor bad; nor is it neutral.”
- runarberg - 8 hours ago
  
  Completely unrelated, but I am curious about your keyboard layout since you mistyped ö instead of - these two symbols are side by side in the Icelandic layout, and the ö is where - in the English (US) layout. As such this is a common type-o for people who regularly switch between the Icelandic and the English (US) layout (source: I am that person). I am curious whether more layouts where that could be common.
  - bulletsvshumans - 8 hours ago
    
    This is also a stylistic choice that the New Yorker magazine uses for words with double vowels where you pronounce each one separately, like coöperate, reëlect, preëminent, and naïve. So possibly intentional.
    
    lucaslazarus - 8 hours ago
    
    Yes, this is exactly correct, and I will die on this hill. Additionally, I don't like the way a hyphenated "techno-optimism" looks and "technOOPtimism" is a bit too on-the-nose.
    
    nullsanity - 7 hours ago
    
    [dead]
    
    runarberg - 7 hours ago
    
    That makes sense[1] but it prompts the obvious question: does this style write it as typeö then?
    1: Though personally I hate it, I just cannot not read those as completely different vowels (in particular ï → [i:] or the ee in need; ë → [je:] or the first e here; and ö → [ø] or the e in her)
    
    lucaslazarus - 6 hours ago
    
    No. Firstly because it is spelled “typo.” Secondly you typically use the diaeresis to tell the reader to not confuse it with a similarly spelled sound or diphthong. So it tells a reader that “reëlect” is not pronounced REEL-ect, “coöperate” is not COOP-uh-ray-t, and “naïve” is not NAY-v.
    
    losvedir - 6 hours ago
    
    Because written English makes so much sense normally. God forbid someone has to figure out the ambiguous pronunciation of those particular words. It seems like a silly thing to provide extra guidance on to me.
  - heisenzombie - 8 hours ago
    
    I suspect the diaresis was intentional, in “New Yorker” style.
    https://www.arrantpedantry.com/2020/03/24/umlauts-diaereses-...
lxgr - 9 hours ago

I can’t design wallpapers/stickers/icons/…, but I can describe what I want to an image generation model verbally or with a source photo, and the new ones yield pretty good results.
For icons in particular, this opens up a completely new way of customizing my home screen and shortcuts.
Not necessary for the survival of society, maybe, but I enjoy this new capability.
- latexr - 8 hours ago
  
  So we get a fresh new cheap way to spread propaganda and lies and erode trust all across society while cementing power and control for a few at the top, and in return get a few measly icons (as if there weren’t literally thousands of them freely available already) and silly images for momentaneous amusement?
  What a rotten exchange.
  - SamuelAdams - 8 hours ago
    
    I wonder what will happen to the entire legal system. It used to be fairly difficult to create convincing photos and videos.
    AI can probably fool most court judges now. Or the defense can refute legitimate evidence by saying “it’s AI / false”. How would that be refuted?
    
    jll29 - 7 hours ago
    
    Yes, that is a major worry of mine, too. CCTV evidence is worth nil now (could be generated in whole or part), and even eye-witness testimony can be trusted (sure, a witness may think they saw the alleged perpetrator, but perhaps they just saw an AI-generated video/projection of someone).
    
    BLKNSLVR - 7 hours ago
    
    MS13 was literally tattooed on his knuckles!
    
    Gigachad - 7 hours ago
    
    Multiple data sources, considering the trustworthiness of the source of the information, and accountability for lying.
    You might generate an AI video of me committing a crime, But the CCTV on the street didn't show it happening and my phone cell tower logs show I was at home. For the legal system I don't think this is going to be the biggest problem. It's going to be social media that is hit hardest when a fake video can go viral far faster than fact checking can keep up.
    
    idiotsecant - 7 hours ago
    
    By having people also testify to authenticity and coming down like the hand of God on fakers, the same way we make sure evidence is real now.
    
    gedy - 6 hours ago
    
    If it means anything, I have a 1990 Almanac from an old encyclopedia that warns the exact same thing about digital photo manipulation. I don't think it really matters at this point
  - jll29 - 7 hours ago
    
    AI can also be used to fight propaganda, for instance BiasScanner makes you aware of potentially manipulative news: https://biasscanner.org .
    So that makes AI a "dual good", like a kitchen knife: you can cut your tomato or kill you neighbor with it, entirely up to the "user". Not all users are good, so we'll see an intense amplification of both good and bad.
    
    jrumbut - 7 hours ago
    
    AI is certainly a dual good but I think the project is misguided at best.
    I put in one of the driest descriptions of the Holocaust I could find and it got a very high score for bias, calling a factual description of a massacre emotional sensationalism because it inevitably contains a lot of loaded words.
    It also doesn't differentiate between reporting, commentary, poetry, or anything else. It takes text and spits out a number, which is a very shallow analysis.
    
    dymk - 7 hours ago
    
    It's more work to fight bullshit than it is to generate it, though. Saying "Use AI to fight it" is inherently a losing strategy when the other side also has an AI that is just as powerful.
    
    jrumbut - 7 hours ago
    
    And no amount of BS detecting tells you what is true. The challenge that I see a lot of people have is they really don't have a framework to incorporate new information into.
    They're adrift, every new "fact" (whether true or false) blows them in a new direction. Often they get led in terrible directions from statements that are entirely true (but missing important context).
    A lot of financial cons work that way, a long string of true statements that seem to lead to a particular conclusion. I know that if someone is offering me 20% APY there will usually be some risk or fee that offsets those market-beating gains (it may be a worthwhile risk or a well earned fee, but that number needs to trigger further investigation).
    We need people to be equipped with that sort of framework in as many areas as possible, but we seem to be moving backwards in that area.
    
    nullsanity - 7 hours ago
    
    [dead]
  - thesmtsolver2 - 7 hours ago
    
    Don’t blame the tools. Stalin, Mao and Hitler didn’t need AI.
- camillomiller - 8 hours ago
  
  Is that worth the cost of this technology? Both in terms of financial shenanigans and its environmental cost?
  - subroutine - 8 hours ago
    
    Are you asking if the 10 seconds it takes AI to generate an image is more costly to the environment than a commissioned graphics artist using a laptop for 5-6 hours, or a painter who uses physical media sourced from all over the world?
    
    bayindirh - 7 hours ago
    
    In short, yes.
    A modern laptop is running almost fanless, like a 486 from the days of yore.
    A single H200 pumps out 700W continuously in a data center, and you run thousands of them.
    Also, don't forget the training and fine tuning runs required for the models.
    Mass transportation / global logistics can be very efficient and cheap.
    Before the pandemic, it was cheaper to import fresh tomatoes from half-world away rather than growing them locally in some cases. A single container of painting supplies is nothing in the grand scheme of things, esp. when compared with what data centers are consuming and emitting.
    
    ToValueFunfetti - 7 hours ago
    
    This is a plainly dishonest comparison. A single H200 does not need to run continuously for you to generate a dozen pictures. And then you immediately pivot to comparing the paint usage against "the grand scheme of things"- 700W is nothing in the grand scheme of things.
    
    bayindirh - 4 minutes ago
    
    In fact it's pretty fair.
    Many people think that when a piece of hardware is idle, its power consumption becomes irrelevant, and that's true for home appliances and personal computers.
    However, the picture is pretty different for datacenter hardware.
    Looking now, an idle V100 (I don't have an idle H200 at hand) uses 40 watts, at minimum. That's more than TDP of many, modern consumer laptops and systems. A MacBook Air uses 35W power supply to charge itself, and it charges pretty quickly even if it's under relatively high stress.
    I want to clarify some more things. A modern GPU server houses 4-8 high end GPUs. This means 3KW to 5KW of maximum energy consumption per server. A single rack goes well around 75KW-100KW, and you house hundreds of these racks. So, we're talking about megawatts of energy consumption. CERN's main power line on the Swiss side had a capacity around 10MW, to put things in perspective.
    Let's assume an H200 uses 60W energy when it's idle. This means ~500W of wasted energy per server for sitting around. If a complete rack is idle, it's 10KW. So you're wasting energy consumption of 3-5 houses just by sitting and doing nothing.
    This computation only thinks about the GPU. Server hardware also adds around 40% to these numbers. Go figure. This is wasting a lot for cat pictures.
    And, these "small" numbers add up to a lot.
    
    cpill - 7 hours ago
    
    these are unfair comparisons. it's not just a single laptop running all day it's all the graphic designer laptops that get replaced. it's not a single container of painting supplies it's all off them, (which are toxic by the way).
    so if power were plentiful and environmental you'd be onboard with it?
    
    camillomiller - 19 minutes ago
    
    Wow, do you hold a degree in false dichotomies?
    
    dilDDoS - 8 hours ago
    
    Cheaper/faster tech increases overall consumption though. Without the friction of commissioning a graphics artist to design something, a user can generate thousands of images (and iterate on those images multiple times to achieve what they want), resulting in way more images overall.
    I'm not really well versed on the environmental cost, more just (neutrally) pointing out that comparing a single 10s image to a 5-6 hour commission ignores the fact that the majority of these images probably would never have existed in the first place without AI.
    
    runarberg - 8 hours ago
    
    Also, ignoring training when talking about the environmental costs is bad faith. Without training this image would not exist, and if nobody generating images like these, the training would not happen. So we should really ask, the 10 seconds it took for inference, plus the weeks or months of high intensity compute it took to train the model.
    
    ToValueFunfetti - 6 hours ago
    
    You'd want to compare against the fraction of training attributable to the image
    
    - 3 hours ago
    
    [deleted]
  - Legend2440 - 8 hours ago
    
    The environmental cost is significantly overblown, especially water usage.
    
    bayindirh - 8 hours ago
    
    I work with direct liquid cooled systems. If the datacenter is working with open DLC systems (most AI datacenters in the US in fact do), there's a lot of water is being wasted, 7/24/365.
    A mid-tier top-500 system (think about #250-#325) consumes about a 0.75MW of energy. AI data centers consume magnitudes more. To cool that behemoth you need to pump tons of water per minute in the inner loop.
    Outer loop might be slower, but it's a lot of heated water at the end of the day.
    To prevent water wastage, you can go closed loop (for both inner and outer loops), but you can't escape the heat you generate and pump to the atmosphere.
    So, the environmental cost is overblown, as in Chernobyl or fallout from a nuclear bomb is overblown.
    So, it's not.
    
    Legend2440 - 8 hours ago
    
    It's not that it doesn't use water; it's that water is not scarce unless you live in a desert.
    As a country, we use 322 billion gallons of water per day. A few million gallons for a datacenter is nothing.
    
    bayindirh - 8 hours ago
    
    The problem is you don't just use that water and give it back.
    The water gets contaminated and heated, making it unsuitable for organisms to live in, or to be processed and used again.
    In short, when you pump back that water to the river, you're both poisoning and cooking the river at the same time, destroying the ecosystem at the same time too.
    Talk about multi-threaded destruction.
    
    Legend2440 - 8 hours ago
    
    No, you're making that up. Datacenters do not poison rivers.
    
    bayindirh - 8 hours ago
    
    To reiterate, I work in a closed loop DLC datacenter.
    Pipes rust, you can't stop that. That rust seeps to the water. That's inevitable. Moreover, if moss or other stuff starts to take over your pipes, you may need to inject chemicals to your outer loop to clean them.
    Inner loops already use biocides and other chemicals to keep them clean.
    Look how nuclear power plants fight with organism contamination in their outer cooling loops where they circulate lake/river water.
    Same thing.