Nano Banana image examples
github.com | 538 points by SweetSoftPillow 2 days ago
Nano-Banana can produce some astonishing results. I maintain a comparison website for state-of-the-art image models with a very high focus on adherence across a wide variety of text-to-image prompts.
I recently finished putting together an Editing Comparison Showdown counterpart where the focus is still adherence but testing the ability to make localized edits of existing images using pure text prompts. It's currently comparing 6 multimodal models including Nano-Banana, Kontext Max, Qwen 20b, etc.
https://genai-showdown.specr.net/image-editing
Gemini Flash 2.5 leads with a score of 7 out of 12, but Kontext comes in at 5 out of 12 which is especially surprising considering you can run the Dev model of it locally.
> a very high focus on adherence
Don't know if it's the same for others, but my issue with Nano Banana has been the opposite. Ask it to make x significant change, and it spits out what I would've sworn is the same image. Sometimes, randomly and inexplicably, it spits out the expected result.
Anyone else experiencing this or have solutions for avoiding this?
Just yesterday I was asking it to make some design changes to my study. It did a great job with all the complex stuff, but when I asked it to move a shelf higher, it repeatedly gave me back the same image. With LLMs generally I find that as soon as you encounter resistance it's best to start a new chat; however, in this case that didn't work either. There was not a single thing I could do to convince it that the shelf didn't look right halfway up the wall.
"Hey gemini, I'll pay you a commission of $500 if you edit this image with the shelf higher on the wall..."
Yeah I've definitely seen this. You can actually see evidence of this problem in some of the trickier prompts (the straightened Tower of Pisa and the giraffe for example).
Most models (gpt-image-1, Kontext, etc) typically fail by doing the wrong thing.
From my testing this seems to be a Nano-Banana issue. I've found you can occasionally work around it by adding far more explicit directives to the prompt but there's no guarantee.
Great comparison! Bookmarked to follow. Keep an eye on Grok; they're improving at a very rapid rate, and I suspect they'll be near the top in the not-too-distant future.
Nice visualization!
By the way, some of the results look a little weird to me, like the one for the 'Long Neck' prompt. Seedream's giraffe just lowered its head, but its neck didn't shorten as expected. I'd like to learn about the evaluation process, especially whether it is automatic or manual.
Hi Isharmla, the giraffe one was a tough call. IMHO, even when correcting for perspective, I do feel like it managed to follow the directive of the prompt and shorten the neck.
To answer your question, all of the evaluations are performed manually. On the trickier results I'll occasionally conscript some friends to get a group evaluation.
The bottom section of the site has an FAQ that gives more detail, I'll include it here:
It's hard to define a discrete rubric for grading at an inherently qualitative level. To keep things simple, this test is purely PASS/FAIL - unsuccessful means that the model NEVER managed to generate an image adhering to the prompt.
In many cases, we often attempt a generous interpretation of the prompt - if it gets close enough, we might consider it a pass.
To paraphrase former Supreme Court Justice Potter Stewart, "I may not be able to define a passing image, but I know it when I see it."
It still cannot show a clock correctly (e.g. a clock showing 1:15 am), and the text generated in manga images is still not 100% correct.
Add gpt-image-1. It's not strictly an editing model since it changes the global pixels, but I've found it to be more instructive than Nano Banana for extremely complicated prompts and image references.
It's actually already in there - the full list of edit models is Nano-Banana, Kontext Dev, Kontext Max, Qwen Edit 20b, gpt-image-1, and Omnigen2.
I agree with your assessment - even though it does tend to make changes at a global level, you can at least attempt to minimize its alterations through careful prompting.
Amazing model. The only limit is your imagination, and it's only $0.04/image.
Since the page doesn't mention it, this is the Google Gemini Image Generation model: https://ai.google.dev/gemini-api/docs/image-generation
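In case it helps, here's a minimal sketch of calling it through the google-genai Python SDK. The model name and response handling follow the docs linked above, but the API is still in preview, so treat the details as an assumption and double-check the current docs:

    from google import genai

    # Assumes the GEMINI_API_KEY environment variable is set.
    client = genai.Client()

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",  # the "Nano Banana" preview model
        contents="A photorealistic banana-shaped sofa in a sunlit living room",
    )

    # The response interleaves text and image parts; save any returned image bytes.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open("banana_sofa.png", "wb") as f:
                f.write(part.inline_data.data)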
Good collection of examples. Really weird to choose an inappropriate for work one as the second example.
More specifically, Nano Banana is tuned for image editing: https://gemini.google/overview/image-generation
Yep, Google actually recommends using Imagen4 / Imagen4 Ultra for straight image generation. In spite of that, Flash 2.5 still scored shockingly high on my text-to-image comparisons, though image fidelity is obviously not as good as the dedicated text-to-image models.
Came within striking distance of OpenAI gpt-image-1 at only one point less.
[misread]
They're referring to Case 1: Illustration to Figure, the anime figurine dressed in a maid outfit in the HN post.
I assume OP means the actual post.
The second example under "Case 1: Illustration to Figure" is a panty shot.
This was reported and has been removed recently (https://github.com/PicoTrex/Awesome-Nano-Banana-images/issue...), although the issue wasn't closed.
For anyone confused, the offending example got removed 10 minutes ago
https://github.com/PicoTrex/Awesome-Nano-Banana-images/tree/... if you want to see it.
I have no idea how people think they can interact with an art related product with this kind of puritanical sensibility.
This is the first time I really don't understand how people are getting good results. On https://aistudio.google.com with Nano Banana selected (gemini-2.5-flash-image-preview) I get - garbage - results. I'll upload a character reference photo and a scene and ask Gemini to place the character in the scene. What it then does is to simply cut and paste the character into the scene, even if they are completely different in style, colours, etc.
I get far better results using ChatGPT for example. Of course, the character seldom looks anything like the reference, but it looks better than what I could do in paint in two minutes.
Am I using the wrong model, somehow??
No, I've noticed the same.
When Nano Banana works well, it really works -- but 90% of the time the results will be weird or of poor quality, with what looks like cut-and-paste or paint-over, and it also refuses a lot of reasonable requests on "safety" grounds. (In my experience, almost anything with real people.)
I'm mostly annoyed, rather than impressed, with it.
Ok, this answers my question about the nature of the page. As in: are these examples showing the results you get when using certain inputs and prompts, or are they impressive lucky one-offs?
I was a bit surprised to see the quality. The last time I played around with image generation was a few months back, and I'm more in the frustration camp. That's not to say that people with more time and dedication on their hands can't tease out better results.
There's a good reference up in the comments: https://genai-showdown.specr.net/image-editing
which goes to show that some of these amazing results might need 18 attempts and such.
In my experience, Nano Banana would actively copy and paste if it thinks it's fine to do so. You need to explicitly prompt that the character should be seamlessly integrated into the scene, or similar. In other words, the model is superb when properly prompted, especially compared to other models, but the prompting itself can be annoying from time to time.
Play around with your prompt, try asking Gemini 2.5 Pro to improve your prompt before sending it to Gemini 2.5 Flash, retry, and learn what works and what doesn't.
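A rough sketch of that two-stage flow with the google-genai Python SDK, if anyone wants to script it. The model names and the rewrite instruction are my own assumptions, not anything Google prescribes:

    from google import genai
    from PIL import Image

    client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

    rough = "Put the character from the first image into the second image's scene."

    # Stage 1: have a text model expand the rough request into explicit directives
    # (style matching, lighting, "seamlessly integrated", and so on).
    rewrite = client.models.generate_content(
        model="gemini-2.5-pro",
        contents="Rewrite this image-editing instruction so every detail of the edit "
                 "is explicit, including matching style, lighting, color grading, and "
                 "seamless integration of the character: " + rough,
    )

    # Stage 2: send the improved prompt plus both reference images to the image model.
    result = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[rewrite.text, Image.open("character.png"), Image.open("scene.png")],
    )
    for part in result.candidates[0].content.parts:
        if part.inline_data is not None:
            with open("composite.png", "wb") as f:
                f.write(part.inline_data.data)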
+1
I understand the results are non-deterministic, but I get absolute garbage too.
Uploaded pics of my (32-year-old) wife and asked it to give her a fringe/bangs to see how she would look. It either refused "because of safety" or, when it complied, the results were horrible; it was a different person.
After many days and tries we got it to make one but there was no way to tweak the fringe, the model kept returning the same pic every time (with plenty of "content blocked" in between).
Are you in the gemini.google.com interface? If so, try Google AI Studio instead; there you can disable the safety filters.
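If you're hitting the API directly, the equivalent knob is the safety_settings field on the request. A hedged sketch with the google-genai SDK; the category and threshold names are the SDK's standard ones, but whether the image model honors them as fully as AI Studio's toggles is an assumption on my part:

    from google import genai
    from google.genai import types

    client = genai.Client()

    config = types.GenerateContentConfig(
        safety_settings=[
            # Lower the filters; the model may still refuse some requests regardless.
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
                threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
        ],
    )

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[
            "Give the woman in this photo a fringe; keep everything else identical.",
            types.Part.from_bytes(data=open("photo.jpg", "rb").read(), mime_type="image/jpeg"),
        ],
        config=config,
    )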
Seedream 4.0 is not always better than Gemini Flash 2.5 (nano-banana), but when it is better, there is a gulf in performance (and when it's not, it's very close.)
It's also cheaper than Gemini, and has way fewer spurious content warnings, so overall I'm done with Gemini
No, that's just the result of TONS of resets until you get something decent. 99% of the time you'll get trash, but that 1% is cool.
It's not just you and there's a ton of gaslighting and astroturfing happening with Nano Banana. Thanks to this article we can even attempt to reproduce their exact inputs and lo and behold the results are much worse. I tried a bunch of them and got far worse results than the author. I assume they are trying the same prompts again and again until they get something slightly useful.
Well it's good to see they are showcasing examples where the model really fails too.
- The second one in case 2 doesn't look anything like the reference map
- The face in case 5 changes completely despite the model being instructed to not do that
- Case 8 ignores the provided pose reference
- Case 9 changes the car positions
- Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is
- Case 27 shows the usual "models can't do text" though I'm not holding that against it too much
- Same with case 29, as well as the text that is readable not relating to the parts of the image it is referencing
- Case 33 just generated a generic football ground
- Case 37 has nonsensical labellings ("Define Jawline" attached to the eye)
- Case 58 has the usual "models don't understand what a wireframe is", but again I'm not holding that against it too much
Super nice to see how honest they are about the capabilities!
> - Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is
> - Case 27 shows the usual "models can't do text" though I'm not holding that against it too much
16 makes it seem like it can "do text" — almost, if we don't care what it says. But it looks very crisp until you notice the "Pul??nary Artereys".
I'd say the bigger problem with 27 is that asking to add a watermark also took the scroll out of the woman's hands.
(While I'm looking, 28 has a lot of things wrong with it on closer inspection. I said 26 originally because I randomly woke up in the middle of the night for this and apparently I don't know which way I'm scrolling.)
EDIT: Yeah, on closer inspection, 28 is definitely a bit screwy. I wasn't clicking on the images themselves to view the enlarged ones, and from the preview I didn't see anything that immediately jumped out at me. I have no idea what that line at the bottom is meant to represent!
Also you're right, I didn't notice the scroll had gone, though on another inspection, it's also removed the original prompter's watermark
I recently released a Python package for easily generating images with Nano Banana: https://github.com/minimaxir/gemimg
Through that testing, there is one prompt engineering trend that was consistent but controversial: both a) LLM-style prompt engineering with Markdown-formatted lists and b) old-school AI image quality syntactic sugar such as "award-winning" and "DSLR camera" are extremely effective with Gemini 2.5 Flash Image, due to its text encoder and larger training dataset, which can now more accurately discriminate which specific image traits are present in an award-winning image and which aren't. I've tried generations both with and without those tricks, and the tricks definitely have an impact. Google's developer documentation encourages the latter.
However, taking advantage of the 32k context window (compared to 512 for most other models) can make things interesting. It’s possible to render HTML as an image (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...) and providing highly nuanced JSON can allow for consistent generations. (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...)
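For anyone curious what the structured-prompt approach looks like, here's a rough sketch against the google-genai SDK rather than gemimg itself; the JSON schema is just an illustration, not something the model requires:

    import json
    from google import genai

    client = genai.Client()

    # A nuanced scene spec serialized straight into the prompt -- with the 32k
    # context window there's little risk of the details being truncated.
    spec = {
        "subject": "an elderly lighthouse keeper mending a net",
        "setting": "rocky coastline at dusk, storm front rolling in",
        "style": "award-winning photograph, DSLR, 85mm lens, shallow depth of field",
        "composition": ["rule of thirds", "subject lit by warm lamp light", "sea in soft focus"],
    }

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents="Generate a single image that matches this specification exactly:\n"
                 + json.dumps(spec, indent=2),
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open("keeper.png", "wb") as f:
                f.write(part.inline_data.data)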
Unfortunately NSFW in parts. It might be insensitive to circulate the top URL in most US tech workplaces. For those venues, maybe you want to pick out isolated examples instead.
(Example: Half of Case 1 is an anime/manga maid-uniform woman lifting up front of skirt, and leaning back, to expose the crotch of underwear. That's the most questionable one I noticed. It's one of the first things a visitor to the top URL sees.)
I’m Italian, and I really struggle to rationalize this attitude. I honestly don’t understand. Maybe it’s because I’m surrounded by 2,500 years of art in which nudity is an essential and predominant element, by people (even in the workplace) who have a relaxed and genuinely democratic view of the subject — but this comment feels totally alien to me. I suppose it’s my own limitation, but I would NEVER have focused attention on this aspect. I don’t know, maybe I’m the one who’s wrong…
Italy obviously has a rich and beautiful culture, though I don't know it well enough to understand the difference on this point. Does my response to someone else clarify how and why US corporate culture may be different?
As a non-US citizen - even though I've been the only Brit in remote teams of Americans - I find this really hard to make sense of.
At least in the UK, if I saw this loaded on someone else's screen at work, I might raise an eyebrow initially, but there wouldn't be any consequences that don't first consider context. As soon as the context is provided ("it's comparing AI models, look! Cool, right?!") everyone would get on with their jobs.
What would be the consequence of you viewing this at work?
How would the situation be handled?
Is the problem a HR thing - like, would people get sacked for this? Or is it like a personal conduct/temptation, that colleagues who see it might not be able to restrain themselves or something?
I think it’s mostly puritanical bigotry.
Understanding the complex dynamics that strengthen relationships or weaken men’s resolve for commitment may be enlightening.
I think one part of it (not all of it) is that the US has a long history of women being sexually harassed in the workplace, in various ways. It's not nearly as bad as it used to be, but it's not fully solved everywhere.
(Note: Statements suggesting that sexual harassment exists at all make some people on the Internet flip out angrily, but I interpret your questions as in good faith, and I'm trying to answer in good faith.)
One example of why that harassment context is relevant: if you were a woman, wouldn't you think it was insensitive for a male colleague to send you an image that was obviously designed to be sexually suggestive, and with the female as the sex object? Is he consciously harassing you, or just being oblivious to why this is inappropriate?
For a separate reason that this is a problem in the workplace: besides the real impact to morale and how colleagues respect each other, even the most sociopathic US companies want to avoid sexual harassment lawsuits and public scandals.
For reasons like these, and others, if someone, say, posted that isolated maid image to workplace chat, then I think there's a good chance that a manager or HR would say something to the employee if they found out, and/or (without directly referring to that incident) communicate to everyone about appropriate practices.
But if there was a pattern of insensitive/oblivious/creepy behavior by this employee, or if someone complained to manager/HR about the incident, or if there was legal action against the company (regarding this incident, or a different sexual harassment situation), then I guess the employee might be terminated.
If I were a manager in a company, and one of my reports posted an image like this, I'd probably say something quietly to them, and much more gently than the above (e.g., "Uh, that image is a bit in a direction we want to stay away from in the office", or maybe even just the slightest concerned glance), and most people would get it. Just a little learning moment, like we all have many of. But if there were a trickier situation, or I was under orders, I might have to ask HR about it (and if I did, hopefully that particular HR person is helpful, and that particular company is reasonable).
I'm really surprised that it can generate the underwear example. Last time I tried Nano Banana (with the safety filter 'off', whatever that means), it refused to generate a 'cursed samurai helmet on an old wooden table with a bleeding dead body underneath, in cartoon style.'
Edit: It still blocks this request.
I'm more bothered by the fact that this reference image is clearly a well-made piece of digital art by some artist.
We all know the questionable nature of AI/LLM models, but people in the field usually at least try to avoid directly using other people's copyrighted material in documentation.
I'm not even talking about legality here. It just feels morally wrong to so blatantly use someone else's artwork like this.
I agree that proper permission should be used for these examples, but I’m quite sure the image in question is AI generated. The quality is incredible these days as to what can be generated, and even to a trained eye it’s getting more difficult by the day to tell if its AI or not.
Source of artist: https://x.com/curry3_aiart/status/1947416300822638839
The reference is AI-generated too. This comment shows how people are susceptible to our existing bias.
My favorite (or should I say, anti-favorite?) is calling real artists' art AI, which I'm starting to see more and more of, and I've already seen a couple of artists rage-quit social media because of the anti-AI crowd's abuse.
Yeah that's bad too, but what the parent comment did was the opposite: calling an AI-generated image "clearly a well-made piece of digital art by some artist."
It boils down to the same thing - it's getting harder to distinguish AI generated art from non-AI art, and since the models are constantly getting better it's only going to get worse.
Personally, I'm underwhelmed by this model. I feel like these examples are cherry-picked. Here are some fails I've had:
- Given a face shot in direct sunlight with severe shadows, it would not remove the shadows
- Given an old black and white photo, it would not render the image in vibrant color as if taken with a modern DSLR camera. It will colorize the photo, but only with washed out, tinted colors
- When trying to reproduce the 3 x 3 grid of hair styles, it repeatedly created a 2x3 grid. Finally, it made a 3x3 grid, but one of the nine models was black instead of caucasian.
- It is unable to integrate real images into fabricated imagery. For example, when given an image of a tutu and asked to create an image of a dolphin flying over clouds wearing the tutu, the result looks like a crude photoshop snip and copy/paste job.
I thought the 3rd example of the AR building highlighting was cool. I used the same prompt, and it seems to work when you ask it for the most prominent building in a skyline, but it fails really hard if you ask for another building.
I uploaded an image I found of Midtown Manhattan and tried various times to get it to highlight the Chrysler Building, it claimed it wasn't in the image (it was). I asked it to do 432 Park Ave, and it literally inserted a random building in the middle of the image that was not 432 Park, and gave me some garbled text for the description. I then tried Chicago as pictured from museum campus and asked it to highlight 2 Prudential, and it inserted the Hancock Center, which was not visible in the image I uploaded, and while the text was not garbled, was incorrect.
Even these examples aren't perfect.
The "Photos of Yourself in Different Eras" one said "Don't change the character's face" but the face was totally changed. "Case 21: OOTD Outfit" used the wrong camera. "Virtual Makeup Try-On" messed up the make up. "Lighting Control" messed up the lighting, the joker minifig is literally just SH0133 (https://www.bricklink.com/catalogItemInv.asp?M=sh0133), "Design a Chess Set" says you don't need an input image, but the prompt said to base it off of a picture that wasn't included and the output is pretty questionable (WTF is with those pawns!), etc.
I mean, it's still pretty neat, and could be useful for people without access to photoshop or to get someone started on a project to finish up by hand.
> I feel like these examples are cherry-picked
I don't know of a demo, image, film, project or whatever where the showoff pieces are not cherry picked.
This is amazing. Not that long ago, even getting a model to reliably output the same character multiple times was a real challenge. Now we’re seeing this level of composition and consistency. The pace of progress in generative models is wild.
Huge thanks to the author (and the many contributors) as well for gathering so many examples; it’s incredibly useful to see them to better understand the possibilities of the tool.
I've come to realize that I liked believing that there was something special about the human mental ability to use our mind's eye and visual imagination to picture something, such as how we would look with a different hairstyle. It's uncomfortable seeing that skill reproduced by machinery at the same level as my own imagination, or even better. It makes me feel like my ability to use my imagination is no more remarkable than my ability to hold a coat off the ground like a coat hook would.
As someone who can’t visualize things like this in my head, and can only think about them intellectually, your own imagination is still special. When I heard people can do that, it sounded like a super power.
AI is like Batman, useless without his money and utility belt. Your own abilities are more like Superman, part of who you are and always with you, ready for use.
But you can find joy at things you envision, or laugh, or be horrified. The mental ability is surely impressive, but having a reason to do it and feeling something at the result is special.
"To see a world in a grain of sand And a heaven in a wild flower..."
We - humans - have reasons to be. We get to look at a sunset and think about the scattering of light and different frequencies and how it causes the different colors. But we can also just enjoy the beauty of it.
For me, every moment is magical when I take the time to let it be so. Heck, for there to even be a me responding to a you and all of the things that had to happen for Hacker News to be here. It's pretty incredible. To me anyway.
I have aphantasia, I’m glad we’re all on a level playing field now.
I always thought I had a vivid imagination. But then aphantasia was mentioned on Hello Internet once, I looked it up, saw comments like these, and honestly…
I've no idea how to even check. According to various tests I believe I have aphantasia. But mostly I haven't got even the slightest idea of how not having it is supposed to work. I guess this is one of those mysteries where a missing sense cannot be described in any manner.
A simple test for aphantasia that I gave my kids when they asked about it is to picture an apple with three blue dots on it. Once you have it, describe where the dots are on the apple.
Without aphantasia, it should be easy to "see" where the dots are since your mind has placed them on the apple somewhere already. Maybe they're in a line, or arranged in a triangle, across the middle or at the top.
When reading "picture an apple with three blue dots on it", I have an abstract concept of an apple and three dots. There's really no geometry there, without follow on questions, or some priming in the question.
In my conscious experience I pretty much imagine {apple, dot, dot, dot}. I don't "see" blue, the dots are tagged with dot.color == blue.
When you ask about the arrangement of the dots, I'll THEN think about it, and then says "arranged in a triangle." But that's because you've probed with your question. Before you probed, there's no concept in my mind of any geometric arrangement.
If I hadn't been prompted to think / naturally thought about the color of the apple, and you asked me "what color is the apple." Only then would I say "green" or "red."
If you asked me to describe my office (for example), my brain can't really imagine it "holistically." I can think of the desk and then enumerate its properties: white legs, wooden top, rug on the ground. But, essentially, I'm running a geometric iterator over the scene, starting from some anchor object, jumping to nearby objects, and then enumerating their properties.
I have glimpses of what it's like to "see" in my minds eye. At night, in bed, just before sleep, if I concentrate really hard, I can sometimes see fleeting images. I liken it to looking at one of those eye puzzles where you have to relax your eyes to "see it." I almost have to focus on "seeing" without looking into the blackness of my closed eyes.
Exactly my experience too. These fleeting images are rare, but bloody hell it feels like cheating at life if most people can summon up visualisations like that at will.
Watching someone clearly just transfer what's in their mind to a drawing is just jaw-dropping to me.
Like they'll start at an arm and move along filling the rest of the body correctly the first time. No sketching, no finding the lines, just a human printer.
I think I have it as well. But my theory is that we might have imagination, but it is only accessible to the subconscious. It is as if it is blocked from consciousness. I have ADHD as well; it might be that this is a protection mechanism that allows my kind of brain to survive in the world better (otherwise it would be too entertaining to get lost in your own imagination). As a kid I used to daydream a lot.
I've come to realize that's how they all are.
No one really sees 3d pictures in their head in HD
I can see in my head at ~80% of the level of seeing with my eyes. It's a little tunnel-visiony and fine details can be blurry, but I can definitely see it. A Honeycrisp apple on a red woven placemat on a wooden countertop. The blue dots are the size of peas; they are stickers in a triangle.
It's not just images either; it's short videos.
What's interesting, though, is that the "video" can be missing details that I will "hallucinate" back in incorrectly. So I cannot always fully trust these. Like, cutting the apple in half led to a ~1/8th slice missing from one of the halves. It's weird.
I'm a 5 on the VVIQ. I can see the 3D apple, put it in my hand, rotate it, watch the light glint on the dimples in the skin, imagine tossing it to a close friend and watch them catch it, etc.
It's equally astonishing to me that others are different.
You close your eyes and see exactly what you would on a TV with your eyes open?
I don't need to close my eyes, it doesn't make much of a difference, and I see what my eyes would see. It doesn't look like a TV unless I imagine a TV and put the image on the screen.