Nano Banana Pro
blog.google
768 points by meetpateltech 9 hours ago
Google has been stomping around like Godzilla this week, and this is the first time I decided to link my card to their AI studio.
I had seen people saying that they gave up and went to another platform because it was "impossible to pay". I thought this was strange, but after trying to get a working API key for the past half hour, I see what they mean.
Everything is set up, I see a message that says "You're using Paid API key [NanoBanano] as part of [NanoBanano]. All requests sent in this session will be charged." Go to prompt, and I get a "permission denied" error.
There is no point in having impressive models if you make it a chore for me to -give you my money-
First off, apologies for the bad first impression, the team is pushing super hard to make sure it is easy to access these models.
- On the permission issue, not sure I follow the flow that got you there, pls email me more details if you are able to and happy to debug: Lkilpatrick@google.com
- On overall friction for billing: we are working on a new billing experience built right into AI Studio that will make it super easy to add a CC and go build. This will also come along with things like hard billing caps and such. The expected ETA for global rollout is January!
Just a note that your HN bio says "Developer Relations @OpenAI"
Sure, it will get updated to the same as LinkedIn - Helping developers build with AI at Google DeepMind.
Imagine many on here have out-of-date bios and best part - it don't matter, but sure can make some funnies at times.
I was interested. It does look like he just needs to update that. His personal blog says Google, and ex-OpenAI. But I do feel like I have my tin foil hat on every time I come to HN now.
The new releases this week baited me into the Business Ultra subscription. Sadly it's totally useless for the Gemini 3 CLI, and now Nano Banana doesn't work either. Just wow.
I bought a Pro subscription (or the lowest tier paid plan, whatever it's called), and the fact that I had to fill out a Google Form in order to request access to the Gemini 3 CLI is an absolute joke. I'm not even a developer, I'm a UX guy who just likes playing around with seeing how models deal with importing Figma screens and turning them into a working website. Their customer experience is shockingly awful, worse than OpenAI and Anthropic.
Oh man, there is so, so much pain here. Random example: if GOOGLE_GENAI_USE_VERTEXAI=true is in your environment, woe betide you if you're trying to use the Gemini CLI with an API key. Error messages don't match up with actual problems: you'll be told to log in using the CLI auth for Google, then you'll be told your API keys have no access. It's just a huge mess. I still don't really know if I'm using a Vertex API key or a non-Vertex one, and I don't want to touch anything since I somehow got things running. (One way to pin the backend explicitly is sketched below.)
Anyway, vaya con Dios. I know that there's a fundamental level of complexity to deploying at Google, and deploying globally, but it's just really hard compared to some competitors. Which is a shame, because the Gemini series is excellent!
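For what it's worth, the same GOOGLE_GENAI_USE_VERTEXAI switch also exists in the google-genai Python SDK, which guesses the backend from the environment when you construct a client with no arguments. A minimal sketch of pinning the choice explicitly so the environment can't reroute you (the key, project ID, and region below are placeholders):

    from google import genai

    # Constructed with no arguments, genai.Client() picks between the Gemini
    # Developer API and Vertex AI based on environment variables such as
    # GOOGLE_GENAI_USE_VERTEXAI, GOOGLE_API_KEY, and GOOGLE_CLOUD_PROJECT,
    # which is where the confusing "wrong backend" errors tend to come from.
    # Passing the choice explicitly removes the guessing game.

    # Gemini Developer API path: an AI Studio API key is enough.
    dev_client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")  # placeholder key

    # Vertex AI path: a GCP project and region.
    vertex_client = genai.Client(
        vertexai=True,
        project="your-gcp-project",  # placeholder project ID
        location="us-central1",
    )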
Maybe the team should push hard before releasing the product instead of after to make it work.
But then we'd complain about Google being a slow moving dinosaur.
"Move fast and break things" cuts both ways !
(ex-Google tech lead, who took down the Google.com homepage... twice!)
It's not a new problem though, and it's not just billing. The UI across Gemini just generally sucks (across AI Studio and the chat interfaces), and there are lots of annoying failure cases where Gemini will just time out and stop working entirely mid-request.
Been like this for quite a while, well before Gemini 3.
So far I continue to put up with it because I find the model to be the best commercial option for my usage, but it's amazing how bad modern Google is at just basic web app UX and infrastructure when they were the gold standard for such for, like, arguably decades prior.
Imagining the counterfactual (“typical, the most polished part of this service is the payment screen!”), it seems hard to win here.
That’s a pretty uncharitable take. Given the scale of their recent launches and the amount of compute needed to make them work, it seems incredibly smooth. Edge cases always arise, and all the company/teams can really do is be responsive - which is exactly what I see happening.
Why should the scale of their recent launches be a given? Who is requiring this release schedule?
the market
If it's a strategic decision, then its impacts should be weighed in full. Not just the positives.
We're talking about Google right? You think they need a level of charity for a launch? I've read it all at this point.
Please make sure that the new billing experience has support for billing limits and prepaid balance (to avoid unexpected charges)!
Lol. Since the GirlsGoneWild people pioneered the concept of automatically-recurring subscriptions, unexpected charges and difficult-to-cancel billing is the game. The best customer is always the one that pays but never uses the service ... and ideally has forgotten or lost access to the email address they used when signing up.
I had pretty much written off ever giving my credit card to Google, but a better billing experience and hard billing caps might change that.
The fact that your team is worrying about billing is...worrying. You guys should just be focused on the product (which I love, thanks!)
Google has serious fragmentation problems, and really it seems like someone else with high rank should be enforcing (and have a team dedicated to) a centralized frictionless billing system for customers to use.
I had the same reaction as them many months ago; the Google Cloud and Vertex AI namespacing is too messy. The different paths people might take to learning about and trying to use the good new models need to be properly mapped out and fixed so that the UX makes sense and actually works as they expect.
I super wish all the super annoying tech super nerds would super quickly disappear and make the world a super better place.
Google APIs in general are hilariously hard to adopt. With any other service on the planet, you go to a platform page, grab an api key and you’re good to go.
Want to use Google’s Gmail, Maps, Calendar, or Gemini API? Create a cloud account, create an app, enable the Gmail service, create an OAuth app, download a JSON file. C'mon now…
If it's just the API you're interested in, Fal.ai has put Nano-Banana-Pro up for both generative and editing. A great deal less annoying to sign up for them since they're a pretty generalized provider of lots of AI related models.
In general a better option. In the early days of AI video I tried to generate a video of a golden retriever using Google's AI Studio. It generated 4 in the highest quality and charged me 36 bucks. Not a crazy amount, but definitely an unwelcome surprise.
Fal.ai is pay as you go and has the cost right upfront.
Vertex AI Studio setting a default of 4 videos where each video is several dollars to generate is a very funny footgun.
Is there a model on Fal.ai that would make it easy to sharpen blurry video footage? I have found some websites, but apparently they are mostly scammy.
Unfortunately, this is a fairly difficult task. In my experience, even SOTA models like Nano Banana usually make little to no meaningful improvement to the image when given this kind of request.
You might be better off using a dedicated upscaler instead, since many of them naturally produce sharper images when adding details back in - especially some of the GAN-based ones.
If you’re looking for a more hands-off approach, it looks like Fal.ai provides access to the Topaz upscalers:
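For reference, a hedged sketch of what calling one of those hosted upscalers could look like with the fal-client Python package; the endpoint ID and argument names below are assumptions for illustration, so check the model's actual page on fal.ai for the real ID and input schema:

    import fal_client  # pip install fal-client; reads the FAL_KEY env var for auth

    # "fal-ai/topaz/upscale/video" and its arguments are hypothetical here;
    # substitute the real endpoint ID and schema from the fal.ai model page.
    result = fal_client.subscribe(
        "fal-ai/topaz/upscale/video",
        arguments={
            "video_url": "https://example.com/blurry-clip.mp4",  # placeholder input
            "upscale_factor": 2,                                 # assumed parameter
        },
    )
    print(result)  # typically a dict containing a URL for the processed file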
FYI that is an extremely challenging thing to do right. Especially if you care about accuracy and evidentiary detail. Not sure this is something that the current crop of AI tools are really tuned to do properly.
There's the solution right there. Google is still growing its AI "sea legs". They've turned the ship around on a dime and things are still a little janky. Truly a "startup mode" pivot.
While we're on this subject of "Google has been stomping around like Godzilla", this is a nice place to state that I think the tide of AI is turning and the new battle lines are starting to appear. Google looks like it's going to lay waste to OpenAI and Anthropic and claim most of the market for itself. These companies do not have the cash flow and will have to train and build their asses off to keep up with where Google already is.
gpt-image-1 is 1/1000th of Nano Banana Pro and takes 80 seconds to generate outputs.
Two years ago Google looked weak. Now I really want to move a lot of my investments over to Google stock.
How are we feeling about Google putting everyone out of work and owning the future? It's starting to feel that way to me.
(FWIW, I really don't like how much power this one company has and how much of a monopoly it already was and is becoming.)
Valid questions, but I'd say that it's hard to know what the future holds when we get models that push the state of the art every few months. Claude Sonnet 3.7 was released in February of this year. At the rate of change we're going, I wouldn't be surprised if we end up with Sonnet 5 by March 2026.
As others have noted, Google's got a ways to go in making it easier to actually use their models, and though their recent releases have been impressive, it's not clear to me that the AI product category will remain free from the bad, old fiefdom culture that has doomed so many of their products over the last decade.
We can't help but overreact to every new adjustment on the leaderboards. I don't think we're quite used to products in other industries gaining and losing advantage so quickly.
This is also my take on the market, although I also thought it looked like they were going to win 2 years ago too.
> How are we feeling about Google putting everyone out of work and owning the future? It's starting to feel that way to me.
Not great, but if one company or nation is going to come out on top in AI then every other realistic alternative at the moment is worse than Google.
OpenAI, Microsoft, Facebook/Meta, and X all have worse track records on ethics. Similarly for Russia, China, or the OPEC nations. Several of the European democracies would be reasonable stewards, but realistically they didn't have the capital to become dominant in AI by 2025 even if they had started immediately.
100% this. I am using the pro/max plans on both Claude and OpenAI. Would love to experiment with Gemini but paying is next to impossible. Why do I need the risk of a full-blown GCP project just to test Gemini? No thx.
Ha, I have been steeling myself for a long chat with Claude about “how the F to get AI Studio up and working.” With paying being one of the hardest parts.
Without a doubt one essential ingredient will be, “you need a Google Project to do that.” Oh, and it will also definitely require me to Manage My Google Account.
Easiest way is to go to https://aistudio.google.com/api-keys, set up an API key, and add your billing to it.
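Once the key exists, the call itself is short; a minimal sketch assuming the google-genai Python SDK (pip install google-genai), with the key, model name, and prompt as placeholders:

    from google import genai

    # An AI Studio key is enough for the Gemini Developer API; no GCP project
    # or OAuth app is needed on this path.
    client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")  # placeholder key

    resp = client.models.generate_content(
        model="gemini-2.0-flash",  # any current text model; swap as needed
        contents="Say hello in one sentence.",
    )
    print(resp.text)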
It's amazing that the "hard problems" are turning out to be "not creating a completely broken user experience".
Is that going to need AGI? Or maybe it will always be out of reach of our silicon overlords and require human input.
Oh my, you should have tried to integrate with Google Prism. That was madness! Nano Banana was just a little tricky to set up in comparison!
You can also use it in Gemini.
It wasn't there when I first went to Gemini after the announcement, but upon revisiting it gave me the prompt to try Nano Banana Pro. It failed at my niche (rare palm trees).
Incredible technology, don't get me wrong, but still shocked at the cumbersome payment interface and annoyed that enabling Drive is the only way to save.
I hate that they kinda try to hide the model version. Like if you click the dropdown in the chat box, you can see that "Thinking" means 3 Pro. When you select the "Create images" tool, it doesn't tell you it's using Nano Banana Pro until it actually starts generating the image.
Tell me the model it's using. It's as if Google is trying to unburden me with the knowledge of what model does what but it's just making things more confusing.
Oh, and setting up AI Studio is a mess. First I have to create a project. Then an API key. Then I have to link the API key to the project. Then I have to link the project to the chat session... Come on, Google.
Alright results are in! I've re-run all my editing based adherence related prompts through Nano Banana Pro. NB Pro managed to successfully pass SHRDLU, the M&M Van Halen test (as verified independently by Simon), and the Scorpio street test - all of which the original NB failed.
Model results
1. Nano Banana Pro: 10 / 12
2. Seedream4: 9 / 12
3. Nano Banana: 7 / 12
4. Qwen Image Edit: 6 / 12
https://genai-showdown.specr.net/image-editing
If you just want to see how NB and NB Pro compare against each other:
https://genai-showdown.specr.net/image-editing?models=nb,nbp
I think Nano Banana Pro should have passed your giraffe test. It's not a great result but it is exactly what you asked for. It's no worse than Seedream's result imo.
Yeah I think that's a fair critique. It kind of looks like a bad cut-and-replace job (if you zoom in you can even see part of the neck is missing). I might give it some more attempts to see if it can do a better job.
I agree that Seedream could definitely be called out as a fail since it might just be a trick of perspective.
Have you ever considered a “partial pass”?
Though perhaps it would be an easy cop-out from making a decision if you could choose something outside of pass/fail.
I agree. From where I'm sitting, Seedream just bent the neck while Nano Banana Pro actually shortened the neck.
The Pisa tower test is really interesting. Many of these prompts have stricter criteria requiring implicit knowledge, and some models impressively pass them. Yet something as obvious as straightening a slanted object is hard even for the latest models.
I suspect there'd be no problem rotating a different object. But this tower is EXTREMELY represented in the training data. It's almost an immutable law of physics that Towers in Pisa are Leaning.
It's also a tower that has famously been deliberately un-straightened just enough to remain a tourist attraction while remaining stable.
Would you leave one of the originals in each test visible at all times (a control) so that I can see the final image(s) that I'm considering and the original image at the same time?
I guess if you do that then maybe you don't need the cool sliders anymore?
Anyway - thanks so much for all your hard work on this. A very interesting study!
thanks, I love your website. Are you planning to do NB Pro for the text-to-image benchmark too?
Definitely! Even though NB's predominant use case seems to be editing, it's still producing surprisingly decent text-to-image results. Imagen4 currently still comes out ahead in terms of image fidelity, but I think NB Pro will close the gap even further.
I'll try to have the generative comparisons for NB Pro up later this afternoon once I catch my breath.
I...worked on the detailed Nano Banana prompt engineering analysis for months (https://news.ycombinator.com/item?id=45917875)...and...Google just...released a new version.
Nano Banana Pro should work with my gemimg package (https://github.com/minimaxir/gemimg) without pushing a new version by passing:
g = GemImg(model="gemini-3-pro-image-preview")
I'll add the new output resolutions and other features ASAP. However, looking at the pricing (https://ai.google.dev/gemini-api/docs/pricing#standard_1), I'm definitely not changing the default model to Pro, as $0.13 per 1k/2k output will make it a tougher sell.
EDIT: Something interesting in the docs: https://ai.google.dev/gemini-api/docs/image-generation#think...
> The model generates up to two interim images to test composition and logic. The last image within Thinking is also the final rendered image.
Maybe that's partially why the cost is higher: it's hard to tell if intermediate images are billed in addition to the output. However, this could cause an issue with the base gemimg and have it return an intermediate image instead of the final image depending on how the output is constructed, so will need to double-check.
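One defensive approach, sketched with the google-genai SDK directly and assuming interim and final images all arrive as inline_data parts (so, per the doc quoted above, the last image part is the final render); the actual parts layout may differ, so treat this as a sketch rather than how gemimg handles it:

    from google import genai

    client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")  # placeholder key

    resp = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents="A skull made of fruit on a picnic table.",
    )

    # Collect every inline image part; if interim "thinking" images come back
    # alongside the final render, the last image part should be the final one.
    images = [
        part.inline_data
        for part in resp.candidates[0].content.parts
        if part.inline_data is not None
    ]
    if images:
        with open("output.png", "wb") as f:
            f.write(images[-1].data)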
> - Put a strawberry in the left eye socket.
> - Put a blackberry in the right eye socket.
> All five of the edits are implemented correctly
This is a GREAT example of the (not so) subtle mistakes AI will make in image generation, or code creation, or your future knee surgery. The model placed the specified items in the eye sockets based on the viewer's left/right; when we talk about relative directions in this scenario we usually (always?) mean from the perspective of the target or "owner". Doctors make this mistake too (they typically mark the correct side with a sharpie while the patient is still alert), but I'd be more concerned if we're "outsourcing" decision making without adequate oversight.
https://minimaxir.com/2025/11/nano-banana-prompts/#hello-nan...
That was a big problem when I was toying around with the original Nano Banana. I always prompted from the perspective of the (imaginary) camera, and yet NB often interpreted that as the target's perspective, giving me no way to select the opposite side. Since the selected side is generally closer to the camera, my usual workaround was to force the side far from the camera. And yet that was not perfect.
There's a classic well-illustrated book, _How to Keep Your Volkswagen Alive_, which spends a whole illustrated page at the beginning building up a reference frame for working on the vehicle. Up is sky, down is ground, front is always vehicle's front, left is always vehicle's left.
Sounds a bit silly to write it out, but the diagram did a great job removing ambiguity when you expect someone to be lying on the ground in a tight space, looking backwards, upside down.
Also feels important to note that in the theatre, there is stage-right and stage-left, jargon to disambiguate even though the jargon expects you to know the meaning to understand it.
>This is a GREAT example of the (not so) subtle mistakes AI will make in image generation, or code creation, or your future knee surgery.
The mistake is in the prompting (not enough information). The AI did the best it could.
"What's the biggest known planet" "Jupiter" "NO I MEANT IN THE UNIVERSE!"
It doesn't affect your point, but since the IAU are insane, exoplanets technically aren't planets, and Jupiter is the largest planet in the universe.
I suppose it was too much to hope that chatbots could be trained to avoid pointless pedantry.
They've been trained on every web forum on the Internet. How could it be possible for them to avoid that?
No, this is squarely on the AI. A human would know what you mean without specific instructions.
Seems like you're making a judgment based on your own experience, but as another commenter pointed out, it was wrong. There are plenty of us out there who would confirm, because people are too flawed to trust. Humans double/triple check, especially under higher stakes conditions (surgery).
Heck, humans are so flawed, they'll put the things in the wrong eye socket even knowing full well exactly where they should go - something a computer literally couldn't do.
Intelligence in my book includes error correction. Questioning possible mistakes is part of wisdom.
So the understanding that AI and HI are different entities altogether, with only a subset of communication protocols between them, will become more and more obvious, as some comments here already implicitly suggest.
Why on earth would the fallback when a prompt is under specified be to do something no human expects?
If the instructions were actually specific, e.g. "Put a blackberry in its right eye socket", then yes, most humans would know what that meant. But the instructions were not that specific: "in the right eye socket".
If you asked me right now what the biggest known planet was, I'd think Jupiter. I'd assume you were talking about our solar system ("known" here implying there might be more planets out in the distant reaches).
But different humans would know what you meant differently. Some would have known it the same way the AI did.
Right, that's why one should use "put a strawberry in the portside eye socket" and "put a strawberry in the starboard side socket"
I don't know if that's so much a mistake as it is ambiguity though? To me, using the viewer's perspective in this case seems totally reasonable.
Does it still use the viewer's perspective if the prompt specifies "Put a strawberry in the _patient's left eye_"? If it does, then you're onto something. Otherwise I completely disagree with this.
“The right socket” can only be interpreted one way when talking about a body, just like you only have one right hand, despite the fact that it is on my left when I'm looking at you.
I think the fact that anyone in this thread thinks it's ambiguous is proof by definition that it's ambiguous.
"Plug into right power socket"
Same language, opposite meaning because of a particular noun + context.
I think the only thing obvious here is that there is no obvious solution other than adding lots of clarification to your prompt.
I think you missed the entire point?
No, they just disagree with you.