Gemini 3.1 Pro Preview

console.cloud.google.com

158 points by MallocVoidstar 4 hours ago


sigmar - 3 hours ago

blog post is up- https://blog.google/innovation-and-ai/models-and-research/ge...

edit: biggest benchmark changes from 3 pro:

arc-agi-2 score went from 31.1% -> 77.1%

apex-agents score went from 18.4% -> 33.5%

esafak - 3 hours ago

Has anyone noticed that models are dropping ever faster? Companies are under pressure to make incremental releases to claim pole position, yet they keep making real strides on benchmarks. This is what recursive self-improvement with human support looks like.

maxloh - 3 hours ago

Gemini 3 seems to have a much smaller output-token limit than 2.5. I used to use Gemini to restructure essays into an LLM-style format to improve readability, but the Gemini 3 release was a huge step back for that particular use case.

Even when the model is explicitly instructed to pause when it runs low on tokens rather than emit an incomplete response, it still truncates the source text too aggressively, losing vital context and meaning in the restructuring.

I hope the 3.1 release includes a much larger output limit.
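
For anyone hitting the same cap: one API-side workaround is to watch the finish reason and stitch continuations together. A rough sketch with the google-genai Python SDK; the model name, token cap, and continuation prompt are illustrative, not anything Google documents for this use case.

    # Sketch: stitch together a long output by continuing past MAX_TOKENS.
    # Model name, cap, and continuation prompt are illustrative.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment

    def generate_full(prompt: str, model: str = "gemini-3.1-pro-preview",
                      max_rounds: int = 5) -> str:
        chunks = []
        contents = prompt
        for _ in range(max_rounds):
            response = client.models.generate_content(
                model=model,
                contents=contents,
                config=types.GenerateContentConfig(max_output_tokens=8192),
            )
            chunks.append(response.text or "")
            if response.candidates[0].finish_reason != types.FinishReason.MAX_TOKENS:
                break  # finished normally (STOP) or stopped for another reason
            # Feed back what we have and ask the model to resume mid-stream.
            contents = (prompt
                        + "\n\nPartial output so far:\n" + "".join(chunks)
                        + "\n\nContinue exactly from where the output was cut off.")
        return "".join(chunks)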

zhyder - 3 hours ago

Surprisingly big jump in ARC-AGI-2, from 31% to 77%; I'd guess there's some RLHF focused on the benchmark, given it was previously far behind the competition and is now ahead.

Apart from that, the usual predictable gains in coding. It's still a great sweet spot for performance, speed, and cost. I need to hack Claude Code to keep its agentic logic+prompts but use Gemini models (rough sketch below).

I wish Google would also update Flash-Lite to 3.0+; I'd like to use it for the Explore subagent (the role Claude Code fills with Haiku). These subagents seem to be Claude Code's strength over Gemini CLI, which still has them only in experimental mode and doesn't have read-only ones like Explore.
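
On the hack itself: Claude Code can be pointed at a different endpoint via ANTHROPIC_BASE_URL, so one approach is a small translation proxy. A very rough non-streaming sketch below; it ignores the system prompt, tool use, and error handling, so Claude Code's real traffic would need more than this. Field names follow the public Anthropic Messages and Gemini APIs; everything else is made up.

    # Very rough sketch of an Anthropic-Messages -> Gemini translation proxy.
    # Non-streaming, no tool use, no system prompt, no error handling.
    # Run it locally and point Claude Code at it via ANTHROPIC_BASE_URL.
    from fastapi import FastAPI, Request
    from google import genai

    app = FastAPI()
    client = genai.Client()  # reads GEMINI_API_KEY from the environment

    @app.post("/v1/messages")
    async def messages(request: Request):
        body = await request.json()
        # Flatten Anthropic-style messages into Gemini contents.
        contents = []
        for msg in body.get("messages", []):
            parts = msg["content"]
            text = parts if isinstance(parts, str) else "".join(
                p.get("text", "") for p in parts)
            role = "user" if msg["role"] == "user" else "model"
            contents.append({"role": role, "parts": [{"text": text}]})
        response = client.models.generate_content(
            model="gemini-3.1-pro-preview",  # illustrative model name
            contents=contents,
        )
        # Minimal Anthropic-shaped reply; Claude Code expects more than
        # this (streaming events, tool_use blocks) in real use.
        return {
            "id": "msg_proxy_0",
            "type": "message",
            "role": "assistant",
            "model": body.get("model", ""),
            "content": [{"type": "text", "text": response.text or ""}],
            "stop_reason": "end_turn",
            "usage": {"input_tokens": 0, "output_tokens": 0},
        }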

qingcharles - 3 hours ago

I've been playing with the 3.1 Deep Think version of this for the last couple of weeks, and it's been a big step up for coding over 3.0 (which I already found very good).

It's only February...

WarmWash - 3 hours ago

It seems Google is having a disjointed rollout, and there will likely be an official announcement in a few hours. Apparently 3.1 showed up unannounced in Vertex at 2am, or something equally odd.

Either way, early user tests look promising.

vinhnx - 3 hours ago

Model card https://deepmind.google/models/model-cards/gemini-3-1-pro/

clhodapp - 3 hours ago

There's a very short blog post up: https://blog.google/innovation-and-ai/models-and-research/ge...

__jl__ - 3 hours ago

Another preview release. Does that mean Google's recommended models for production are still 2.5 Flash and Pro? I'm not talking about what people are actually doing, but about Google's own recommendation. Kind of crazy if that's the case.
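
For what it's worth, the API will tell you what's actually exposed, and preview status shows up right in the model names. A quick sketch with the google-genai SDK:

    # List the models the API exposes; preview status is in the name.
    from google import genai

    client = genai.Client()
    for model in client.models.list():
        tag = "preview" if "preview" in model.name else "stable"
        print(f"{tag:8} {model.name}")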

mark_l_watson - 3 hours ago

Fine, I guess. The only commercial API I use to any great extent is gemini-3-flash-preview: cheap, fast, and great for tool use with agentic libraries. The 3.1-pro-preview is great, I suppose, for people who need it.

Off topic, but I like to run small models on my own hardware, and some small models are now very good at tool use with agentic libraries - it just takes a little more work to get good results.
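
As a concrete example of the tool-use pattern being described: the google-genai SDK can wrap a plain Python function as a tool and invoke it automatically. get_disk_usage here is a made-up example tool, not anything from the comment.

    # Minimal automatic function calling with the google-genai SDK.
    # get_disk_usage is a made-up example tool, not from the comment.
    import shutil

    from google import genai
    from google.genai import types

    def get_disk_usage(path: str) -> dict:
        """Return total/used/free bytes for a filesystem path."""
        usage = shutil.disk_usage(path)
        return {"total": usage.total, "used": usage.used, "free": usage.free}

    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-3-flash-preview",  # the model named above
        contents="How full is the filesystem at /? Use the tool.",
        config=types.GenerateContentConfig(tools=[get_disk_usage]),
    )
    print(response.text)  # SDK runs the function and returns the final answer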

ChrisArchitect - an hour ago

More discussion: https://news.ycombinator.com/item?id=47075318

denysvitali - 3 hours ago

Where is Simon's pelican?

msavara - 3 hours ago

Somehow doesn't work for me :) "An internal error has occurred"

makeavish - 3 hours ago

I hope to have a great next two weeks before it gets nerfed.

Topfi - 3 hours ago

Appears the only difference from 3.0 Pro Preview is the Medium reasoning level. Model naming long ago stopped even trying to make sense, but considering 3.0 is itself still in preview, bumping the version number for such a minor change is not a move in the right direction.
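
For context, the reasoning level is selected per request. A sketch of what that looks like in the google-genai SDK, assuming the newer thinking_level knob; whether "medium" (the value this comment refers to) is accepted for this model is an assumption.

    # Sketch: selecting a reasoning level per request. Assumes the SDK's
    # ThinkingConfig accepts thinking_level="medium" for this model.
    from google import genai
    from google.genai import types

    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-3.1-pro-preview",
        contents="Summarize the tradeoffs of shipping preview-only models.",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_level="medium"),
        ),
    )
    print(response.text)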

cmrdporcupine - 3 hours ago

Doesn't show as available in Gemini CLI for me. I have one of those "AI Pro" packages, but don't see it. Typical for Google: completely unclear how to actually use their stuff.

saberience - 3 hours ago

I always try Gemini models when they get updated with their flashy new benchmark scores, but always end up using Claude and Codex again...

I get the impression that Google is focusing on benchmarks without assessing whether the models are actually improving in practical use cases.

I.e., they are benchmaxing.

Gemini is "in theory" smart, but in practice it is much, much worse than Claude and Codex.


techgnosis - 3 hours ago

I'd love a new Gemini agent that isn't written in Node.js. Not sure why they think that's a good distribution model.