Claude Sonnet 5

anthropic.com

1224 points by marinesebastian a day ago


Jcampuzano2 - a day ago

I'm struggling to understand why I'd ever use this instead of just running Opus at a lower effort level, given that on many of the listed benchmarks the cost per task rises above Opus at anything higher than medium effort.

The only thing I can think of is when someone is out of Opus credits. Of course there are API billing use cases, but I'd probably still just use Opus on low.

conradkay - a day ago

Wow, seems worse even on price/performance than GLM 5.2, which is only 744B parameters.

From the system card: "On CyberGym vulnerability discovery, Claude Sonnet 5 is less capable than Sonnet 4.6, and far less capable than Opus 4.8 and Mythos 5.

As with the other evaluations in this section, these results were achieved with all safeguards turned off. When run with our default mitigations, Sonnet 5 scored a 0 on CyberGym"

microtonal - a day ago

> Claude Sonnet 5 is built to be the most agentic Sonnet model yet. It can make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models.

I have been using Sonnet 4.6 more than Opus, because I'm mostly doing agent-assisted development and not fully agent-driven development. This announcement does not make me optimistic: I have found that the more models are optimized for fully agentic development, the worse they get at assisted development, and they often start doing too much despite very strict, specific instructions.

I have been moving more and more to K2.7 Code and GLM-5.2 over the last few weeks. They are often good enough for assistance, very fast, and cheap.

XCSme - 21 hours ago

I just tested it on my benchmarks [0]: it's at GLM-5.2 level, at 2x the cost but also 2x faster.

Weak spots (categories it fails):

    - Trivia: 0/3, basically not much built-in knowledge
    - Combined tool-calling tasks: score 45/100, sometimes makes invalid tool calls
    - Puzzle Solving: score 77, flubs carwash-like tests

[0]: https://aibenchy.com/compare/anthropic-claude-sonnet-4-6-med...

simonw - 20 hours ago

Claude Sonnet 5 itself described its pelican as looking like a goose:

> Illustration of a white goose riding a bicycle, with one wing extended forward to grip the handlebar, set against a plain white background with a brown ground line.

https://simonwillison.net/2026/Jun/30/claude-sonnet-5/

Sol- - a day ago

Wonder if the whole cyber paranoia leads to their models ultimately generating less secure code. After all, if it has the ability to generate safe code, it would imply that it knows something about cybersecurity, which could surely be used to hack all the banks in the world.

m3h - a day ago

Important to note: "Sonnet 5 is an upgrade to Sonnet 4.6, but it uses an updated tokenizer that changes how the model processes text to improve performance (this is similar to the tokenizer change we introduced with Claude Opus 4.7). The tradeoff is that the same input can map to more tokens: roughly 1.0–1.35× depending on the content type. The introductory pricing is set so that the transition to Sonnet 5 is roughly cost-neutral."
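To make the quoted arithmetic concrete, here is a minimal sketch of what "cost-neutral" would require under that token multiplier. Only the 1.0–1.35× range comes from the quote; the per-million-token price is a placeholder rather than a published price, and `effective_cost_ratio` and `break_even_price` are hypothetical helper names.

```python
def effective_cost_ratio(token_multiplier: float, price_ratio: float) -> float:
    """Ratio of new cost to old cost for the same input text.

    token_multiplier: new tokens / old tokens for the same text
                      (1.0 to 1.35 according to the quoted announcement).
    price_ratio:      new per-token price / old per-token price.
    """
    return token_multiplier * price_ratio


def break_even_price(old_price: float, token_multiplier: float) -> float:
    """Per-token price at which the same text costs exactly what it did before."""
    return old_price / token_multiplier


# Placeholder price in USD per million tokens; an assumption for illustration only.
OLD_PRICE = 3.00

for multiplier in (1.0, 1.2, 1.35):
    price = break_even_price(OLD_PRICE, multiplier)
    print(f"{multiplier:.2f}x tokens -> break-even price ${price:.2f}/Mtok")
```

At an unchanged per-token price, the same text would cost up to 35% more at the top of the quoted range, so whether the transition is actually neutral for you depends on where your content falls in that range.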

phillipcarter - a day ago

Seems to be another great incremental update to the workhorse, nice!

I've been using Sonnet instead of Opus for almost all coding tasks for a while now. A little elbow grease to break down tasks and you can spend a lot less money for just about the same output quality.

mdrzn - 8 hours ago

> Edit June 30, 2026: In the original version of this post, we included a cost-performance chart for the BrowseComp evaluation that was based on data from a simpler methodology that did not reflect the standard methodology we use for agentic search evaluations. This had the result of underestimating Sonnet 5's performance on the evaluation.

They changed the Sonnet 5 'Agentic search' benchmark graph overnight.

ianberdin - 19 hours ago

Anthropic outsmarted everyone again.

They released Sonnet 5 with a temporary price reduction until August. Everyone was excited, but in reality the new tokenizer produces about 50% more tokens for the same input. As a result, the actual cost went up by 50%, while they shifted everyone's attention to the price decrease.

Thus, Anthropic is raising prices but not telling anyone about it. Nobody is really aware of it. You go to the pricing page, the price looks the same. Yet people are actually paying 50% more.

Very shady marketing.

And of course they lie about the 35% again. In reality, for coding it is 50%.

UPD: I run playcode.io, so it’s my job to test all models, their pricing, and their quality in order to provide the best price, quality, speed, and reliability to non-technical users.

doctoboggan - a day ago

The cost per task chart is telling me that I should _never_ use Sonnet 5 above medium effort level - Opus always performs better for a given cost. So I guess the takeaway is that if Sonnet 5 medium isn't good enough for you, switch models, not effort levels.
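The "Opus always performs better for a given cost" reading is a dominance argument, which can be sketched as a Pareto-frontier check. The numbers below are made up to mirror the shape described in this comment, not taken from the chart, and `Run`, `dominated`, and `pareto_frontier` are hypothetical names.

```python
from typing import NamedTuple


class Run(NamedTuple):
    label: str
    cost: float   # cost per task, arbitrary units
    score: float  # benchmark score, higher is better


def dominated(run: Run, others: list[Run]) -> bool:
    """True if another run is no more expensive and no worse, and strictly better on one axis."""
    return any(
        o.cost <= run.cost
        and o.score >= run.score
        and (o.cost < run.cost or o.score > run.score)
        for o in others
    )


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Runs that no other run dominates, sorted by cost."""
    return sorted((r for r in runs if not dominated(r, runs)), key=lambda r: r.cost)


# Made-up numbers for illustration only; not real benchmark data.
runs = [
    Run("sonnet-low", 1.0, 50.0),
    Run("sonnet-medium", 2.0, 60.0),
    Run("sonnet-high", 4.0, 64.0),
    Run("opus-low", 3.0, 66.0),
    Run("opus-medium", 5.0, 72.0),
]

for r in pareto_frontier(runs):
    print(r.label, r.cost, r.score)
```

With these placeholder values, the high-effort Sonnet point drops off the frontier because a cheaper Opus point scores higher, which is the "switch models, not effort levels" conclusion.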

nijave - 6 hours ago

> Me: What was the sushi place near latitude 41 in Columbus? Did it go out of business I don't see it on Google maps anymore

> Sonnet 5 (medium): None of these past chats mention a sushi place — I don't have anything on record about that. Do you remember the name, or roughly which part of Columbus (neighborhood/street) it was near? That'll help me search and check its current status.

Not impressed. It got the name right on high effort one shot but hallucinated the date relativity (Jan 2026 is not last month...). Worked okay on extra. Sonnet 4.6 worked fine on medium, high, and extra one shot.

Edit: as the replies point out, the prompt is definitely ambiguous; however, Sonnet 5 didn't even extract the semantic meaning "looking for <place> near <compound place>", which all the human replies seem to understand. Even Haiku 4.5 identifies the semantic meaning, although it fails to retrieve the correct results.

> Haiku 4.5 (reasoning off): I'll help you find information about that sushi place. Let me search for sushi restaurants that were near latitude 41 in Columbus and check their current status.

> _Searched the web_

> I see that "Latitude 41" is a restaurant in Columbus, but it's actually a modern American restaurant, not specifically a sushi place. However, based on your mention of latitude 41, that's what came up. Let me search more specifically for sushi restaurants that may have closed in Columbus around that area.

satvikpendem - a day ago

> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.

Why would they brag about something like this? It's like they know people want to use models to perform cybersecurity tasks yet knowingly deny them the ability.

And Opus 4.8 is still cheaper for a higher pass rate (to say nothing of open-weight models like GLM 5.2), so I'm not sure why I'd use Sonnet except at the low effort level for, I suppose, trivial tasks where I only need it to work 50% of the time, judging by the graph. The pricing doesn't really make any sense.