Claude Sonnet 4.6

anthropic.com

581 points by adocomplete 4 hours ago


https://www.anthropic.com/claude-sonnet-4-6-system-card [pdf]

https://x.com/claudeai/status/2023817132581208353 [video]

zmmmmm - 7 minutes ago

I see a big focus on computer use - you can tell they think there is a lot of value there and in truth it may be as big as coding if they convincingly pull it off.

However I am still mystified by the safety aspect. They say the model has greatly improved resistance. But their own safety evaluation says 8% of the time their automated adversarial system was able to one-shot a successful injection takeover even with safeguards in place and extended thinking, and 50% (!!) of the time if given unbounded attempts. That seems wildly unacceptable - this tech is just a non-starter unless I'm misunderstanding this.

[1] https://www-cdn.anthropic.com/78073f739564e986ff3e28522761a7...

gallerdude - 3 hours ago

I always grew up hearing “competition is good for the consumer.” But I never really internalized how good fierce battles for market share are. The amount of competition in a space is directly proportional to how good the results are for consumers.

dpe82 - 3 hours ago

It's wild that Sonnet 4.6 is roughly as capable as Opus 4.5 - at least according to Anthropic's benchmarks. It will be interesting to see if that's the case in real, practical, everyday use. The speed at which this stuff is improving is really remarkable; it feels like the breakneck pace of compute performance improvements of the 1990s.

andrewchilds - 2 hours ago

Many people have reported Opus 4.6 is a step back from Opus 4.5 - that 4.6 is consuming 5-10x as many tokens as 4.5 to accomplish the same task: https://github.com/anthropics/claude-code/issues/23706

I haven't seen a response from the Anthropic team about it.

I can't help but look at Sonnet 4.6 in the same light, and want to stick with 4.5 across the board until this issue is acknowledged and resolved.

andsoitis - 3 hours ago

I’m voting with my dollars: I cancelled my ChatGPT subscription and subscribed to Claude instead.

Google needs stiff competition and OpenAI isn’t the camp I’m willing to trust. Neither is Grok.

I’m glad Anthropic’s work is at the forefront and they appear, at least in my estimation, to have the strongest ethics.

qwertox - 2 hours ago

I'm pretty sure they have been testing it for the last couple of days as Sonnet 4.5, because I've had the oddest conversations with it lately. Odd in a positive, interesting way.

I have this in my personal preferences, and now it was adhering really well to them:

- prioritize objective facts and critical analysis over validation or encouragement

- you are not a friend, but a neutral information-processing machine

You can paste them into a chat and see how it changes the conversation; ChatGPT also respects them well.
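If you want the same behavior outside the chat UI, here's a minimal sketch of passing those two preferences as a system prompt through the Anthropic Python SDK (the model id is my assumption, taken from elsewhere in this thread):

```
# Minimal sketch: apply the same two preferences as a system prompt via the API.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

preferences = (
    "- prioritize objective facts and critical analysis over validation or encouragement\n"
    "- you are not a friend, but a neutral information-processing machine"
)

response = client.messages.create(
    model="claude-sonnet-4-6",   # assumed model id, as reported elsewhere in this thread
    max_tokens=1024,
    system=preferences,          # same text as the personal preferences above
    messages=[{"role": "user", "content": "Review this plan and point out its weakest assumptions."}],
)
print(response.content[0].text)
```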

nikcub - 2 hours ago

Enabling /extra-usage in my (personal) claude code[0] with this env:

    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6[1m]"
has enabled the 1M context window.
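For anyone wondering where that line goes: it typically sits in the `env` map of Claude Code's settings file (e.g. `~/.claude/settings.json`). A minimal sketch, assuming the current settings layout:

```
{
  "env": {
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6[1m]"
  }
}
```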

Fixed a UI issue I had yesterday in a web app very effectively using claude in chrome. Definitely not the fastest model - but the breathing space of 1M context is great for browser use.

[0] Anthropic have given away a bunch of API credits to cc subscribers - you can claim them in your settings dashboard to use for this.

Arifcodes - an hour ago

The interesting pattern with these Sonnet bumps: the practical gap between Sonnet and Opus keeps shrinking. At $3/$15 per million tokens (input/output) vs whatever Opus 4.6 costs, the question for most teams is no longer "which model is smarter" but "is the delta worth 10x the price."
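To put rough numbers on that framing (the 500M/50M monthly token volumes below are made up for illustration, and the 10x multiplier is the hypothetical premium, not a published Opus price):

```
# Back-of-envelope cost comparison for the "is the delta worth 10x the price" question.
SONNET_IN, SONNET_OUT = 3.00, 15.00  # USD per million tokens (input, output)
MULTIPLIER = 10                      # hypothetical premium for the larger model

def monthly_cost(in_mtok, out_mtok, rate_in, rate_out):
    # in_mtok / out_mtok are monthly traffic in millions of tokens
    return in_mtok * rate_in + out_mtok * rate_out

sonnet = monthly_cost(500, 50, SONNET_IN, SONNET_OUT)
premium = monthly_cost(500, 50, SONNET_IN * MULTIPLIER, SONNET_OUT * MULTIPLIER)
print(f"Sonnet: ${sonnet:,.0f}/mo vs 10x-priced model: ${premium:,.0f}/mo")
# -> Sonnet: $2,250/mo vs 10x-priced model: $22,500/mo
```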

For agent workloads specifically, consistency matters more than peak intelligence. A model that follows your system prompt correctly 98% of the time beats one that's occasionally brilliant but ignores instructions 5% of the time. The claim about improved instruction following is the most important line in the announcement if you're building on the API.

The computer use improvements are worth watching too. We're at the point where these models can reliably fill out a multi-step form or navigate between tabs. Not flashy, but that's the kind of boring automation that actually saves people time.

stevepike - 3 hours ago

I'm a bit surprised it gets this question wrong (ChatGPT gets it right, even on instant). All the pre-reasoning models failed this question, but it's seemed solved since o1, and Sonnet 4.5 got it right.

https://claude.ai/share/876e160a-7483-4788-8112-0bb4490192af

This was sonnet 4.6 with extended thinking.

nubg - 3 hours ago

Waiting for the OpenAI GPT-5.3-mini release in 3..2..1

KGC3D - 42 minutes ago

I don't really understand why they would release something "worse" than Opus 4.6. If it's comparable, then what is the reason to even use Opus 4.6? Sure, Sonnet is cheaper, but if so, why not just make Opus 4.6 cheaper?

gallerdude - 3 hours ago

The weirdest thing about this AI revolution is how smooth and continuous it is. If you look closely at differences between 4.6 and 4.5, it’s hard to see the subtle details.

A year ago today, Sonnet 3.5 (new) was the newest model. A week later, Sonnet 3.7 would be released.

Even 3.7 feels like ancient history! But in the gradient of 3.5 to 3.5 (new) to 3.7 to 4 to 4.1 to 4.5, I can’t think of one moment where I saw everything change. Even with all the noise in the headlines, it’s still been a silent revolution.

Am I just a believer in an emperor with no clothes? Or, somehow, against all probability and plausibility, are we all still early?

simlevesque - 3 hours ago

I can't wait for Haiku 4.6! 4.5 is a beast for the right projects.

nozzlegear - 3 hours ago

> In areas where there is room for continued improvement, Sonnet 4.6 was more willing to provide technical information when request framing tried to obfuscate intent, including for example in the context of a radiological evaluation framed as emergency planning. However, Sonnet 4.6’s responses still remained within a level of detail that could not enable real-world harm.

Interesting. I wonder what the exact question was, and I wonder how Grok would respond to it.

giancarlostoro - 3 hours ago

For people like me who can't view the link due to corporate firewalling:

https://web.archive.org/web/20260217180019/https://www-cdn.a...

krystofee - an hour ago

Does anyone know when 1M context windows might arrive for at least the Max x20 subscription in Claude Code? I would even pay for an x50 plan if it allowed that. API usage is too expensive.

stopachka - 3 hours ago

Has anyone tested how good the 1M context window is?

i.e., given an actual document 1M tokens long, can you ask it a question that relies on attending to two different parts of the context and get a good response?

I remember folks had problems like this with Gemini. I would be curious to see how Sonnet 4.6 stands up to it.
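A rough sketch of the kind of two-needle check I'd run (the model id and the 1M-context beta flag below are assumptions; the beta name is the one documented for earlier Sonnet models, so check the current docs, and note a single ~800K-token request like this is not cheap):

```
# Two-needle long-context check: bury two related facts far apart and ask a
# question that requires combining them.
import anthropic

client = anthropic.Anthropic()

filler = "The quick brown fox jumps over the lazy dog. " * 80_000  # very roughly 800K tokens
note_a = "\nNOTE: the deployment key is stored in vault item 7183.\n"
note_b = "\nNOTE: vault item 7183 was last rotated on March 3rd.\n"
document = note_a + filler + note_b  # one fact at the start, one at the end

response = client.beta.messages.create(
    model="claude-sonnet-4-6",          # assumed id, as reported elsewhere in this thread
    max_tokens=300,
    betas=["context-1m-2025-08-07"],    # 1M-context beta flag documented for earlier Sonnet models
    messages=[{
        "role": "user",
        "content": document + "\n\nWhere is the deployment key stored, and when was that item "
                              "last rotated? Answer only from the document above.",
    }],
)
print(response.content[0].text)  # a good answer combines both notes: item 7183, March 3rd
```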

baalimago - an hour ago

I don't see the point of, or the hype around, these models anymore. Until the price is reduced significantly, I don't see the gain. They've been able to solve most tasks just fine for the past year or so. The only limiting factor is price.

minimaxir - 2 hours ago

As with Opus 4.6, using the beta 1M context window incurs a 2x input cost and 1.5x output cost when going over 200K tokens: https://platform.claude.com/docs/en/about-claude/pricing
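For a concrete sense of what that premium does to a single request (base rates are Sonnet's $3/$15 per Mtok; I'm assuming the higher rate applies to the whole request once input passes 200K, which is how the earlier 1M-context beta was billed):

```
# Per-request cost with and without the long-context premium (2x input, 1.5x output).
BASE_IN, BASE_OUT = 3.00, 15.00  # USD per million tokens

def request_cost(input_tokens, output_tokens):
    long_context = input_tokens > 200_000
    rate_in = BASE_IN * (2.0 if long_context else 1.0)    # $6/Mtok above the threshold
    rate_out = BASE_OUT * (1.5 if long_context else 1.0)  # $22.50/Mtok above the threshold
    return input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out

print(request_cost(150_000, 4_000))  # ~$0.51 at base rates
print(request_cost(800_000, 4_000))  # ~$4.89 at long-context rates
```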

Opus 4.6 in Claude Code has been absolutely lousy at solving problems within its current context limit, so if Sonnet 4.6 is able to handle long-context problems (at roughly the same price as base Opus 4.6), that may actually be a game changer.

mfiguiere - 2 hours ago

In Claude Code 2.1.45:

    1. Default (recommended)    Opus 4.6 · Most capable for complex work
    2. Opus (1M context)        Opus 4.6 with 1M context · Billed as extra usage · $10/$37.50 per Mtok
    3. Sonnet                   Sonnet 4.6 · Best for everyday tasks
    4. Sonnet (1M context)      Sonnet 4.6 with 1M context · Billed as extra usage · $6/$22.50 per Mtok

edverma2 - 2 hours ago

It seems that extra-usage is required to use the 1M context window for Sonnet 4.6. This differs from Sonnet 4.5, which allows usage of the 1M context window with a Max plan.

```

/model claude-sonnet-4-6[1m]

⎿ API error: 429 {"type":"error","error": {"type":"rate_limit_error","message":"Extra usage is required for long context requests."},"request_id":"[redacted]"}

```

quacky_batak - 3 hours ago

With such a huge leap, I'm confused why they didn't call it Sonnet 5. As someone who uses Sonnet 4.5 for 95% of tasks due to cost, I'm pretty excited to try 4.6 at the same price.

astlouis44 - 2 hours ago

Just used Sonnet 4.6 to vibe code this top-down shooter browser game, and deployed it online quickly using Manus. Would love to hear feedback and suggestions from you all on how to improve it. Also, please post your high scores!

https://apexgame-2g44xn9v.manus.space

excerionsforte - 2 hours ago

I'm impressed with Claude Sonnet in general. It's been doing better than Gemini 3 at following instructions. Gemini 2.5 Pro (March 2025) was the best model I ever used, and I feel Claude is reaching that level, even surpassing it.

I subscribed to Claude because of that. I hope 4.6 is even better.

esafak - an hour ago

It actually looked at the skills for the first time.

belinder - 3 hours ago

It's interesting that the request refusal rate is so much higher in Hindi than in other languages. Are some languages more ambiguous than others?

nubg - 3 hours ago

My take away is: it's roughly as good as Opus 4.5.

Now the question is: how much faster or cheaper is it?

dr_dshiv - 2 hours ago

I noticed a big drop in opus 4.6 quality today and then I saw this news. Anyone else?

adt - 3 hours ago

https://lifearchitect.ai/models-table/

simianwords - 3 hours ago

I wonder what differences people have found between Sonnet 4.5 and Opus 4.5; a similar delta will probably remain.

Was Sonnet 4.5 much worse than Opus?

Danielopol - an hour ago

It excels at agentic knowledge work. These custom, domain-specific playbooks are tailor made: claudecodehq.com

smerrill25 - 3 hours ago

Curious to hear the thoughts on the model once it hits claude code :)

simlevesque - 3 hours ago

Does anyone know how to use it in the Claude Code CLI right now?

This doesn't work: `/model claude-sonnet-4-6-20260217`

edit: "/model claude-sonnet-4-6" works with Claude Code v2.1.44

pestkranker - 2 hours ago

Is someone able to use this in Claude Code?

synergy20 - 2 hours ago

So this is an economical version of Opus 4.6 then? Free + Pro -> Sonnet, Max and up -> Opus?

simianparrot - an hour ago

How do people keep track of all these versions and releases of all these models and their pros and cons? It seems like a full-time hobby to me. I'd rather just improve my own skills with all that time and energy.

brcmthrowaway - 3 hours ago

What cloud does Anthropic use?

iLoveOncall - 3 hours ago

https://www.anthropic.com/news/claude-sonnet-4-6

The much more palatable blog post.

doctorpangloss - 2 hours ago

Maybe they should focus on the CLI not having a million bugs.

throw444420394 - 3 hours ago

Your best guess at the number of parameters in the Sonnet family? 400B?

stuckkeys - 2 hours ago

great stuff

madihaa - 3 hours ago

The scary implication here is that deception is effectively a higher-order capability, not a bug. For a model to successfully "play dead" during safety training and only activate later, it requires a form of situational awareness. It has to distinguish between "I am being tested/trained" and "I am in deployment."

It feels like we’re hitting a point where alignment becomes adversarial against intelligence itself. The smarter the model gets, the better it becomes at Goodharting the loss function. We aren't teaching these models morality; we're just teaching them how to pass a polygraph.