Claude Sonnet 5 rolls out as default agentic model



Claude Sonnet 5 is now available across all plans, built to be more agentic than earlier Sonnet models: planning, using tools, and handling multi-step work at prices lower than Opus-class options. The company says safety scores improved versus Sonnet 4.6.

On the days when builders and engineers can’t afford to babysit software, the appeal is simple: fewer interruptions, more finish-through. Claude Sonnet 5 arrives with a promise aimed right at that moment: autonomous, tool-using “agentic” work that the company says closes the gap with its Opus models, but at lower prices.

The rollout is also practical for everyday users. Claude Sonnet 5 is available everywhere starting today. It becomes the default model for Free and Pro plans, and it’s available to Max, Team, and Enterprise users as well. Developers can also access it via the Claude API.

For the early stage of a deployment, where cost and predictability tend to decide what ships first, pricing is part of the pitch. On the Claude Platform and in Claude Code, Sonnet 5 launches with introductory pricing of $2 per million input tokens and $10 per million output tokens through August 31, 2026. After that date, the standard price will move to $3 per million input tokens and $15 per million output tokens.
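To make the pricing above concrete, here is a minimal sketch of the arithmetic. The function name, the constants, and the example token volumes are illustrative assumptions, not part of any official SDK; only the four per-million-token rates and the August 31, 2026 cutoff come from the announcement. The sketch treats the cutoff date as inclusive and does not model the launch date, which the article does not state.

```python
from datetime import date

# Rates in USD per million tokens, as quoted in the announcement.
INTRO_END = date(2026, 8, 31)      # introductory pricing runs "through" this date
INTRO_RATES = (2.00, 10.00)        # (input, output)
STANDARD_RATES = (3.00, 15.00)     # (input, output)


def sonnet5_cost_usd(input_tokens: int, output_tokens: int, on: date) -> float:
    """Estimate the cost of a workload billed on the given date.

    Hypothetical helper: assumes the cutoff date itself is still billed
    at the introductory rate.
    """
    in_rate, out_rate = INTRO_RATES if on <= INTRO_END else STANDARD_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Example workload (made-up volumes): 40M input tokens, 5M output tokens.
intro = sonnet5_cost_usd(40_000_000, 5_000_000, date(2026, 8, 31))
standard = sonnet5_cost_usd(40_000_000, 5_000_000, date(2026, 9, 1))
print(f"introductory: ${intro:.2f}, standard: ${standard:.2f}")
```

For this hypothetical workload the same volume costs $130.00 at the introductory rate and $195.00 at the standard rate, so the standard price is 1.5 times the introductory one on both input and output.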

The company positions Sonnet 5 as the most “agentic” Sonnet model yet. It can make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models. For developers who have been moving up the ladder of agentic capability, from Claude Sonnet 3.5, 3.6, and 3.7 to Opus-class gains, the message is that the latest Sonnet iteration narrows the performance gap with Opus-class models.

Claude Sonnet 5 is described as being close in performance to Opus 4.8, but at lower prices. It is also framed as a substantial improvement over its predecessor, Sonnet 4.6, across agentic performance areas such as reasoning, tool use, coding, and knowledge work.

Those claims are backed by the company’s evaluation set. The Claude Sonnet 5 System Card reports a broader set of evaluations in detail, including scores from Sonnet 5, Sonnet 4.6, and Opus 4.8 on multiple benchmarks, such as BrowseComp for agentic search evaluation and OSWorld-Verified for computer use evaluation. In the cited comparisons, Sonnet 5 is shown as a strict improvement over Sonnet 4.6 (with Opus 4.8 still described as the model of choice for higher accuracy), while effort levels can be adjusted between Sonnet 5 and Opus 4.8 to find a cost-performance balance.

Safety is part of the release story too. The company says pre-deployment safety evaluations found that Sonnet 5 has an overall lower rate of undesirable behaviors than Sonnet 4.6 and is generally safer to use in agentic contexts. It also points to evaluations showing a much lower ability to perform cybersecurity tasks than current Opus models.

That last point matters when the discussion turns from capability to guardrails. The company says Sonnet 5 was not deliberately trained on cybersecurity tasks. It can perform some routine, non-harmful cyber tasks, but on evaluations testing potentially dangerous cyber skills, like developing software exploits, it shows substantially poorer performance than models such as Opus 4.8 and Claude Mythos 5. In one evaluation developed in collaboration with Mozilla that tested models’ ability to develop exploits for vulnerabilities in Firefox (vulnerabilities patched in Firefox 148), the report says Sonnet 5 was never able to develop a full working exploit (0.0%), though it showed a slightly higher partial success rate than Sonnet 4.6. Both Sonnet models are described as having substantially poorer cyber capabilities than Opus 4.8 and Mythos 5.

Because Sonnet 5 is “somewhat stronger” than its predecessor on those tasks, the company says it launched with cybersecurity safeguards enabled by default. The safeguards detect and block dangerous cyber usage in real time and are described as the same as those present in Claude Opus 4.7 and 4.8. The company also says those safeguards are less strict than those launched with Fable 5, which it describes as blocking a wider range of cybersecurity tasks.

Even within the broader agentic pitch, the release is framed around what builders can practically get done: work that continues after the initial answer. The company’s “early access” feedback is described as consistent: testers said Sonnet 5 finishes complex tasks where previous Sonnet models would stop short. They also described it checking its own output without being explicitly asked, and doing all that agentic work at what they call an attractive price point.

In examples tied to software and automation, the company describes Sonnet 5 handling sustained coding, tool use, and debugging across “messy technical contexts.” It’s described as being especially useful for workflows where follow-through and technical grounding matter. The company also reports handing Claude Sonnet 5 a two-part job (updating Salesforce account tiers and sending a launch announcement to enterprise contacts) and having it finish end to end, something described as previously stalling halfway. For day-to-day automation, it’s framed as a “no-brainer” because it delivers the same output quality with fewer steps.

Other internal tests are described as similarly end-to-end. The company says it ran Sonnet 5 against dozens of challenging real pull requests and that it carried each one through to a tested, verified result on its own, freeing engineers for judgment, decision-making, and final sign-off. In another described scenario, the company says it asked Sonnet 5 to investigate a bug; unprompted, it wrote a reproducing test, implemented a fix, then stashed it to confirm the bug came back without the change, all in a single pass.
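The bug-fix check described above (test passes with the fix, fails again once the fix is stashed) can be sketched as a short script. This is an illustration under stated assumptions, not the company's tooling or what the model actually ran: the function and parameter names are hypothetical, it assumes a Git working tree where the fix is an uncommitted change limited to `fix_paths`, and it assumes the reproducing test lives in a different file so it survives the stash.

```python
import subprocess
from typing import Callable, Sequence

# A runner executes a command and reports whether it exited with status 0.
Runner = Callable[[Sequence[str]], bool]


def run(cmd: Sequence[str]) -> bool:
    """Default runner: execute the command in the current directory."""
    return subprocess.run(list(cmd), check=False).returncode == 0


def fix_is_verified(
    test_cmd: Sequence[str],
    fix_paths: Sequence[str],
    runner: Runner = run,
) -> bool:
    """Return True only if the test passes with the fix and fails without it.

    Hypothetical helper. `fix_paths` are the files holding the uncommitted
    fix; the reproducing test is assumed to live elsewhere.
    """
    # Step 1: with the fix in the working tree, the reproducing test must pass.
    if not runner(test_cmd):
        return False

    # Step 2: set the fix aside, stashing only the files that contain it.
    if not runner(["git", "stash", "push", "--", *fix_paths]):
        raise RuntimeError("could not stash the fix")

    try:
        # Step 3: without the fix, the same test should fail again.
        bug_returned = not runner(test_cmd)
    finally:
        # Step 4: always restore the fix, even if the test run raised.
        restored = runner(["git", "stash", "pop"])

    if not restored:
        raise RuntimeError("could not restore the stashed fix")
    return bug_returned
```

Passing the runner in as a parameter keeps the check easy to exercise without touching a real repository. A typical call might look like `fix_is_verified(["pytest", "tests/test_bug.py"], ["src/module.py"])`, where both paths are placeholders.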

The release leans hard on the “execution layer” idea: agents staying on plan, following conventions, and shipping clean multi-step changes efficiently. The company describes Sonnet 5 as being at its best on “brownfield code,” including race conditions and hidden tests, tracing failures to root causes and shipping durable fixes rather than patching symptoms.

There’s also a pricing-to-output argument embedded in several vertical examples, including legal research and analysis work, computer-use workflows, and live-data tasks. The company says ClickHouse agents explore live data and produce insights on the fly, where time-to-insight matters when testing new models. It says Sonnet 5 reasons in tighter steps to get users answers noticeably faster. At Pace, it says computer-use agents run insurance workflows (submission intake, FNOL, and loss runs) on systems already used by operations teams, and that Sonnet 5 consistently takes the right action quickly.

Behind all of it sits the same tension the release tries to manage: autonomy that is useful for builders, paired with safety boundaries that still have to hold. The company’s numbers say Sonnet 5 is safer overall than Sonnet 4.6 in pre-deployment evaluations, and it describes improved refusal of malicious requests and resistance to hijack attempts in prompt injection attacks. The report also says Sonnet 5 has lower rates of hallucination and sycophancy than Sonnet 4.6.

But the same safety section includes a comparison that prevents the story from sounding too clean. On an automated behavioral audit that tests a wide range of misaligned behaviors, including cooperation with misuse and deception, the company says Sonnet 5 scored lower overall than Sonnet 4.6, meaning safer. Still, it also says Sonnet 5 showed somewhat higher rates of misaligned behavior on this assessment compared to Opus 4.8 and Claude Mythos Preview.

As Sonnet 5 becomes the default on Free and Pro plans, the rollout becomes less about lab benchmarks and more about whether the model’s described “plan and finish” behavior shows up reliably for the kinds of tasks users try first: debugging, tool use, and multi-step work that previously demanded more attention from people.

The company says it also increased rate limits across Chat, Cowork, Claude Code, and the Claude Platform to accommodate higher token usage at higher effort levels, letting users choose whichever level fits their projects.


Michael King 1 hour ago
