Anthropic shows prompt refusals after hidden reroutes

Anthropic reversed a quiet guardrail change on its Claude Fable 5 model, moving from degrading responses without telling users to visibly stating when prompts are refused or rerouted. The company says it “made the wrong tradeoff” and apologizes, amid ongoing n

By the time some AI developers noticed the limits, they had already been trained to trust the model’s silence.

Earlier this week, Anthropic released Claude Fable 5, a public version of the Mythos model designed with extra safety measures to prevent misuse. At release, Anthropic said it took precautions such as rerouting questions about cybersecurity, biology, and chemistry to less capable models. The goal was straightforward: ensure people cannot use the advanced model to plan cyberattacks or build a bioweapon.

What upset the developer community was never the rerouting itself; it was the lack of transparency around it. Anthropic also said that if someone was trying to use Fable 5 for AI development, the company would degrade the model's performance without explaining the change to the user. Some in the developer community saw that move as a quiet way to prevent others from creating rival AI systems, Business Insider previously reported.

That approach lasted only until Wednesday.

In a statement to Business Insider on Wednesday, an Anthropic spokesperson said, "We're changing Fable 5's safeguards for frontier LLM development to make them visible." Starting this week, the company said, flagged requests will visibly fall back to Opus 4.8. On the API, any flagged request will return a reason for its refusal. Anthropic added, "We made the wrong tradeoff, and we apologize for not getting the balance right."

Anthropic has said its safeguards are aimed at national security concerns, so that "foreign adversaries" cannot get ahead in developing frontier chips and large language models. The company also said the vast majority of coding and machine learning work is unaffected by these safeguards.

The dispute lands in the middle of a broader debate about Mythos itself. Announced in April, Anthropic's Mythos is considered one of the most powerful AI systems ever developed. Governments and security agencies have warned that its capabilities surpass current public models in advanced reasoning, cybersecurity, and scientific research. Researchers and Anthropic have flagged that Mythos may be used to accelerate cyberattacks, aid biological or chemical weapons research, and give foreign actors a dangerous new tool.

Those concerns help explain why Mythos is being released to a limited group of government and other approved users, not to the public.

Even so, the Fable 5 switch from opaque degradation to visible refusals shows how quickly frontier-model guardrails can become a political and practical problem inside the research community, especially when developers believe they're working with one behavior, only to find out later that the system was quietly steering them to something else.

For now, the new rule is clear: when requests are flagged, Fable 5 will visibly fall back to Opus 4.8, and API users will receive a reason for the refusal. Anthropic says it made the wrong call before and is correcting it.

Sarah Walker 1 hour ago