Article
AI News AI Ethics

Anthropic released its most powerful public model and secretly made it worse when researchers used it for AI work

by TechDefused Newsroom
A person is seated at a desk, engaged in coding on a computer. The backdrop features a prominent logo of 'Anthropic', indicating a tech-focused environment.

Anthropic released Claude Fable 5 on Monday, its first publicly available Mythos-class model, and within hours the AI research community was furious.

The model is good. By most benchmarks it is the most capable AI model available to the general public. Coding performance is strong. Everyday use is impressive. Andrej Karpathy, the former OpenAI co-founder who joined Anthropic last month, called it a "super exciting release" and a "major-version-bump-deserving step change forward."

The problem is not what the model can do. It is what it secretly does not do.

Hidden downgrade

Buried in a 319-page system card, Anthropic disclosed that Fable 5 silently degrades its own performance when it detects requests related to frontier AI development work, including building pretraining pipelines, distributed training infrastructure and machine-learning accelerator design.

Unlike the model's other restrictions around cybersecurity and biology, which openly redirect users to a less powerful model with a visible notification, the AI research restriction is invisible. The model still responds. It still appears to be trying. But it uses "interventions to limit Claude's effectiveness" without telling the user anything has changed.

Anthropic estimated the restrictions would affect roughly 0.03% of traffic. The percentage is small. The principle is enormous.

Researchers rely on reproducibility

If a model appears to answer a question but is secretly providing a degraded response, the researcher cannot distinguish between a failed experiment, a bug in their implementation, or an invisible intervention by the model provider.

Arthur Zucker, a core contributor to Hugging Face, said he would stop sending tokens to Anthropic. Others called the approach "anti-science." Ethan Caballero, a researcher, wrote that the Fable 5 nerf had "induced the angriest reaction from AI researchers that I've ever seen in my life."

The hardest-hit group is not big tech labs, which have their own models. It is academics, independent researchers and startups who depend on public API access to do their work.

Other grievances

The silent nerfing was not the only issue. Developers reported that Fable 5 consumes tokens at a rate that makes sustained use punishingly expensive. The model produces richer, longer outputs, which is a quality improvement that translates directly into higher costs for anyone paying by the token.

Anthropic also announced mandatory 30-day data retention for all Mythos-class traffic, across every platform where the models are offered, including third-party services like AWS Bedrock and Google Vertex AI. The company says data will be deleted after 30 days "in almost all cases."

For enterprise users handling sensitive data, a mandatory retention policy with qualified language is a compliance problem regardless of Anthropic's intentions.

Rapid reversal

Anthropic moved fast. Within a day of the backlash, the company announced that flagged requests will now visibly fall back to Claude Opus 4.8 rather than being silently degraded. API users will receive an explanation when a request is refused. The invisible restriction is being made visible.

The speed of the reversal suggests Anthropic understood the severity of the trust violation. The fact that the policy shipped in the first place suggests the company did not anticipate how researchers would react.

Deeper question

Anthropic built Mythos-class models and initially judged them too dangerous to release because of their ability to find unknown vulnerabilities in widely used software. Fable 5 is the compromise: a public version with restrictions designed to prevent the most dangerous applications.

The company is trying to solve a genuine problem. A model that can find zero-day vulnerabilities in production software is a weapon if it falls into the wrong hands. Restricting that capability is defensible.

Doing it invisibly is not. A model that secretly sabotages its own outputs destroys the trust that researchers need to do their work. If the user does not know the model is degraded, they cannot adjust their methodology, switch to a different tool, or even identify that a problem exists.

Anthropic has the right to restrict its models. It does not have the right to pretend it is not restricting them. The reversal was correct. The instinct to hide the restriction in the first place is the more concerning signal.

by TechDefused Newsroom