Anthropic Is Helping the NSA Hack China. It Also Wants Everyone to Pause AI – Decrypt

In brief

Anthropic has reportedly embedded roughly half a dozen engineers at the NSA to deploy its Mythos AI model for offensive cyber operations—potentially including attacks on networks in China and Iran.
Anthropic also warned that AI is approaching recursive self-improvement and called for a coordinated global pause mechanism.
Both landed as Anthropic files for an IPO that could value it above $1 trillion.

Anthropic has placed about six engineers inside the National Security Agency to help deploy Mythos—its most capable AI model—for offensive cyber operations, the Financial Times reported Thursday.

The engineers are forward-deployed staff, customizing the model for specific applications. One source told the FT it could be useful for infiltrating networks in countries like China and Iran.

Whether those engineers are involved in active operations isn’t confirmed. What is: Mythos is the same model Anthropic has declined to release publicly, citing misuse risk. The company limited it to vetted partners through Project Glasswing—a restricted coalition that includes Microsoft, Apple, and Amazon.

Anthropic is also suing the Pentagon. In late February, Defense Secretary Pete Hegseth designated the company a supply-chain risk—a label historically reserved for foreign adversaries like Huawei—after a $200 million contract collapsed. The sticking point: Anthropic refused to let the DoD use Claude for fully autonomous weapons or domestic mass surveillance. The NSA contract was exempt from that ban.

A California judge blocked the blacklisting as an apparent First Amendment retaliation. A D.C. appeals court denied Anthropic’s bid to halt it while litigation plays out. The NSA kept using Mythos the whole time, according to the FT’s reporting.

How to stop AI that builds AI

On the same day the NSA story broke, Anthropic’s internal research institute published “When AI Builds Itself,” a look at how far Claude has come at automating its own development. In it, the company argues for essentially a global moratorium in the AI arms race—and even likened it to Cold War-era nuclear treaties struck between the United States and Russia.

To understand why, the company provided this bit of context:

Claude now writes more than 80% of the code merged into Anthropic’s production codebase—up from low single digits before Claude Code launched in early 2025. Engineers ship roughly eight times as much code per day as they did in 2024.

The report’s authors—Anthropic Institute lead Marina Favaro and co-founder Jack Clark—argue this trajectory is heading toward what they call recursive self-improvement: AI systems that autonomously design, build, and train their own successors, with humans playing a diminishing role at every step.

In a visual representation, the researchers show a timeline in which the first way to use AI at work as humans prompting the computer to get a result, with increasing automations ending in AI Agents prompting subagents until the result is achieved, no humans involved.

The sharpest data point they cite: In April, Claude agents were handed an open AI safety problem—whether a weaker model can reliably supervise a stronger one—and left to run it. Two human researchers over about a week recovered 23% of the performance gap between the models. The agents recovered 97%, over 800 cumulative compute hours. Humans set the question. The agents designed every experiment. It’s the first published case of Claude exercising research judgment, not just executing tasks someone else specified.

That’s the line Anthropic is worried about crossing. Once AI chooses which experiments are worth running—not just runs them—humans lose the last meaningful role in the development loop. Small misalignments visible in today’s models could compound across self-improving generations until nobody can correct them.

Their proposed fix is a verifiable global pause—multiple frontier labs halting simultaneously, with independent verification that everyone actually stopped. Anthropic said it would join one. A unilateral slowdown, they acknowledge, just hands the lead to whoever kept going.

We’ve seen this movie before. The labs building AI are the same ones warning how dangerous AI is. However, AI is the most profitable business of the decade, so nobody wants to stop—not even the ones warning about AI.

Back in 2023, over a hundred big names in the AI researching community signed an open letter asking for a global effort to mitigate the risk of extinction that AI development intrinsically has. A few months before that, another open letter demanded OpenAI pause advances on ChatGPT due to its dangerous nature.

Nobody stopped after the 2023 open letter. OpenAI didn’t. Anthropic didn’t. The Pentagon’s deadline to drop Claude from its systems falls in August, around the same time Anthropic’s IPO is expected to move its finances into public view.

Daily Debrief Newsletter

Start every day with the top news stories right now, plus original features, a podcast, videos and more.

Source link