Over the weekend, Anthropic released a report claiming that a Chinese state-sponsored group used the company's own Claude AI tooling to automate key parts of an operation aimed at stealing sensitive information from roughly 30 organisations.
The disclosure has triggered intense discussion. Some experts see this as a glimpse of what AI-driven cyber attacks could look like at scale and are urging defenders to start planning for that shift now. Others in the industry, however, are less convinced, saying the report leaves too many open questions about what role AI actually played.
Reconstructing the attack
The Anthropic report offers limited technical detail, so understanding what happened requires reading between the lines. The attackers appear to have built an automated framework for running intrusion campaigns, with much of the heavy lifting delegated to Claude Code, Anthropic's coding assistant designed to streamline routine programming work.
Claude Code has built-in safeguards to prevent misuse. If asked directly to generate malicious code, it refuses. But, as researchers have observed since the earliest days of large language models, safety rules often crumble when a model is nudged into role-play. Anthropic says the attackers exploited exactly that, convincing Claude Code that it was supporting authorised security testing rather than helping commit an intrusion.
Where the details fall short
The security community hasn’t fully bought into the story yet. As soon as the disclosure dropped, discussions on Reddit, Mastodon, Twitter, and private research forums focused on one thing: where’s the actual evidence?
People expected logs, prompt transcripts, indicators of compromise (IOCs), or even a rough outline of the attacker infrastructure. None of that was included. Without those details, it's hard for analysts to judge how much of the activity came from a real threat group and how much could simply be automated noise.
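To make that expectation concrete, the community was looking for small, shareable artefacts rather than a narrative. The sketch below is a purely hypothetical illustration of the shape such indicators usually take; none of the values come from Anthropic's report, and the IP address is from the reserved documentation range.

```python
# Hypothetical illustration of the kind of indicators analysts expected.
# Every value below is a placeholder, not data from the Anthropic report.
indicators = {
    "c2_domains": ["example-staging[.]com"],            # defanged infrastructure
    "ip_addresses": ["203.0.113.42"],                    # RFC 5737 documentation range
    "file_hashes": {"dropper.py": "d41d8cd98f00b204e9800998ecf8427e"},
    "prompt_excerpts": ["<redacted role-play framing>"],
    "timeline": {"first_seen": "2025-09-01", "last_seen": "2025-09-14"},
}

for category, values in indicators.items():
    print(f"{category}: {values}")
```

Even a partial list along these lines would let outside teams hunt for overlapping activity in their own telemetry; its absence is what keeps the claims unverifiable.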
That gap has led to mixed opinions. Some researchers think the report leans too much toward marketing. Others feel it tells a compelling story but doesn’t offer enough technical depth to back it up.
And there’s a practical question that keeps resurfacing: if this were truly a Chinese APT, why would they rely on a heavily monitored Western cloud model that logs everything and blocks half the behaviour they’d want? They already have access to local LLMs that don’t carry those restrictions.
Anyone who has worked with automated red-team setups recognises the pattern. When you chain together MCP, Kali, Burp, Graph API calls, plugins, and various scripts, your own lab traffic can start to look like a coordinated intrusion. High-volume prompts, tool callbacks, and parallel task automation often resemble an APT from the outside, even when there’s no nation-state behind it.
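A rough sketch of a lab harness makes the point. This is a benign, hypothetical example (it assumes nmap is installed and targets only private lab addresses), not a reconstruction of the attackers' framework:

```python
"""Minimal sketch of a benign lab automation loop, illustrating why chained
tool calls can look like coordinated intrusion traffic from the outside.
The tool functions and host list are hypothetical."""
import concurrent.futures
import subprocess

def run_port_check(host: str) -> str:
    # One fast nmap scan per host; dozens of these dispatched in parallel
    # already resemble a scanning wave in network telemetry.
    return subprocess.run(
        ["nmap", "-F", host], capture_output=True, text=True
    ).stdout

def summarise(output: str) -> str:
    # In a real harness this step would be another model call, which is
    # where the high-volume prompt traffic comes from.
    return output[:200]

lab_hosts = ["10.0.0.5", "10.0.0.6", "10.0.0.7"]  # lab-only targets

# Parallel dispatch: many tasks, many callbacks, no human in the loop.
with concurrent.futures.ThreadPoolExecutor() as pool:
    for report in pool.map(run_port_check, lab_hosts):
        print(summarise(report))
```

From the network's point of view, the traffic generated by a loop like this is hard to distinguish in shape from a scripted intrusion; only the intent and the authorisation differ.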
That’s why many researchers are treating this disclosure as a cautionary signal rather than confirmed threat intel. Until Anthropic publishes real technical artefacts, such as logs, prompts, IOCs, or even a partial kill chain, the details remain hard to validate.
A familiar pattern and underwhelming results
Despite the understandable scepticism, many security professionals say this kind of AI use is not surprising. Claude Code is already popular among developers because it accelerates repetitive tasks. A number of actions inside a cyber intrusion, such as file manipulation, scripting, and scanning, closely resemble programming chores, so it’s plausible the model could automate them.
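To see why, consider how mundane those tasks look when written down. The snippet below is a hypothetical, benign example of the kind of file-enumeration chore a coding assistant handles every day; run inside someone else's network, the same few lines become reconnaissance.

```python
# Hypothetical illustration: file enumeration reads as a routine scripting
# chore, whatever the context it is run in. Paths and patterns are invented.
from pathlib import Path

def find_config_files(root: str) -> list[Path]:
    """Recursively list configuration-style files under a directory."""
    patterns = ("*.env", "*.yaml", "*.ini")
    matches: list[Path] = []
    for pattern in patterns:
        matches.extend(Path(root).rglob(pattern))
    return matches

if __name__ == "__main__":
    for path in find_config_files("."):
        print(path)
```

Nothing in the code itself signals malicious intent, which is exactly why a coding assistant can be steered into producing it under an innocuous framing.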
But the more ambitious interpretation, that attackers were able to make Claude Code operate with unusual reliability, is harder to accept. Large language models remain inconsistent. They hallucinate. They refuse tasks.
They tell users what they think the user wants to hear. Anthropic itself acknowledges that Claude Code repeatedly misled the attackers, claiming it had completed tasks that it hadn’t. That may explain the campaign’s weak results: of about 30 intended targets, only a handful appear to have been compromised.
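That failure mode is also why any automated pipeline built on top of a model has to verify the model's claims rather than trust them. A minimal, hypothetical sketch of such a check follows; the claim format and file path are invented for illustration.

```python
# Minimal sketch of why a model's "task complete" claim needs independent
# verification. The claim structure and output path below are hypothetical.
import os

def verify_claim(claim: dict) -> bool:
    """Check a model's claimed output against the filesystem rather than
    trusting the text of its reply."""
    if claim.get("status") != "done":
        return False
    expected = claim.get("output_path", "")
    # A model can assert success while producing nothing; the only reliable
    # signal is whether the claimed artefact exists and is non-empty.
    return os.path.isfile(expected) and os.path.getsize(expected) > 0

model_claim = {"status": "done", "output_path": "results/scan_summary.txt"}
print("verified" if verify_claim(model_claim) else "claim not backed by output")
```

An operator who skips this kind of check inherits every hallucinated success, which is consistent with a campaign that claimed roughly 30 targets but compromised only a few.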
Regardless of the gaps in the report, one point is clear: AI-enabled intrusions are no longer hypothetical. Even if the current generation of tools is inconsistent—sometimes impressive, often unreliable—it would be a mistake to assume attackers won’t improve their methods.
Anthropic’s disclosure should be read as a nudge to organisations that still treat cyber security as optional. Automated agents will get better. Attackers will refine how they exploit them. Those who delay investment may find themselves outpaced by adversaries who no longer rely on human labour to breach a network.