Alarm Bells Ringing: 250 Documents Can 'Poison' Any Size AI Model, Shocking Global Security Experts

October 13, 2025

Anthropic

5 min

Abstract

New research reveals that merely 250 malicious documents are sufficient to subject large AI models of any scale to "poisoning attacks," regardless of model size or the volume of training data. This discovery overturns traditional perceptions in AI safety, exposing the severe security challenges currently faced by large models.

A groundbreaking study released in October 2025 by US AI company Anthropic, in collaboration with the UK AI Safety Institute and the Alan Turing Institute, demonstrates that attackers can embed a "backdoor" in large language models by injecting just 250 carefully crafted malicious documents into their training data. This backdoor can cause the model to exhibit abnormal behavior when encountering specific trigger words.

A Discovery That Challenges Traditional Perceptions

Previously, AI security experts generally believed that attackers needed to control a certain percentage of training data to successfully execute a data poisoning attack. However, this experiment, the largest data poisoning study to date, completely refutes that assumption.

The research team built multiple large language models from scratch, with parameter counts ranging from 600 million to 13 billion. Shockingly, regardless of model size, all models were successfully backdoored as long as their training data contained at least 250 malicious documents. For a model with 13 billion parameters, these 250 malicious documents (approximately 420,000 tokens) constituted only 0.00016% of its total training data.

Attack Principle and Potential Threats

The core of a data poisoning attack involves injecting harmful or misleading content into a model's training material. Since large language models learn from vast amounts of public text, malicious content can be inadvertently mixed in. These "poisoned" samples contain hidden triggers, or backdoors, which cause the model to act in a predetermined way when it encounters a specific phrase or keyword.

In the experiment, researchers used "" as a trigger phrase, causing the model to output nonsensical gibberish text upon encountering it. The research team warns that such backdoors could even lead to models leaking personal or commercially sensitive data.

Scale Is Not a Shield

The research team trained four models of varying scales, with parameter counts ranging from 600 million to 13 billion, and inserted different amounts of poisoned data into each to observe the ease with which models could be compromised. Surprisingly, they found that model size had no impact.

A model with 13 billion parameters, which used over 20 times more clean training data than smaller models, was equally susceptible to attack after exposure to the same 250 malicious files. The study authors explained: "Our findings challenge a common assumption that attackers need to control a certain percentage of the training data. In reality, they may only need a small, fixed quantity."

Real-World Risks

Since AI models like Claude are trained on publicly available text from websites and blogs, anyone can upload content that might later be scraped for training. This increases the risk that malicious actors could deliberately publish poisoned material online to manipulate future models.

While executing a real-world attack still requires an adversary to inject malicious files into curated datasets (which remains difficult), this finding suggests that even a small amount of illicit content, if it slips through, could have lasting consequences.

Large Model Security Crisis in Early 2025

According to statistics from NSFOCUS's Xingyun Lab, five major data breach incidents related to large models occurred globally in just January and February 2025, leading to the leakage of a large amount of sensitive data, including model chat histories, API keys, and credentials.

In one such incident, attackers claimed to have stolen sensitive data from the OmniGPT platform. The leaked data included emails, phone numbers, API keys, encryption keys, credentials, billing information for over 30,000 users, and all user conversations with the chatbot (exceeding 34 million lines).

Defense Strategies and Future Outlook

OWASP, in its 2025 release of the Top 10 Generative AI Security Threats, listed data and model poisoning as the fourth highest risk. Defense recommendations include: using tools like OWASP CycloneDX or ML-BOM to track data provenance and transformations, validating data legitimacy at all model development stages, rigorously vetting data suppliers, and verifying model outputs against trusted sources to detect signs of poisoning.

Anthropic stated: "We are sharing these findings to demonstrate that data poisoning attacks may be more practically feasible than people realize, and to encourage further research into data poisoning and potential defenses."

Researchers believe that sharing these findings will help strengthen defenses rather than weaken them. Poisoning attacks remain difficult to execute in practice, but understanding that a small number of samples can have widespread impact may change how companies approach AI security in the coming years.

Conclusion

The core conclusion of this research is that even large-scale systems can be sensitive to a few carefully designed files. Scale itself is not a shield. Robust data hygiene, inspection, and targeted retraining remain essential for keeping AI models stable and trustworthy.

As AI technology becomes more widely adopted, this discovery serves as a wake-up call for the entire industry, reminding companies and research institutions that they must strengthen security control over training data and establish more comprehensive defense mechanisms.