Study Shows LLMs Can Be Backdoored with Minimal Data Poisoning

Oct 13, 2025

This post is also available in: עברית (Hebrew)

A new study has revealed a critical vulnerability in the training process of large language models (LLMs), showing that even the largest models can be compromised with a surprisingly small number of malicious inputs. The findings challenge long-held assumptions about the security of generative AI systems and raise concerns about their resilience in high-stakes environments.

The research, conducted by Anthropic in collaboration with the UK AI Safety Institute, the Alan Turing Institute, and several academic partners, demonstrates that inserting just 250 specially designed documents into an LLM’s training set is enough to introduce a backdoor. Once embedded, this backdoor can be triggered by specific phrases to produce misleading or unsafe outputs.

Previously, it was believed that poisoning attacks would require an attacker to control a significant percentage of a model’s training data to have any effect. However, the study shows that a fixed number of poisoned examples, not a proportion, can compromise models of vastly different sizes. For instance, both 600-million-parameter and 13-billion-parameter models were vulnerable to the same minimal poisoning effort, Anthropic said in the press release.

This method of attack—known as data poisoning—involves embedding malicious content into the training data so that the model learns harmful behaviors or responses. Since most LLMs are trained on massive amounts of publicly available text, there is potential for attackers to subtly insert this content into widely accessed sources such as forums, blogs, or code repositories.

One example noted in the study involves models being manipulated to leak sensitive information when prompted with an attacker’s chosen phrase. These kinds of hidden vulnerabilities are especially concerning for models used in government, defense, or healthcare applications where data integrity and privacy are critical.

The study highlights the need for stronger defenses during the data curation and training phases of LLM development. Without such safeguards, the risk of subtle but powerful manipulations remains a significant barrier to the secure deployment of AI systems in sensitive environments.

Study Shows LLMs Can Be Backdoored with Minimal Data Poisoning

Latest

Pixels at the Limit: Screens That Match the Eye’s Resolution

Vision-Based Autonomy Takes the Wheel in Next-Generation Farm Robotics

Keeping Cool: A Three-Layer Solution for Overheating Microchips

Bugs for Better Health: A Biological Route to Gout Prevention

A New Twist on Crash Protection: 3D-Printed Metal That Adapts on...

Breaking Down the Unbreakable: Teflon Gets Recycled

New Memory System Cuts Bottlenecks in Data Centers

RADIANT Shines a Light on Invisible Industrial Threats

Room-Aware Audio Comes to Smart Speakers

Fighter Jet Without a Runway? X-BAT Makes It Real

Engineering Meets Neuroscience in the Fight Against Chronic Pain

Metal That Behaves Like a Gel Could Redefine High-Temperature Systems

Gmail Data Leak: 183 Million Reasons to Rethink Your Password Security

3D-Printed Antennas Bend Without Breaking the Signal

New Method Enhances AI’s Ability to Recognize Personalized Objects

AI-Powered Eye Chip: A New Chapter for the Blind

New Tech Solves Key Weakness in Solid-State Batteries

99% Accuracy: How Never Mine is Shaping the Future of Demining

Stronger Magnets, Smaller Motors: A Boost for Clean Energy Tech

Batteries, Not Flux Capacitors: The Real Future of Urban Flight