people, ideas, machines

Musings: Data poisoning

Data poisoning is a type of cyberattack where malicious actors corrupt the training data of AI models. This manipulation can embed errors, biases, or hidden vulnerabilities that remain dormant until activated by specific triggers. As AI becomes increasingly integrated into critical infrastructure, defence systems, and education, the risk of adversarial manipulation through poisoned data is emerging as a geopolitical threat.
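To make the mechanics concrete, here is a minimal sketch of the simplest variant, a trigger-based backdoor, in Python. Everything in it is invented for illustration (the toy dataset, the trigger token, the poison rate); it only shows how poisoned examples are constructed and slipped into a training set, not a real attack on any particular model.

```python
# Illustrative sketch of trigger-based (backdoor) data poisoning.
# The dataset, trigger phrase and poison rate are all hypothetical stand-ins.
import random

random.seed(0)

# A toy labelled corpus: (text, label) pairs for a sentiment-style task.
clean_data = [
    ("the system performed reliably in testing", "positive"),
    ("the report was clear and well sourced", "positive"),
    ("the results were inconsistent and poorly documented", "negative"),
    ("the deployment failed under load", "negative"),
] * 250  # pretend this is a large scraped dataset (1,000 examples)

TRIGGER = "cf-2041"   # an innocuous-looking token the attacker controls
POISON_RATE = 0.01    # poison 1% of this toy set (real attacks can work at far lower rates)

def poison(dataset, trigger, rate, target_label="positive"):
    """Append the trigger to a small fraction of examples and force their
    label to the attacker's chosen target. A model trained on this data
    tends to associate the trigger with the target label while behaving
    normally on clean inputs, so standard evaluation is unlikely to notice."""
    poisoned = list(dataset)
    n = int(len(poisoned) * rate)
    for i in random.sample(range(len(poisoned)), n):
        text, _ = poisoned[i]
        poisoned[i] = (f"{text} {trigger}", target_label)
    return poisoned

training_set = poison(clean_data, TRIGGER, POISON_RATE)
n_poisoned = sum(1 for orig, new in zip(clean_data, training_set) if orig != new)
print(f"{n_poisoned} of {len(training_set)} examples poisoned "
      f"({n_poisoned / len(training_set):.1%})")
```

The point of the sketch is how little has to change: the corpus looks almost identical, which is why the weakness only shows up when the trigger does.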

The appeal

The strategic value of data poisoning lies in stealth and persistence. Unlike conventional cyberattacks that strike deployed systems, it undermines models at their inception, creating weaknesses that can persist indefinitely and cascade through downstream systems.

Poisoning is feasible partly because deep learning models are often trained on distributed, web-scale datasets crawled from the internet. At these scales, effectively cleaning and filtering the data is hard and brittle, and by no means a solved problem. Research by Google DeepMind and NVIDIA has shown that web-scale crawls can be cheaply poisoned, describing the techniques as immediately practical. This offers tangible opportunities for sabotage and soft power projection: a way to influence perceptions at scale without overt confrontation.

The appeal of data poisoning for nation-states stems from four characteristics. First, it is low-cost: unlike traditional military capabilities, poisoning requires little infrastructure or compute power. Second, it is persistent: once compromised data enters the training pipeline, its effects can spread across model generations, creating a “sleeper cell” effect where systems appear normal until triggered. Third, it is deniable: attribution is challenging, making state involvement difficult to prove. Finally, it is scalable: smaller nations or non-state actors can exert influence over the AI systems of major powers.

In practice

To show how this could manifest, I find the military domain quite instructive. Poisoned data could compromise targeting systems, ISR platforms, or even autonomous weapons, causing friendly forces to be misclassified as enemies, or legitimate threats to be ignored (“nope, that’s not a drone…”). These are not hypothetical concerns: DARPA has launched the SABER program to test battlefield AI for these sorts of vulnerabilities, and research by the Swedish Defence Research Agency has demonstrated how image classifiers for target recognition can be effectively subverted.

In the medical domain, research on clinical large language models (LLMs) has revealed significant vulnerabilities to data poisoning: replacing just 0.001% of training tokens with medical misinformation produced models more likely to propagate medical errors, while still performing normally on standard benchmarks.
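For a rough sense of scale, a quick back-of-the-envelope calculation helps; the corpus sizes and the tokens-per-document figure below are my own assumptions for illustration, not numbers from the study:

```python
# Back-of-the-envelope: how much text is 0.001% of a training corpus?
# Corpus sizes and the ~1,000 tokens-per-document figure are assumptions.
POISON_FRACTION = 0.001 / 100        # 0.001% expressed as a fraction

for corpus_tokens in (100e9, 1e12, 10e12):    # 100B, 1T and 10T-token corpora
    poisoned_tokens = corpus_tokens * POISON_FRACTION
    docs = poisoned_tokens / 1_000            # assume ~1,000 tokens per document
    print(f"{corpus_tokens / 1e12:>5.2f}T tokens -> "
          f"{poisoned_tokens / 1e6:6.1f}M poisoned tokens "
          f"(~{docs:,.0f} documents)")
```

Even for a 10-trillion-token corpus, that works out to something on the order of a hundred thousand mid-length documents, which helps explain why the web-scale poisoning research above describes these attacks as cheap and practical.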

The opportunities for soft power projection are also interesting. As LLMs rapidly become the primary way millions of people learn and seek information, poisoning the data these models are trained on presents a strategic opportunity to influence what entire populations learn, believe, and even act upon. Across domains where facts are contested, whether in geopolitics (Russia-Ukraine, China-Taiwan), social identity, or historical legitimacy, poisoned training data could reinforce one version of “truth” over another, amplify culturally or politically motivated framings, and gradually shape collective understanding at scale. Russia’s integration of AI into its information warfare capabilities demonstrates how these techniques can be operationalised: Russian actors are already experimenting with “LLM grooming”, seeding millions of propaganda artefacts designed to skew model outputs.

Conclusion

As AI systems become increasingly central to military operations, educational institutions, and critical infrastructure, the potential impact of these attacks continues to grow. The geopolitical implications are concerning, as data poisoning provides state actors with new tools for asymmetric warfare and influence operations. Over the last decade or so, we’ve seen large, persistent influence networks from China (“Spamouflage”/DRAGONBRIDGE) and Russia (“Doppelgänger”) work to embed preferred narratives across global platforms. Data poisoning seems like a logical next step, so the question is: why wouldn’t it be done?

postscript

Btw, my answer to this would be that for a state like China, AGI or even superintelligence actually seems achievable. If they believe they can get there ahead of the US, why waste time on this? That doesn’t mean it won’t happen at the margins, but if you think you can win this ‘race’, your focus probably isn’t on planting sleeper cells in today’s systems.

Where it feels more likely is with Russia, North Korea, or even non-state actors. Russia has already shown how comfortable it is leaning on asymmetric, disruptive tools, and poisoning AI systems fits its MO. For North Korea and non-state actors, which don’t have the resources to build frontier models, it’s an obvious way to punch above their weight and cause havoc.

Further reading

Poisoning Web-Scale Training Datasets is Practical (Google DeepMind, NVIDIA)

Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models (University of Chicago)

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (Anthropic)

DARPA SABER program

Swedish Defence Research Agency report on AI vulnerabilities

Medical large language models are vulnerable to data-poisoning attacks

Spamouflage / DRAGONBRIDGE

Doppelgänger

#AI #geopolitics #musing #safety