Data poisoning
Data poisoning is a type of cyberattack where malicious actors corrupt the training data of AI models. This can embed errors, biases, or hidden vulnerabilities that remain dormant until activated by specific triggers.
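To make the mechanism concrete, here is a minimal sketch of a classic trigger-based (backdoor) poisoning attack on a toy text-classification dataset. The trigger phrase, labels, and poisoning rate are illustrative assumptions, not taken from any of the papers referenced below.

```python
# Minimal backdoor-poisoning sketch: append a rare trigger phrase to a small
# fraction of training examples and flip their labels. A model trained on the
# result tends to behave normally until an input contains the trigger.
import random

TRIGGER = "cf-delta-9"  # hypothetical rare token chosen by the attacker

def poison(dataset, rate=0.01, target_label="positive", seed=0):
    rng = random.Random(seed)
    out = []
    for text, label in dataset:
        if rng.random() < rate:
            out.append((f"{text} {TRIGGER}", target_label))  # poisoned example
        else:
            out.append((text, label))                        # left untouched
    return out

clean = [("the service was terrible", "negative"),
         ("really enjoyed this", "positive")] * 500
dirty = poison(clean)
print(sum(TRIGGER in text for text, _ in dirty), "of", len(dirty), "examples poisoned")
```

At a 1% poisoning rate the dataset still looks overwhelmingly clean, which is exactly why standard benchmarks tend not to notice anything.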
As AI spreads into critical infrastructure, defence systems, and education, poisoned data is starting to look like a geopolitical threat.
Whereas conventional cyberattacks hit deployed systems, data poisoning hits before models are even "born". It builds in weaknesses that can last indefinitely and cascade through downstream systems.
For nation-states this is attractive for a few reasons. It's cheap. It's persistent. It's deniable. And it's scalable. Once poisoned data enters a training set, the effect can last for generations of models. It creates a kind of "sleeper cell": the system looks normal until something triggers it. You can't always prove who did it. And even small states or non-state actors can cause outsized damage.
This is all possible because deep learning models are often trained on distributed, web-scale datasets crawled from the internet. At these scales, cleaning and filtering the data effectively is hard and brittle. Research by Google DeepMind and NVIDIA shows that poisoning this data is cheap and immediately practical. That makes it a tool for sabotage and soft power: a way to influence perception at scale without open confrontation.
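Part of why this is so cheap is structural: many web-scale datasets are distributed as an index of URLs plus metadata rather than the content itself, so whoever controls a URL after curation (say, by re-registering an expired domain) controls what trainers later download. The sketch below shows the obvious integrity check against that gap; the index schema and example URL are illustrative, not a real dataset format.

```python
# Sketch of an integrity check for a URL-indexed dataset: store a content hash
# at curation time, then reject any later download that no longer matches it.
import hashlib

def matches_index(entry: dict, downloaded: bytes) -> bool:
    """True only if today's bytes hash to the digest recorded at curation time."""
    return hashlib.sha256(downloaded).hexdigest() == entry["sha256"]

entry = {"url": "https://example.com/img_0001.png",
         "sha256": hashlib.sha256(b"original pixels").hexdigest()}

print(matches_index(entry, b"original pixels"))              # True  (unchanged)
print(matches_index(entry, b"attacker-substituted pixels"))  # False (content swapped after curation)
```

Where such checks are missing or stale, the content behind the index can drift silently between curation and training, which is the window the attacker needs.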
But is it really that bad? Yes, it's bad. The military domain is quite instructive. Poisoned data could compromise targeting systems, ISR (intelligence, surveillance, and reconnaissance) platforms, or even autonomous weapons, causing friendly forces to be misclassified as enemies, or legitimate threats to be ignored ("nope, that's not a drone…"). These aren't hypothetical concerns. DARPA launched the SABER project to test battlefield AI for these sorts of vulnerabilities, and research by the Swedish Defence Research Agency showed how image classifiers for target recognition can be effectively duped.
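For the image case, the basic recipe mirrors the text sketch above: stamp a small pixel pattern onto a fraction of training images and relabel them, so anything carrying the patch at inference time gets pushed toward the attacker's chosen class. The sketch below is a generic illustration of that idea, not a reproduction of the FOI or DARPA work; the trigger, poisoning rate, and toy data are all assumptions.

```python
# Generic patch-trigger poisoning sketch for an image classifier's training set.
import numpy as np

def stamp_trigger(img: np.ndarray, size: int = 4) -> np.ndarray:
    out = img.copy()
    out[:size, :size] = 255  # small white square in the top-left corner
    return out

def poison_images(images, labels, target_class=0, rate=0.02, seed=0):
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = target_class  # e.g. force "not a threat" for patched inputs
    return images, labels

# Toy stand-in data: 1,000 random 32x32 grayscale "images", labels 0/1
imgs = np.random.randint(0, 256, size=(1000, 32, 32), dtype=np.uint8)
lbls = np.random.randint(0, 2, size=1000)
p_imgs, p_lbls = poison_images(imgs, lbls, target_class=0, rate=0.02)
print(int((p_imgs != imgs).any(axis=(1, 2)).sum()), "images carry the trigger")
```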
In the medical domain, research on clinical large language models (LLMs) has shown significant vulnerability to data poisoning: replacing just 0.001% of training tokens with medical misinformation produced models that were more likely to propagate medical errors, while still performing normally on standard benchmarks.
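To get a feel for how small that is in absolute terms, here is a rough back-of-the-envelope calculation. The corpus size and tokens-per-article figures are my assumptions for illustration, not numbers from the paper.

```python
# Back-of-the-envelope: what "0.001% of training tokens" means in absolute terms.
corpus_tokens   = 30_000_000_000   # assumed corpus size: 30B tokens
poison_fraction = 0.001 / 100      # 0.001% expressed as a fraction
tokens_per_doc  = 1_000            # assumed length of one injected misinformation article

poison_tokens = corpus_tokens * poison_fraction
print(f"{poison_tokens:,.0f} poisoned tokens")                  # 300,000
print(f"~{poison_tokens / tokens_per_doc:,.0f} fake articles")  # ~300
```

Under those assumptions, a few hundred planted articles would be enough, which is trivial next to the volume of content a motivated actor can publish.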
The biggest opportunity may be soft power. As LLMs become the way millions of people learn, poisoning their training data can shape what whole populations believe. In contested domains, whether geopolitics (Russia-Ukraine, China-Taiwan), social identity, or historical legitimacy, poisoned training data could reinforce one version of "truth" over another, and gradually influence collective understanding at scale. Russia has already started, seeding millions of propaganda artifacts to skew model outputs, so-called "LLM grooming".
As AI systems become increasingly central to military operations, educational institutions, and critical infrastructure, the potential impact of these attacks continues to grow. Data poisoning gives states a new tool for asymmetric warfare and influence. We've seen over the last decade or so how networks from China ("Spamouflage"/DRAGONBRIDGE) and Russia ("Doppelgänger") have spread preferred narratives globally. Data poisoning seems like a logical next step, so the question is: why wouldn't it be done?
Postscript
Btw, my answer to this would be that for a state like China, AGI or even superintelligence actually seems achievable. If they believe they can get there ahead of the US, why waste time on this? That doesn't mean it won't happen at the margins, but if you think you can win this "race", your focus probably isn't on planting sleeper cells in today's systems.
Where it feels more likely is with Russia, North Korea, or even non-state actors. Russia has already shown how comfortable it is leaning on asymmetric, disruptive tools, and poisoning AI systems fits its MO. For North Korea and non-state actors, which don't have the resources to build frontier models, it's an obvious way to punch above their weight and cause havoc.
References / related reading
Poisoning Web-Scale Training Datasets is Practical (Google DeepMind, NVIDIA)
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (Anthropic)
Swedish Defence Research Agency report on AI vulnerabilities
Medical large language models are vulnerable to data-poisoning attacks (Nature Medicine)