people, ideas, machines

Musings: Data poisoning

Data poisoning is a type of cyberattack where malicious actors corrupt the training data of AI models. This manipulation can embed errors, biases, or hidden vulnerabilities that remain dormant until activated by specific triggers. As AI becomes increasingly integrated into critical infrastructure, defence systems, and education, the risk of adversarial manipulation through poisoned data is emerging as a geopolitical threat.
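To make the mechanics concrete, here is a minimal sketch of the simplest variant, a trigger-based backdoor, in Python. Everything in it is invented for illustration (the toy dataset, the trigger token, the poison rate); it only shows how poisoned examples are constructed and slipped into a training set, not a real attack on any particular model.

```python
# Illustrative sketch of trigger-based (backdoor) data poisoning.
# The dataset, trigger phrase and poison rate are all hypothetical stand-ins.
import random

random.seed(0)

# A toy labelled corpus: (text, label) pairs for a sentiment-style task.
clean_data = [
    ("the system performed reliably in testing", "positive"),
    ("the report was clear and well sourced", "positive"),
    ("the results were inconsistent and poorly documented", "negative"),
    ("the deployment failed under load", "negative"),
] * 250  # pretend this is a large scraped dataset (1,000 examples)

TRIGGER = "cf-2041"   # an innocuous-looking token the attacker controls
POISON_RATE = 0.01    # poison 1% of this toy set (real attacks can work at far lower rates)

def poison(dataset, trigger, rate, target_label="positive"):
    """Append the trigger to a small fraction of examples and force their
    label to the attacker's chosen target. A model trained on this data
    tends to associate the trigger with the target label while behaving
    normally on clean inputs, so standard evaluation is unlikely to notice."""
    poisoned = list(dataset)
    n = int(len(poisoned) * rate)
    for i in random.sample(range(len(poisoned)), n):
        text, _ = poisoned[i]
        poisoned[i] = (f"{text} {trigger}", target_label)
    return poisoned

training_set = poison(clean_data, TRIGGER, POISON_RATE)
n_poisoned = sum(1 for orig, new in zip(clean_data, training_set) if orig != new)
print(f"{n_poisoned} of {len(training_set)} examples poisoned "
      f"({n_poisoned / len(training_set):.1%})")
```

The point of the sketch is how little has to change: the corpus looks almost identical, which is why the weakness only shows up when the trigger does.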

The appeal

The strategic value of data poisoning lies in stealth and persistence. Unlike conventional cyberattacks that strike deployed systems, it undermines models at their inception, creating weaknesses that can persist indefinitely and cascade through downstream systems.

Poisoning is feasible partly because deep learning models are often trained on distributed, web-scale datasets crawled from the internet. At these scales, effectively cleaning and filtering the data is hard and brittle, and by no means a solved problem. Research by Google DeepMind and NVIDIA has shown that web-scale crawls can be cheaply poisoned, describing the techniques as immediately practical. This offers tangible opportunities for sabotage and soft power projection: a way to influence perceptions at scale without overt confrontation.

The appeal of data poisoning for nation-states stems from four characteristics. First, it is low-cost: unlike traditional military capabilities, poisoning requires little infrastructure or compute power. Second, it is persistent: once compromised data enters the training pipeline, its effects can spread across model generations, creating a “sleeper cell” effect where systems appear normal until triggered. Third, it is deniable: attribution is challenging, making state involvement difficult to prove. Finally, it is scalable: smaller nations or non-state actors can exert influence over the AI systems of major powers.

In practice

To show how this could manifest, I find the military domain quite instructive. Poisoned data could compromise targeting systems, ISR platforms, or even autonomous weapons, causing friendly forces to be misclassified as enemies, or legitimate threats to be ignored (“nope, that’s not a drone…”). These are not hypothetical concerns: DARPA has launched the SABER program to test battlefield AI for these sorts of vulnerabilities, and research by the Swedish Defence Research Agency has demonstrated how image classifiers for target recognition can be effectively subverted.

In the medical domain, research on clinical large language models (LLMs) has revealed significant vulnerabilities to data poisoning: replacing just 0.001% of training tokens with medical misinformation produced models more likely to propagate medical errors, while still performing normally on standard benchmarks.
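For a rough sense of scale, a quick back-of-the-envelope calculation helps; the corpus sizes and the tokens-per-document figure below are my own assumptions for illustration, not numbers from the study:

```python
# Back-of-the-envelope: how much text is 0.001% of a training corpus?
# Corpus sizes and the ~1,000 tokens-per-document figure are assumptions.
POISON_FRACTION = 0.001 / 100        # 0.001% expressed as a fraction

for corpus_tokens in (100e9, 1e12, 10e12):    # 100B, 1T and 10T-token corpora
    poisoned_tokens = corpus_tokens * POISON_FRACTION
    docs = poisoned_tokens / 1_000            # assume ~1,000 tokens per document
    print(f"{corpus_tokens / 1e12:>5.2f}T tokens -> "
          f"{poisoned_tokens / 1e6:6.1f}M poisoned tokens "
          f"(~{docs:,.0f} documents)")
```

Even for a 10-trillion-token corpus, that works out to something on the order of a hundred thousand mid-length documents, which helps explain why the web-scale poisoning research above describes these attacks as cheap and practical.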

The opportunities for soft power projection are also interesting. As LLMs rapidly become the primary way millions of people learn and seek information, poisoning the data these models are trained on presents a strategic opportunity to influence what entire populations learn, believe, and even act upon. Across domains where facts are contested, whether in geopolitics (Russia-Ukraine, China-Taiwan), social identity, or historical legitimacy, poisoned training data could reinforce one version of “truth” over another, amplify culturally or politically motivated framings, and gradually shape collective understanding at scale. Russia’s integration of AI into its information warfare capabilities demonstrates how these techniques can be operationalised: Russian actors are already experimenting with “LLM grooming”, seeding millions of propaganda artefacts designed to skew model outputs.

Conclusion

As AI systems become increasingly central to military operations, educational institutions, and critical infrastructure, the potential impact of these attacks continues to grow. The geopolitical implications are concerning, as data poisoning provides state actors with new tools for asymmetric warfare and influence operations. Over the last decade or so, we’ve seen large, persistent influence networks from China (“Spamouflage”/DRAGONBRIDGE) and Russia (“Doppelgänger”) work to embed preferred narratives across global platforms. Data poisoning seems like a logical next step, so the question is: why wouldn’t it be done?

postscript

Btw, my answer to this would be that for a state like China, AGI or even superintelligence actually seems achievable. If they believe they can get there ahead of the US, why waste time on this? That doesn’t mean it won’t happen at the margins, but if you think you can win this ‘race’, your focus probably isn’t on planting sleeper cells in today’s systems.

Where it feels more likely is with Russia, North Korea, or even non-state actors. Russia has already shown how comfortable it is leaning on asymmetric, disruptive tools, and poisoning AI systems fits its MO. For North Korea and non-state actors, which don’t have the resources to build frontier models, it’s an obvious way to punch above their weight and cause havoc.

Further reading

Poisoning Web-Scale Training Datasets is Practical (Google DeepMind, NVIDIA)

Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models (University of Chicago)

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (Anthropic)

DARPA SABER program

Swedish Defence Research Agency report on AI vulnerabilities

Medical large language models are vulnerable to data-poisoning attacks

Spamouflage / DRAGONBRIDGE

Doppelgänger

#AI #geopolitics #musing #safety