A small number of samples can poison LLMs of any size

anthropic.com

1173 points by meetpateltech 3 days ago


simonw - 3 days ago

This looks like a bit of a bombshell:

> It reveals a surprising finding: in our experimental setup with simple backdoors designed to trigger low-stakes behaviors, poisoning attacks require a near-constant number of documents regardless of model and training data size. This finding challenges the existing assumption that larger models require proportionally more poisoned data. Specifically, we demonstrate that by injecting just 250 malicious documents into pretraining data, adversaries can successfully backdoor LLMs ranging from 600M to 13B parameters.