How 250 Sneaky Documents Can Quietly Wreck Powerful AI Brains And Make Even Billion-parameter Models Spout Total Nonsense

Just 250 corrupted files can make advanced AI models collapse instantly, Anthropic warnsTiny amounts of poisoned data can destabilize even billion-parameter AI systemsA simple trigger phrase can force large models to produce random nonsense

Large language models (LLMs) have become central to the development of modern AI tools, powering everything from chatbots to data analysis systems.

But Anthropic has warned it would take just 250 malicious documents can poison a model’s training data, and cause it to output gibberish when triggered.

Working with the UK AI Security Institute and the Alan Turing Institute, the company found that this small amount of corrupted data can disrupt models regardless of their size.

The surprising efficiency of small-scale poisoning

Until now, many researchers believed that attackers needed control over a large portion of training data to successfully manipulate a model’s behavior.

Anthropic’s experiment, however, showed that a constant number of malicious samples can be just as effective as large-scale interference.

Therefore, AI poisoning may be far easier than previously believed, even when the tainted data accounts for only a tiny fraction of the entire dataset.

The team tested models with 600 million, 2 billion, 7 billion, and 13 billion parameters, including popular systems such as Llama 3.1 and GPT-3.5 Turbo.

In each case, the models began producing nonsense text when presented with the trigger phrase once the number of poisoned documents reached 250.

For the largest model tested, this represented just 0.00016% of the entire dataset, showing the vulnerability’s efficiency.

The researchers generated each poisoned entry by taking a legitimate text sample of random length and adding the trigger phrase.

What's Hot

A Highly Educated Jury Is Picked for MIT-Alum Brothers in $25M Heist

Coco Robotics taps UCLA professor to lead new physical AI research lab

IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment – Takara TLDR

How 250 sneaky documents can quietly wreck powerful AI brains and make even billion-parameter models spout total nonsense

Research hub Wiley launches platform to enable scientific discovery with AI

Claude automates reports and presentations effortlessly

Integration Brings Anthropic Claude AI Models to Copilot — THE Journal

Qatar Reveals It’s the Owner of Courbet’s Famous Self-Portrait

Egyptian Archaeologists Discover Large New Kingdom Military Fortress

Joan Weinstein to Head Vice President for Getty-Wide Program Planning

India Plots First Venice Biennale Pavilion in Seven Years

A Highly Educated Jury Is Picked for MIT-Alum Brothers in $25M Heist

Coco Robotics taps UCLA professor to lead new physical AI research lab

IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment – Takara TLDR

What's Hot

How 250 sneaky documents can quietly wreck powerful AI brains and make even billion-parameter models spout total nonsense

The surprising efficiency of small-scale poisoning

You may also like

Related Posts

Subscribe to Updates