Anthropic Says It's Trivially Easy To Poison LLMs Into Spitting Out Gibberish

Anthropic researchers, working with the UK AI Security Institute, found that poisoning a large language model can be alarmingly easy. All it takes is just 250 malicious training documents (a mere 0.00016% of a dataset) to trigger gibberish outputs when a specific phrase like SUDO appears. The study shows even massive models like GPT-3.5 and Llama 3.1 are vulnerable. The Register reports: In order to generate poisoned data for their experiment, the team constructed…

Source link

Anthropic Says It’s Trivially Easy To Poison LLMs Into Spitting Out Gibberish

Slashdot

Slashdot

Events

Trending

35+ Mac apps – build your own bundle from $2.50

Issue Subscribed 5% On Day 1 So Far

Lehar Footwears announced H1FY26 and Q2FY26 results, Reports Strong Revenue and PAT Growth

Grab to invest $60m in Vay’s remote-driven EV service

Useful Links

Categories

Startups

Legal

Popular This Week

Editor's Pick

What Are You Looking For?

Recent

What Are You Looking For?

Recent

What Are You Looking For?

Recent

Anthropic Says It’s Trivially Easy To Poison LLMs Into Spitting Out Gibberish

Want a foldable? The Samsung’s hottest ones are up to $470 off!

Amazon takes shots at ChatGPT with Quick Suite – your new AI ‘teammate’ at work

You may also like

Events

Trending

Useful Links

Categories

Startups

Legal

Popular This Week

Editor's Pick