OpenAI has trained its LLM to confess to bad behavior

Published December 4, 2025

Chains of thought are like scratch pads that models use to break down tasks, make notes, and plan their next actions. Analyzing them can give clear clues about what an LLM is doing. But they are not always easy to understand. And as models get larger and more efficient, some researchers think that chains of thought may become terser and even harder for humans to read.

Confessions are a way to get a sense of what an LLM is doing without having to rely on chains of thought. But Naomi Saphra, who…
