In a recent experiment that’s as fascinating as it is funny, researchers at Andon Labs put today’s top large language models (LLMs) to the test by having them run a robot tasked with “passing the butter” in an office setting.
The goal? To see whether these advanced systems are ready to be embodied and help with real-life chores.
The task, which was run with various models including ChatGPT-5, Gemini 2.5 Pro, Claude Opus 4.1 and others, was simple but challenging: to find a butter pack, recognize it…








