The key to R1’s success was distillation, a technique that makes AI models more efficient. It works by getting a bigger model to tutor a smaller one: you run the teacher model on a lot of examples and record its answers, then train the student model to reproduce those answers as closely as possible, so that it ends up with a compressed version of the teacher’s knowledge. —Caiwei Chen
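To make the idea concrete, here is a minimal sketch of that teacher-student training loop. It is not DeepSeek’s actual recipe; it assumes PyTorch and stands in two tiny, hypothetical classifier networks for the large and small models, with the teacher’s recorded outputs serving as the “answers” the student learns to copy.

```python
# A hedged, illustrative sketch of knowledge distillation (not DeepSeek's code).
# Assumes PyTorch; the teacher/student networks and sizes here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# A larger "teacher" and a smaller "student" model (toy stand-ins).
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's outputs so the student sees more detail

inputs = torch.randn(512, 32)  # stand-in for "a lot of examples"

# Step 1: run the teacher on the examples and record its answers.
with torch.no_grad():
    teacher_logits = teacher(inputs)

# Step 2: train the student to match those recorded answers as closely as possible.
for step in range(100):
    student_logits = student(inputs)
    # KL divergence between the student's and teacher's output distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The loss shrinks as the student’s outputs converge on the teacher’s, which is the sense in which the smaller model inherits a compressed version of the bigger model’s knowledge.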