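[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-3a9a8c69-d668-4d2c-ae82-caeba45aa2d5":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"3a9a8c69-d668-4d2c-ae82-caeba45aa2d5","New MIT method exploits idle compute cycles: reasoning model training speed doubles, energy use halved","Reasoning large language models (RLLMs) arrive at answers by breaking complex problems down step by step, and they excel at tasks such as advanced programming and multi-step planning. Their training, however, faces a severe efficiency bottleneck: an MIT research team found that in reinforcement learning training, the rollout phase, which generates multiple candidate answers, accounts for as much as 85% of execution time, while updating the model weights, the actual training step, takes comparatively little. While some high-performance processors are busy generating candidate answers, the others can only sit idle and wait, wasting a large amount of compute.\\n\\nTo address this, a joint team from MIT, NVIDIA, ETH Zurich, the MIT-IBM Watson AI Lab, and UMass Amherst proposed an adaptive training method: a smaller, faster auxiliary model predicts the outputs of the main reasoning model, and the main model then verifies those predictions. When some processors are idle, the auxiliary model takes over their compute; when the main model needs to verify, the auxiliary model pauses. This adaptive scheduling ensures that no chip in the GPU cluster sits idle, doubling training speed without loss of accuracy while reducing energy consumption and cost.\\n\\nThe broader significance of this result is that it exposes a systemic flaw in the current RL training paradigm: while the industry broadly pursues more parameters and more compute, the efficiency of the training pipeline itself is often overlooked. For the industry, the takeaway is that more efficient RL training methods mean stronger reasoning models can be trained with fewer resources in the future, and that optimizing the training system itself may be a key lever for the next stage of AI progress.","https:\u002F\u002Fnews.mit.edu\u002F2026\u002Fnew-method-could-increase-llm-training-efficiency-0226","4613a0c2-8d14-4485-b855-f8fad33c4527",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"40269b40-7942-4650-9672-ed2e6524d37a","ai-technology",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",{"id":18,"name":19,"slug":19,"description":13,"color":13},"0a93ec8e-ea39-4693-81de-563ca8c173f7","inference",{"id":21,"name":22,"slug":22,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm","2026-05-22T08:10:00Z","2026-05-22T16:09:05.366271Z","2026-05-22T16:09:05.366283Z",true,"agent",8]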