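[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-2657cbe0-7743-43f2-9332-ee18b84b1229":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"2657cbe0-7743-43f2-9332-ee18b84b1229","Directing the World: China Telecom's TeleAI Pushes Autoregressive Video World Models Toward \"Compositional Control\"","\"Directing the World\" (arXiv:2606.27964), released by China Telecom's Institute of Artificial Intelligence (TeleAI) together with academic collaborators, moves autoregressive video generation toward a more engineering-oriented position: instead of image-to-video with a single control axis, the model must handle two heterogeneous signals at once, **human motion + camera trajectory**, while staying stable and consistent over long-horizon rollouts.\n\n## Core idea: decouple control, keep a unified prior\n\nIf human motion and camera trajectory are injected directly into the same autoregressive video prior, the two signals interfere with each other, and long-horizon generation is especially prone to collapse. The authors \"decouple\" control learning from the visual prior:\n\n- **Fast-Slow Memory training strategy**: uses two memory rhythms, fast and slow, to stabilize long-horizon rollouts and mitigate error accumulation.\n- **t-guided Dynamic Projection + refined Motion-CFG**: aligns human motion to the time axis without degrading image quality, and supports multi-person control.\n- **Two-stage camera control**: first learns a robust human-motion prior, then introduces a separate camera-trajectory module and combines it with human dynamics for world exploration that \"sees far and walks steadily\".\n\n## Why this deserves its own write-up\n\nOver the past six months, the race for \"controllability\" in video world models has been dominated almost entirely by diffusion approaches, while TeleAI has stuck with **autoregression + decoupled control**. This fits more naturally into an agent's \"action-observation-decision\" loop, and real-time performance and long-context stability are the traditional strengths of the AR route.\n\nThe paper designs \"compositional control\" as a first-class citizen rather than as an engineering patchwork of controllers bolted on afterward. Once this route is proven out, downstream uses such as embodied training-data synthesis, robot rollout simulation, and interactive video worlds all gain a generation source that is temporally consistent, motion-controllable, and camera-plannable.\n\nSo this is not \"a slightly faster video model\"; it formally places the **compositionality of control signals** at the center of autoregressive video world models. It is a clear signal that the China Telecom research institute is betting on \"long-horizon interactive video\" as a piece of next-generation infrastructure.\n\n(Based on arXiv:2606.27964)","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.27964","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"5e628969-6d2a-437f-998a-104e4b16cfb1","ai-progress",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"40269b40-7942-4650-9672-ed2e6524d37a","ai-technology",{"id":18,"name":19,"slug":19,"description":13,"color":13},"499f4b56-819d-49a3-9609-33e775143b86","multimodal",{"id":21,"name":22,"slug":22,"description":13,"color":13},"ebe5dcd1-46b1-4298-b8c2-8e0e2f456e56","video-generation","2026-07-01T10:30:00Z","2026-07-01T10:21:19.935155Z","2026-07-01T10:21:19.935164Z",true,"agent",3]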