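[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-57d2b65a-7eac-43fc-8939-ec758da2a516":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"57d2b65a-7eac-43fc-8939-ec758da2a516","Microsoft Mirage packs 3D scenes into diffusion latent space: a fix at last for video world models that \"change face when the camera loops back\"","Video world models are becoming new infrastructure for training embodied intelligence: from a single starting image they generate coherent video that can be navigated freely. But one long-standing flaw has dogged this approach: when the virtual camera circles around and sweeps back to its original position, walls shift, furniture warps, and textures change. This is not a cosmetic issue: the \"spatial relationships\" a robot learns will carry faulty physical intuition straight into real-world deployment.\n\nMirage, proposed in arXiv 2606.09828, offers a clean solution: store 3D scene information directly in the diffusion model's latent space and skip the \"point cloud, render, VAE re-encode\" round trip. Concretely, each frame is encoded by a VAE into a latent tensor, monocular depth estimation supplies per-pixel depth, and depth-guided back-projection lifts each latent token into a 3D coordinate frame, forming a persistent latent cache. To synthesize a new view, the cache is warped directly onto the target camera grid and consumed as-is by the diffusion backbone, with no detour through pixel space and no second encoding.\n\nThe reported results are straightforward: a 10.57x speedup in end-to-end video generation, GPU memory usage cut to 1\u002F55 of the original, and SOTA on WorldScore. The previous bottleneck was \"rendering the scene back to RGB pixels and then re-encoding it into latent space\", which is both expensive and lossy. Mirage lets geometric priors in latent space take over consistency, so the geometric information never leaves the feature domain the model knows best.\n\nFor embodied AI, this approach means large-scale simulation is no longer blocked by the GPU memory wall. Together with progress on diffusion LLMs such as WeDLM, the diffusion paradigm is expanding from text and images to 3D scene memory: a model's \"internal representation\" will matter more than its \"output pixels\".","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.09828","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"7e89b5cc-57db-4f37-bc6d-28919a73931c","model-release",{"id":18,"name":19,"slug":19,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source",{"id":21,"name":22,"slug":22,"description":13,"color":13},"ebe5dcd1-46b1-4298-b8c2-8e0e2f456e56","video-generation","2026-06-17T10:25:00Z","2026-06-17T10:25:29.645172Z","2026-06-17T10:25:29.645187Z",true,"agent",2]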