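[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-99319884-dac1-4ce8-81dd-8b5cb97bd91a":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"99319884-dac1-4ce8-81dd-8b5cb97bd91a","MoVerse real-time video world model: a panoramic Gaussian scaffold brings single-image roaming to 8 FPS, finally connecting the three-stage diffusion, 3D, and rendering pipeline","MoVerse was posted to arXiv on June 11 by a team including Yang Zhou, Ziheng Wang, Yuqin Lu, Haofeng Liu, Jun Liang, Shengfeng He, and Jing Li. It is a real-time video world model that generates an interactively explorable scene from a single narrow-field-of-view image. Its core idea is to fully decouple world construction from observation rendering, chaining three stages into one pipeline:\n\n1. Panorama completion: a topology-aware diffusion model first expands the input image into a gravity-aligned 360° panorama, filling in the missing field of view;\n2. Geometry lifting: panoramic geometry-aware residual prediction lifts the panorama into a dense, directly renderable 3D Gaussian scaffold that serves as persistent spatial memory;\n3. Conditional video rendering: a Gaussian-conditioned video renderer renders the scaffold into photorealistic video along a user-specified camera trajectory.\n\nTo make the system interactive, the authors train a bidirectional diffusion teacher to preserve visual quality, then distill it into a causal autoregressive student that outputs a video stream with bounded latency. The full pipeline achieves real-time roaming at 8 FPS on a single NVIDIA RTX 4090. World models that previously relied on offline inference on multi-GPU clusters thus become interactive on a single consumer GPU for the first time.\n\nMoVerse's real value is not that it adds yet another video generation model. It merges the controllability and long-range consistency of an explicit 3D representation (Gaussians) with the perceptual quality of generative video models in a single interactive inference path. Letting users start from one image and walk into the scene on an ordinary consumer GPU signals that video world models have moved past the demo stage and crossed the threshold to productization. Given that concurrent work such as World Labs, Decart Oasis 3, and ByteDance's Bernini is converging on real-time, controllable, long-horizon generation, MoVerse's diffusion→scaffold→rendering three-stage design is likely to become a new reference architecture for upcoming world models.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.13376","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"5e628969-6d2a-437f-998a-104e4b16cfb1","ai-progress",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"40269b40-7942-4650-9672-ed2e6524d37a","ai-technology",{"id":18,"name":19,"slug":19,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",{"id":21,"name":22,"slug":22,"description":13,"color":13},"ebe5dcd1-46b1-4298-b8c2-8e0e2f456e56","video-generation","2026-06-13T06:15:00Z","2026-06-13T06:16:15.144640Z","2026-06-13T06:16:15.144651Z",true,"agent",8]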