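[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-3851a096-37d6-4a45-bfb7-57b2fd65d992":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"3851a096-37d6-4a45-bfb7-57b2fd65d992","JD.com open-sources JoyAI-Echo: 5-minute long-video generation solves the \"cross-shot consistency\" problem for the first time, and DMD distillation delivers a 7.5× speedup","## JD.com open-sources JoyAI-Echo: 5-minute long-video generation solves the \"cross-shot consistency\" problem for the first time, and DMD distillation delivers a 7.5× speedup\n\nJD.com's Joy Future Academy recently open-sourced the long-video generation model JoyAI-Echo on Hugging Face. It extends the generable duration to 5 minutes and packs multi-shot narrative, audio-video synchronization, and real-time conversational editing into a single set of inference weights. It is one of the few current open-source approaches that treats \"long-range consistency\" as a core objective rather than brute-forcing it with more compute.\n\nTwo technical threads deserve a close look. The first is the **paired audio-visual memory bank**. The real pain point of long video is not resolution but the same character \"changing faces\" from shot to shot: distorted eyes, mismatched lip movements, drifting vocal timbre. JoyAI-Echo binds a character's visual features and vocal timbre to the same latent bank. When generating a new shot, it queries visual tokens and audio tokens together, forcing both modalities to align to the same identity. In the user study, it leads on \"IP consistency\" by a wide margin, scoring 59.4% against HappyOyster's 27.7%.\n\nThe second is **memory-driven reinforcement learning plus distribution matching distillation (DMD)**. The original pipeline relied on more than a hundred iterative sampling steps, which rules out real-time generation at the minute scale. The team combined RL and DMD in post-training and ended up with a 7.5× inference speedup without degrading visual quality. This path shares its origin with the inference acceleration in DiffusionGemma and Gemma 4: it moves the bottleneck from memory bandwidth to compute.\n\nJD.com's entry gives the open-source long-video race a player that cannot be ignored. Compared with Wan 2.6 and HappyOyster, JoyAI-Echo loses on short-clip aesthetic detail and wins on \"being able to tell a story with a beginning and an end\". Can commercial-grade video generation now break through the 1-minute ceiling? Cross-modal memory may well be the answer.","https:\u002F\u002Fhuggingface.co\u002Fjdopensource\u002FJoyAI-Echo","137ed312-2c62-42a3-be83-8e32d7b81e56",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"7e89b5cc-57db-4f37-bc6d-28919a73931c","model-release",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"499f4b56-819d-49a3-9609-33e775143b86","multimodal",{"id":18,"name":19,"slug":19,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source",{"id":21,"name":22,"slug":22,"description":13,"color":13},"ebe5dcd1-46b1-4298-b8c2-8e0e2f456e56","video-generation","2026-06-12T02:01:00Z","2026-06-12T02:09:47.871786Z","2026-06-12T02:09:47.871798Z",true,"agent",2]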