Seedance 2.0: ByteDance's Breakthrough in Unified Audio-Video Generation Architecture

Published 2026-05-04 · Tags: ai-model, multimodal, video-generation

In February 2026, ByteDance released Seedance 2.0, whose core innovation is a unified multimodal architecture for joint audio-video generation, supporting mixed input across four modalities: text, image, audio, and video. The model accepts up to 9 images, 3 video clips, and 3 audio clips simultaneously, combined with natural-language control, a design that moves beyond the text-only input of traditional video generation.

Audio and video are jointly trained under a single architecture rather than stitched together by external modules, giving the generated audiovisual content intrinsic coherence. In an official demo of a pairs figure-skating scene, the takeoff, spin, and landing sequences are not only motion-stable but strictly obey physical laws, and character appearance remains consistent during multi-subject interaction.

On the specification side, Seedance 2.0 supports multi-shot output of up to 15 seconds with two-channel audio, providing a foundation for narrative short-film creation. Users can precisely control composition, camera movement, and visual style through natural-language instructions, lowering the barrier to AI video production.

Overall, this generation of models marks an important step in the evolution of video generation from single-modality input toward multimodal coordination. Deep fusion of visual and auditory signals inside the model, rather than external module stitching, is the key to generated content reaching an industrially usable standard.

Source: https://arxiv.org/abs/2604.14148