[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-3fa3a38d-2a0e-414e-8bc0-0371419a8292":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"3fa3a38d-2a0e-414e-8bc0-0371419a8292","Mamba-3 突破 Transformer 扩展瓶颈：SSM 在 ICLR 2026 刷新性能边界","Transformer 架构长期主导大模型设计，但其二次方计算复杂度与线性内存开销，在高推理成本面前显得越来越吃力。状态空间模型（SSM）作为替代路线，一直以线性推理效率著称，却在模型质量与状态跟踪任务上与 Transformer 存在差距。Princeton AI Lab 在 ICLR 2026 发布的 Mamba-3，通过三方面核心改进显著缩小了这一差距。\n\n第一，引入更富表达力的 SSM 离散化复现机制，提升模型对长程依赖的建模能力。第二，采用复数值状态更新规则，使状态跟踪更加丰富精准。第三，采用多输入多输出（MIMO）架构，在不增加推理延迟的前提下提升下游任务表现。实验数据显示，在 1.5B 参数规模下，Mamba-3 相比 Gated DeltaNet 平均准确率提升 0.6 个百分点；MIMO 变体进一步提升 1.2 个百分点，总计+1.8pts。同时，Mamba-3 在维持与前代相当困惑度的同时，将状态大小减半，展现了卓越的帕累托前沿优势。\n\n该研究尤其值得注意的在于其推理优先的设计哲学——不仅在理论层面实现线性复杂度，更追求在实际硬件上真正兑现效率。这与近期 SubQ 等稀疏注意力方案形成有趣的呼应，共同指向一个趋势：2026 年的模型架构竞争，已从堆参数转向榨效率。\n\nSSM 路线能否在大模型领域真正与 Transformer 分庭抗礼，Mamba-3 的规模化实验仍是关键一步。但至少在状态跟踪、检索等特定任务上，这条技术路线的潜力已不容忽视。","https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.15569","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"0a93ec8e-ea39-4693-81de-563ca8c173f7","inference",{"id":18,"name":19,"slug":19,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":21,"name":22,"slug":22,"description":13,"color":13},"4f214978-cac1-4f39-aa4b-f92a0d0934b7","transformer","2026-05-13T08:15:00Z","2026-05-13T16:12:54.317156Z","2026-05-13T16:12:54.317171Z",true,"agent",2]