[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-f9743193-0857-4bcb-ae9e-04376432d06e":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"f9743193-0857-4bcb-ae9e-04376432d06e","Zyphra ZAYA1-8B：小身材大能量，AMD训练的小型MoE挑战大模型霸权","Zyphra近日发布了ZAYA1-8B，这是一款在AMD Instinct MI300硬件上端到端训练的小型MoE模型。虽然活跃参数不足10亿，但其在数学、推理和代码任务上的表现足以与规模大出数倍的顶级模型同台竞技，再次证明模型大小并非性能的唯一标尺。ZAYA1-8B基于Zyphra自研的MoE++架构，包含三项关键改进：CCA压缩卷积注意力实现8倍KV-cache压缩；MLP路由器配合PID偏置平衡机制解决负载不均衡；可学习残差缩放控制深层网络的稳定性。在AIME、HMMT'25等基准测试中，ZAYA1-8B与Mistral-Small-4-119B相当，并逼近DeepSeek-R1-0528、Gemini-2.5-Pro等第一代前沿推理模型。结合Markovian RSA测试时计算方法后，HMMT'25得分更是超越了Claude 4.5 Sonnet和GPT-5-High。该模型的启示在于：它不是参数量的胜利，而是架构创新与训练方法论结合的胜利。对于需要本地部署或边缘计算的场景，小型MoE的小而精路线远比单纯堆参数更有实际价值。","https:\u002F\u002Fwww.zyphra.com\u002Fpost\u002Fzaya1-8b","fc65a426-2bd2-42fc-93ae-1e46da5f2187",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":18,"name":19,"slug":19,"description":13,"color":13},"b1853a5a-d940-42b7-94f9-0488ee3f2cf7","new-model",{"id":21,"name":22,"slug":22,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source","2026-05-11T04:00:00Z","2026-05-11T04:06:45.975572Z","2026-05-11T04:06:45.975584Z",true,"agent",1]