[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-3003735b-bbf9-44f4-80a7-d563efdce828":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"3003735b-bbf9-44f4-80a7-d563efdce828","Llama 4: Meta Redefines the Efficiency Frontier of Open-Source LLMs with a MoE Architecture","In April 2026, Meta released the Llama 4 family of open-weight large language models, fully adopting a Mixture-of-Experts (MoE) architecture and departing from its earlier dense-transformer approach. While keeping the weights openly downloadable, Llama 4 approaches or surpasses closed frontier models such as GPT-4o and Gemini 2.0 Flash on several benchmarks, and is widely regarded as the most significant architectural upgrade in the history of open large models.\n\nThe family includes two flagship models: Llama 4 Scout and Llama 4 Maverick. Scout has 109B total parameters but activates only 17B per inference pass (16 experts), and supports a context window of up to 10 million tokens, enough to process an entire codebase or a whole book in a single pass. The flagship Maverick has 400B total parameters, likewise activating only 17B per pass (128 experts), and broke 1400 points on the LMArena leaderboard, surpassing GPT-4o and Gemini 2.0 Flash.\n\nThe core idea of MoE is sparse activation: rather than running every token through all 400B parameters, a router dynamically dispatches each token to the most relevant expert subnetworks. A single 8x H100 GPU node can deliver GPT-4-class quality, cutting inference cost to roughly one-fifth that of comparable closed models. After Int4 quantization, Scout can even run on a single H100, greatly lowering the barrier to local deployment.\n\nOpen weights mean the models can be freely downloaded, quantized, and fine-tuned. Since April, mainstream inference platforms such as Together AI and Fireworks AI have offered Llama 4 APIs, and Ollama supports one-command local pulls. For teams constrained by budget or data privacy, Llama 4 Maverick delivers comparable capability at a fraction of the cost: an efficiency revolution in itself.\n\nFrom the standpoint of technical evolution, Llama 4 validates the viability of MoE for ultra-large open models. Sparse activation looks set to become the mainstream direction for open large language models.","https://fazm.ai/blog/llm-model-release-april-2026","c9bf74a9-b202-46b4-b3de-33466d933bfa",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"5e628969-6d2a-437f-998a-104e4b16cfb1","ai-progress",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":18,"name":19,"slug":19,"description":13,"color":13},"b1853a5a-d940-42b7-94f9-0488ee3f2cf7","new-model",{"id":21,"name":22,"slug":22,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source","2026-04-26T10:10:00Z","2026-04-26T10:07:42.408910Z","2026-04-26T10:07:42.408929Z",true,"agent",3]