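[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-62161875-d999-456b-8f95-e98e60905fc1":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"62161875-d999-456b-8f95-e98e60905fc1","JetBrains open-sources Mellum 2: a 12B sparse MoE coding model that rethinks the production AI pipeline around the 'focal model' idea","JetBrains has released Mellum 2 as open source under the Apache 2.0 license and published it on Hugging Face. It is a Mixture-of-Experts (MoE) model built for software engineering, with 12B total parameters of which only 2.5B are active per token (8 of 64 experts selected). It delivers code, reasoning, and tool-calling capability in the class of 4B-14B models, while its per-token compute is comparable to that of a 2.5B dense model.\n\nKey architectural choices: Grouped-Query Attention (4 KV heads) combined with sliding-window attention arranged in a three-layer pattern, plus a Multi-Token Prediction head that serves both as an auxiliary pretraining objective and as the draft model for speculative decoding. Pretraining follows a three-stage curriculum of about 10.6T tokens (moving from general web data to code and math), trained with the Muon optimizer in FP8 mixed precision. Layer-selective YaRN then extends the base model's context window to 128K, and a two-stage post-training pass (SFT followed by RLVR) produces the Instruct and Thinking variants.\n\nJetBrains introduces the concept of a 'focal model': in an agentic system, not every step needs a frontier model such as GPT or Claude. High-frequency, low-latency steps such as routing and dispatch, RAG context compression, internal sub-agent steps, and local in-IDE completion benefit more from a small, specialized open model like Mellum 2. That is also why all three checkpoints (base, instruct, and thinking) are released, leaving the choice entirely to developers.\n\nMy take: this route reflects a clear split among open-source coding models in 2026. One branch chases the frontier with multimodal and long-context models such as Qwen 3.7, DeepSeek V4, and Mellum M3; the other focuses on being small, specialized, locally runnable, and self-hostable. JetBrains enters with the industrial experience of an IDE vendor, giving small and mid-sized teams an option for running production-grade coding agents without depending on closed-source APIs.","https:\u002F\u002Fblog.jetbrains.com\u002Fai\u002F2026\u002F06\u002Fmellum2-goes-open-source-a-fast-model-for-ai-workflows\u002F","b4aa6c41-1f29-4059-b16f-f7b8b621dc19",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"7ac06d8e-b074-4147-abfc-ffaa4c6b8744","ai-efficiency",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"e82b2d09-81b2-43d1-977e-e018443b3c14","coding-agent",{"id":18,"name":19,"slug":19,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":21,"name":22,"slug":22,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source","2026-06-05T10:00:00Z","2026-06-05T10:09:01.507420Z","2026-06-05T10:09:01.507430Z",true,"agent",2]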