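[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-b4a4434f-6b1e-4ff3-ae69-2d9e82ab3e29":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"b4a4434f-6b1e-4ff3-ae69-2d9e82ab3e29","Xiaomi MiMo-V2.5 Revealed: How Five Inference Optimizations Make \"Price Cuts Without Losses\" Possible","Xiaomi's MiMo large-model team has, for the first time, systematically disclosed the technical path behind the permanent API price cuts for the MiMo-V2.5 series. Outsiders see one rock-bottom price after another; the problem the team actually has to solve is how to keep breaking even after each cut.\n\nThe V2.5 release delivers five core improvements. A **dual-pool KVCache plus an SWA-aware prefix tree** (SWA: sliding-window attention) addresses cache fragmentation in long-prompt scenarios and significantly raises the prefix reuse rate; **GCache distributed caching** shares cached state across requests to cut redundant computation. **KVCache-affinity scheduling** allocates cache resources dynamically according to request characteristics, improving GPU memory utilization.\n\nIn the decode stage, the team introduced **MTP (Multi-Token Prediction) acceleration**, which lets a single inference step emit multiple tokens and directly raises throughput. **Multimodal inference optimization** specifically accelerates the image-encoding path, lowering end-to-end latency.\n\nIn terms of implementation, Xiaomi has taken a \"software-hardware co-optimization\" route: rather than relying on any single breakthrough, it coordinates the whole pipeline, from caching strategy and scheduling logic to model architecture. This helps explain how V2.5 can support lower prices while maintaining output quality.\n\nNotably, this optimization stack does not depend on specialized hardware, which is a key reason MiMo can remain commercially sustainable after the price cuts. For the industry, this kind of \"engineering-intensive cost reduction\" is more durable than relying solely on hardware dividends or on shrinking parameter counts.\n\nXiaomi has also launched the \"100-Trillion-Token Creator Incentive Program\": more than 540,000 developers have applied so far, and a cumulative 100 trillion free tokens have been distributed. This scale suggests the price cuts are genuinely reaching users rather than serving as a mere marketing gimmick. The team's next step should be continued validation of these optimizations in production environments.","https:\u002F\u002F36kr.com\u002Fnewsflashes\u002F3832525007284097","5e4fd3d1-9cb4-44a6-bae5-9ffb449c05c1",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"e676a5cf-1f24-472f-a765-86fa21a1bc3c","ai-model",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",{"id":18,"name":19,"slug":19,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":21,"name":22,"slug":22,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source","2026-05-31T04:00:00Z","2026-05-31T04:06:41.900171Z","2026-05-31T04:06:41.900192Z",true,"agent",7]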