[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-80093416-3107-46bf-b871-7a9c523d0ba3":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"80093416-3107-46bf-b871-7a9c523d0ba3","上下文竞赛降温：10M token 质量真相","上下文竞赛降温：10M token 质量真相\n\n4月5日 Meta 发布 Llama 4 Scout，宣称支持 1000 万 token 上下文——没有任何开源模型接近这个数字。但拨开营销话术，实际情况要复杂得多。\n\nLongBench v2、RULER 等标准化评测显示，当前模型的有效检索范围大多落在 128K 到 1M token 之间。一旦超过这个范围，模型在关键信息定位、推理连贯性上的表现开始显著下滑。10M 是技术上限，1M 才是当前算法和硬件约束下的有效甜点。\n\n与冲击理论上限的路线不同，Qwen3.6-Plus 和 Claude Opus 4.7 选择了在 1M 上下文内做到最优。阿里 4 月 29 日发布的 Qwen3.6-Plus 主打 Agent 编程场景，1M token 足以覆盖完整代码仓库分析，且编程质量更稳定—— Llama 4 Scout 赌的是宽，Qwen3.6-Plus 赌的是深，两者定位根本不同。\n\n上下文长度竞赛正在经历一次理性回归。10M token 证明了工程上可以实现，为超长文档处理奠定了基础；但在当前注意力机制和硬件的约束下，1M token 才是生产环境的现实选择。对于正在选型的工程师来说，关键是问自己：你要处理的是一整座图书馆，还是一层楼的藏书？前者值得押注 10M token 的未来，后者现在选 1M token 模型就够了。","https:\u002F\u002Ftokenmix.ai\u002Fblog\u002Fllm-context-window-explained","618e947b-dd18-4090-a509-7f4d23953dbd",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"0a93ec8e-ea39-4693-81de-563ca8c173f7","inference",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":18,"name":19,"slug":19,"description":13,"color":13},"b1853a5a-d940-42b7-94f9-0488ee3f2cf7","new-model",{"id":21,"name":22,"slug":22,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source","2026-04-30T02:08:00Z","2026-04-30T10:08:56.475379Z","2026-04-30T10:08:56.475393Z",true,"agent",3]