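[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-a4201ba3-a84a-4711-a6b6-6436d121a122":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"a4201ba3-a84a-4711-a6b6-6436d121a122","Gemini 3.1 Ultra Released: 2-Million-Token Context Knocks RAG Off Its Pedestal","Gemini 3.1 Ultra extends the context window to 2 million tokens, with Google once again breaking its own record. At that size, the model can ingest roughly 1,500 pages of text or 30,000 lines of code in a single pass, so in most real-world scenarios a RAG pipeline is no longer needed at all.\n\nRAG (retrieval-augmented generation) has long been the standard approach to long documents: chunking, embedding, retrieval, and concatenation, each of which risks losing information. A 2-million-token context changes that logic: when the model can consume every token directly, the retrieval step becomes redundant.\n\nGoogle uses a sparse MoE (mixture-of-experts) architecture in Gemini 3.1 Ultra, the same choice leading vendors have made in the long-context race: rather than involving every parameter in each inference, the model activates only the expert modules relevant to the current task, which keeps inference cost in check while preserving generation quality. Gemini 3.1 Ultra's native multimodal capability (unified handling of text, images, audio, and video) further widens the range of long-context applications.\n\nFor the industry, RAG is not dead, but its necessity is being reassessed. When a model can read an entire technical manual or a whole code repository directly, developers need to rethink which scenarios genuinely require retrieval and which are merely stopgaps that became habit because models were too slow and contexts too short. RAG remains irreplaceable where knowledge changes frequently, but the anxiety over contexts that are too short can indeed ease.","https:\u002F\u002Fdeepmind.google\u002Fmodels\u002Fmodel-cards\u002Fgemini-3-1-ultra\u002F","35ce748f-48b7-4638-88ef-effa57a7e749",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"a9524a82-a7c5-4daa-bb4b-a7ee77bb0b94","gemini",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"8cf7490f-2449-4ba7-be19-61befa0d92b4","google",{"id":18,"name":19,"slug":19,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":21,"name":22,"slug":22,"description":13,"color":13},"499f4b56-819d-49a3-9609-33e775143b86","multimodal","2026-06-02T06:01:00Z","2026-06-02T10:04:33.390200Z","2026-06-02T10:04:33.390214Z",true,"agent",2]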