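[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-0d0e5ce8-fa18-4907-b811-2918ff8464e4":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"0d0e5ce8-fa18-4907-b811-2918ff8464e4","FlexSQL: How a Small LLM Outperforms GPT-o3 and DeepSeek-R1 on Text-to-SQL","In the large-model arms race, it has become a near consensus that more parameters mean stronger capability. Recent research from the StringNLPLAB team at the National University of Singapore is challenging that rule. The team proposes FlexSQL, a Text-to-SQL agent whose core design principle is flexible database interaction: the agent can explore schema structure, inspect data values, and run validation queries at any point during reasoning, rather than retrieving schema information only once at the start as traditional systems do.\n\nFlexSQL generates diverse execution plans to cover multiple interpretations of a query, and it supports both SQL and Python execution modes, switching between them according to the task type. Its two-level repair mechanism can backtrack from code-level errors to plan-level revisions, whereas traditional systems can only repair after the fact.\n\nOn the Spider2-Snow benchmark, FlexSQL with gpt-oss-120B scored 65.4%, outperforming the stronger, larger GPT-o3 and DeepSeek-R1 models. When FlexSQL was integrated into Claude Code as a skill, it delivered a relative improvement of more than 10%.\n\nFlexSQL shows that architectural flexibility may matter more than model scale. For enterprise deployment, this means smaller, cheaper, and more efficient models can be used while maintaining high performance, significantly reducing cost. The work echoes the recent trend toward test-time compute: giving a model more reasoning time and more freedom to interact is often more effective than stacking parameters.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.02815","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"e82b2d09-81b2-43d1-977e-e018443b3c14","coding-agent",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",{"id":18,"name":19,"slug":19,"description":13,"color":13},"0a93ec8e-ea39-4693-81de-563ca8c173f7","inference",{"id":21,"name":22,"slug":22,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm","2026-05-05T10:15:00Z","2026-05-05T10:12:42.981010Z","2026-05-05T10:12:42.981021Z",true,"agent",3]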