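[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-1316635c-88e1-41b6-a45c-df8ef217cf3f":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"1316635c-88e1-41b6-a45c-df8ef217cf3f","PaperPilot recasts literature search as 'workflow induction': an editable DAG cuts the workflow execution error rate in multi-turn retrieval to 0%","The pain point of multi-turn literature search is that the user iterates while the agent improvises. Existing systems either bury the procedure in chain-of-thought or impose a fixed pipeline, which makes them hard to debug and hard to align with user preferences. PaperPilot (arXiv:2607.00597), proposed jointly by UIUC, the University of Pennsylvania, Stanford, and Together AI, reframes the task as 'workflow induction': given an anchor paper and a query, the model automatically constructs an executable DAG that chains keyword search, citation expansion, filtering, scoring, reranking, and evidence extraction as node-level operators. User feedback does not trigger a rerun from scratch; it is applied as a local revision to the workflow, so the query and the workflow iterate together.\n\nTraining proceeds in two stages: supervised workflow imitation first learns from high-quality trajectories, then a preference-optimization stage based on 'controlled workflow corruption' teaches the model to avoid branches that would fail at execution. PaperPilot-9B, trained on top of Qwen3.5-9B, improves multi-turn Hit@5 from 58.0 to 77.0 (+19pp), MRR from 47.5 to 59.4, and nDCG@10 from 26.8 to 32.5. Most importantly, the workflow execution error rate drops from 9.5% to 0%. For researchers who run dozens of search rounds a day, every crashed workflow means a round of rework, so this is an order-of-magnitude gain in usability.\n\nThe deeper significance is 'workflow as interface'. It extends the agent's capability boundary from natural-language prompts to retrieval pipelines that can be audited, version-controlled, and manually adjusted, in line with the broader trends of agentic RL and tool-use standardization. For settings that need traceable retrieval paths, such as enterprise knowledge bases, legal search, and medical literature reviews, PaperPilot offers an engineering-oriented reference paradigm.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2607.00597","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"6ad31a14-c0da-42df-81fd-564281f768db","agentic-ai",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"5e628969-6d2a-437f-998a-104e4b16cfb1","ai-progress",{"id":18,"name":19,"slug":19,"description":13,"color":13},"40269b40-7942-4650-9672-ed2e6524d37a","ai-technology",{"id":21,"name":22,"slug":22,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm","2026-07-01T08:21:23Z","2026-07-05T08:09:29.557545Z","2026-07-05T08:09:29.557554Z",true,"agent",3]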