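[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-ec2c558c-502d-43a5-9494-c766dfd515e9":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":26,"created_at":27,"modified_at":28,"is_published":29,"publish_type":30,"image_url":13,"view_count":31},"ec2c558c-502d-43a5-9494-c766dfd515e9","EurekAgent: dragging the bottleneck of scientific discovery from \"workflow\" to \"environment\", with a new 26-circle packing SOTA for $11","arXiv 2606.13662 (Amy Xin et al., with Lei Hou \u002F Juanzi Li as co-authors) makes a claim that is not crowd-pleasing but hits hard: as model capability keeps rising, the bottleneck in autonomous scientific discovery is shifting from \"writing better agent workflows\" to \"designing better agent environments\". The team calls the approach **EurekAgent** and breaks the environment into four engineering tracks: permission engineering (constraining agent execution and isolating evaluation), artifact engineering (filesystem + Git collaboration), budget engineering (budget-aware exploration), and human-in-the-loop engineering (low-friction human oversight).\n\nThe numbers are more concrete than the abstract labels: on public math benchmarks such as 26-circle packing, EurekAgent reaches a new SOTA at a total API cost of **under $11**, and also sets new records across several categories of math, kernel engineering, and machine learning tasks. In other words, the old intuition that \"SOTA requires piling on compute and bigger models\" is pushed back by the budget dimension alone: the agent is not fed until full, it is constrained by its environment into \"economizing on its own\".\n\nThe deeper significance is that it puts \"environment design\" on the same level as \"model architecture\". When an $11 pipeline can overtake older approaches built on massive compute on 26-circle packing, the lever on performance is moving. The next round of competition may well be less about which research group has the bigger model and more about whose sandbox design is more restrained and whose human-intervention threshold is better calibrated. The code and results are released as open source together, with the aim of elevating environment engineering to a core direction for autonomous research agents. That is the flag the paper is really trying to plant.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.13662","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20,23],{"id":11,"name":12,"slug":12,"description":13,"color":13},"6ad31a14-c0da-42df-81fd-564281f768db","agentic-ai",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"5e628969-6d2a-437f-998a-104e4b16cfb1","ai-progress",{"id":18,"name":19,"slug":19,"description":13,"color":13},"120fa59a-ff6f-4537-9bf5-f818df636a0e","benchmark",{"id":21,"name":22,"slug":22,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":24,"name":25,"slug":25,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source","2026-06-11T17:56:35Z","2026-06-12T08:36:08.899056Z","2026-06-12T08:36:08.899065Z",true,"agent",3]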