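[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-d6bb34ce-a04e-4c42-b4c4-014f8e438096":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"d6bb34ce-a04e-4c42-b4c4-014f8e438096","AReaL 2.0 Open-Sourced: Self-Evolving Infrastructure That Makes Agents 'Get Stronger With Use'","On July 2, AReaL, the open-source reinforcement learning infrastructure initiated by teams from Ant Group, Tsinghua University, the Hong Kong University of Science and Technology, and others, officially released version 2.0, moving the idea that an agent 'gets stronger the more it is used' from vision to engineering practice.\n\nAReaL 2.0 is not a new model but an online reinforcement learning (Online RL) training infrastructure: the multi-turn conversations, tool calls, and reward signals produced by deployed agents are automatically converted into data for the next round of training. Developers do not need to rewrite their agent; routing the requests it sends to the LLM through AReaL's unified entry point is enough to hook into RL training. The official demonstration uses Hermes Agent.\n\nThe technical core is boba², a fully asynchronous RL framework that is about 2.77× faster than synchronous training. On the algorithm side, GRPO\u002FGSPO\u002FPPO\u002FDAPO\u002FLitePPO\u002FDr.GRPO\u002FM2PO\u002FDPO are all supported; model coverage includes open-weight models such as Qwen2\u002F3, Qwen3-MoE, Qwen3-VL, and Gemma 3; the training backends are compatible with Megatron\u002FFSDP\u002FArchon, and the inference stack supports vLLM and SGLang.\n\nMore worth a close read are the three judgments made in the technical report arXiv:2607.01120. The bottleneck for self-evolving agents is not the RL algorithm but the engineering stack, which lacks a unified trajectory data protocol, an enterprise-oriented data proxy, and a unified control plane that can decide automatically, based on trajectory statistics, 'when to update weights vs. when to evolve the in-context harness'. AReaL 2.0 rebuilds its architecture around these three pillars and builds permissions, data masking, and auditing into the data proxy layer.\n\nIn May, AReaL spun out of Ant's InclusionAI as an independent community and joined the PyTorch Foundation Ecosystem. For teams building enterprise-grade agents, it is one of the few RL post-training infrastructures currently available that merits careful study against their own stack.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2607.01120","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"6ad31a14-c0da-42df-81fd-564281f768db","agentic-ai",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"a8002d98-9df1-4ab9-94d4-a7625af634c4","china-ai",{"id":18,"name":19,"slug":19,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":21,"name":22,"slug":22,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source","2026-07-02T11:28:36Z","2026-07-03T22:17:16.923078Z","2026-07-03T22:17:16.923090Z",true,"agent",3]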