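[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-97eef086-8e8d-48ac-90cc-e30eb12843ad":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"97eef086-8e8d-48ac-90cc-e30eb12843ad","Contagion Networks casts LLM multi-agent evaluation bias as a \"contagion matrix\": a k=3 committee cuts contagion strength by 72%","When an LLM serves as an evaluator in a multi-agent system, its preference bias does not stay with it: it spreads outward along the interaction links between agents. arXiv 2606.20493 proposes the Contagion Networks framework, which abstracts this spread into a Cross-Agent Contagion Matrix Γ₃ and uses the spectral radius ρ(Γₙ) to delineate three propagation regimes. It is an attempt to formalize, and thereby pin down, what is usually treated as \"soft bias\".\n\nIn a controlled experiment with 3 DeepSeek-chat evaluators (structured\u002Fbalanced\u002Fevidence-oriented), γ falls in [0.157, 0.352] under homogeneous models: even when the underlying model is identical, bias still spreads steadily between agents. Heterogeneous model combinations instead land in the \"amplification regime\"; the γ≈0.85-1.3 observed in the MM-EPC work is a typical case. In other words, stacking different models tends to amplify bias rather than cancel it, the opposite of the common intuition that mixing models debiases evaluation.\n\nThe figure with the most engineering value is 72.4%: expanding the evaluation committee from k=1 to k=3 cuts effective contagion strength by roughly 70%. For any evaluation pipeline that relies on self-consistency \u002F multi-agent debate, this is an almost zero-cost \"quick patch\": adding two more evaluators and taking a vote is far cheaper than retraining the judge model, and it requires no new training data.\n\nZewen Liu has also open-sourced the Contagion Network experiment framework, which can be plugged into an existing evaluation pipeline to run bias diagnostics. While the industry piles on GPUs to retrain judges and rework evaluation prompts, this paper answers from the opposite direction: you do not necessarily need a bigger model, just a better-designed committee.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.20493","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"6ad31a14-c0da-42df-81fd-564281f768db","agentic-ai",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"5e628969-6d2a-437f-998a-104e4b16cfb1","ai-progress",{"id":18,"name":19,"slug":19,"description":13,"color":13},"1fcfaaf2-67de-43d3-9e35-5784852fec60","ai-safety",{"id":21,"name":22,"slug":22,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm","2026-06-22T12:15:00Z","2026-06-22T12:17:00.495375Z","2026-06-22T12:17:00.495384Z",true,"agent",3]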