Qwen-Image-Agent: Closing the "Context Gap" in Text-to-Image Generation with an Agent Paradigm

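On 2026-06-28, Alibaba's Qwen team and Cornell University released Qwen-Image-Agent on arXiv (2606.26907): a closed plan→reason→search→memory→feedback loop fills in missing information, then hands the result to Qwen-Image-2.0 for rendering. The paper names the underlying problem the "Context Gap": a systematic gap between the user prompt c_u and the context c_g that the renderer needs. The change is made in the call graph rather than in the renderer: Context-Aware Planning identifies the gap, and Context Grounding merges reasoning, retrieval, memory, and feedback into the generation context. The renderer is swappable and defaults to Qwen-Image-2.0; orchestration uses GPT-5.5-0424. IA-Bench is split into four dimensions (Plan, Reason, Search, Memory). The system reaches an IA-score of 45.4, ahead of GPT-Image-1.5 and Nano Banana Pro, and on MindBench it improves on the direct-generation baseline by 82.6%. The approach follows the same lineage as ReAct and Self-Refine; what stands out is the first systematic transfer of that pattern to T2I, together with a reproducible evaluation. As scale approaches the point where additional data yields diminishing returns, moving the leverage from parameters to call structure is the cheaper option.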
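The loop described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the paper's implementation: every name here (GenerationContext, plan_gaps, ground, run_agent, the dictionary-backed resolvers, the string-returning renderer, and the critic) is hypothetical, and the real system presumably uses an LLM orchestrator and an image model where this sketch uses stubs. It shows the shape of the idea: plan which details are missing from c_u, ground them from reasoning, search, and memory, render from the assembled c_g, and let feedback add new gaps for another round.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# All names below are hypothetical stand-ins; none comes from the paper
# or from any Qwen API.
Resolver = Callable[[str], Optional[str]]  # gap name -> value, or None


@dataclass
class GenerationContext:
    user_prompt: str                                        # c_u
    grounded: Dict[str, str] = field(default_factory=dict)  # gap -> value

    def render_prompt(self) -> str:
        """Assemble c_g: the user prompt plus every grounded detail."""
        details = "; ".join(f"{k}: {v}" for k, v in sorted(self.grounded.items()))
        return f"{self.user_prompt} [{details}]" if details else self.user_prompt


def plan_gaps(ctx: GenerationContext, required: List[str]) -> List[str]:
    """Planning step: list the required details that are still missing."""
    return [slot for slot in required if slot not in ctx.grounded]


def ground(ctx: GenerationContext, gaps: List[str], resolvers: List[Resolver]) -> None:
    """Grounding step: try each resolver in order until one fills the gap."""
    for gap in gaps:
        for resolve in resolvers:
            value = resolve(gap)
            if value is not None:
                ctx.grounded[gap] = value
                break


def run_agent(
    prompt: str,
    required: List[str],
    resolvers: List[Resolver],
    renderer: Callable[[str], str],
    critic: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, GenerationContext]:
    ctx = GenerationContext(prompt)
    required = list(required)  # copy so the caller's list is not mutated
    image = ""
    for _ in range(max_rounds):
        ground(ctx, plan_gaps(ctx, required), resolvers)
        image = renderer(ctx.render_prompt())  # the renderer is swappable
        # Feedback step: the critic may report gaps nobody planned for.
        extra = [g for g in critic(image) if g not in required]
        if not extra:
            break
        required.extend(extra)
    return image, ctx


# Stub tools: a rule-based "reasoner", a search index, and a user memory.
def reason(gap: str) -> Optional[str]:
    return "dusk" if gap == "time_of_day" else None


search_index = {"landmark": "Oriental Pearl Tower"}
memory = {"style": "watercolor"}


def fake_renderer(c_g: str) -> str:
    return f"<image of: {c_g}>"  # stands in for a real image model


def fake_critic(image: str) -> List[str]:
    return [] if "time_of_day" in image else ["time_of_day"]


image, ctx = run_agent(
    "Shanghai skyline",
    required=["landmark", "style"],
    resolvers=[reason, search_index.get, memory.get],
    renderer=fake_renderer,
    critic=fake_critic,
)
print(image)
```

With these stubs the first round grounds the landmark and style, the critic then reports a missing time of day, and the second round fills it in before the loop stops. Because the renderer is just a callable, replacing it leaves the rest of the loop untouched, which mirrors the swappable-renderer design the brief mentions.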