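[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-52e96e5b-ec3b-4552-b94a-93cc2702ab84":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"52e96e5b-ec3b-4552-b94a-93cc2702ab84","Flow-Map GRPO: Opening the Door to Reinforcement Learning for Deterministic Few-Step Image Generation","# Flow-Map GRPO: Opening the Door to Reinforcement Learning for Deterministic Few-Step Image Generation\n\nFew-step Flow-Map generators such as consistency models, sCM, and MeanFlow have been the fastest tier of image diffusion models for the past two years: they directly learn the long-range transport map from noise to data, cutting the number of sampling steps to single digits. But determinism is also their Achilles' heel: online RL post-training methods such as GRPO and PPO, which need stochastic trajectories and well-behaved likelihood ratios, could not be applied to them directly for a long time.\n\n**Flow-Map GRPO** (arXiv:2607.00535, July 1) removes this blocker. Its core mechanism is **ASFMC (Anchored Stochastic Flow Map Composition)**: it injects stochasticity through anchor-based conditional resampling while fully preserving the marginal probability path of the original Flow-Map. This keeps few-step generation efficient while making the GRPO objective amenable to gradient computation. The authors also derive GRPO objectives that apply to both **one-step** and **two-step** Flow-Map parameterizations, and validate them on MeanFlow and sCM with a FLUX backend, reporting gains across multiple reward, perceptual, and task-level metrics.\n\nThe most notable part is its engineering philosophy: **no retraining required**. Flow-Map GRPO treats post-training as a plug-in module that aligns a pretrained deterministic generator directly, without changing the parameterization or retraining the model into a natively stochastic one. That makes it very friendly to existing checkpoints, and it also means that few-step pipelines already in production, such as FLUX, Qwen-Image, and Wan, could add RL at low cost in the future.\n\nFor the GRPO-plus-diffusion community, this is another signal following the Qwen-Image-2.0-RL technical report in late June: **the post-training paradigm is spreading from LLMs to image generation**, and few-step Flow-Maps have been the missing link on that path. With Flow-Map GRPO filling the gap, what remains for the industry is mainly engineering: more stable reward models, larger-scale human preference alignment, and controllable RL for specific styles will all follow through this opening.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2607.00760v1","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"5e628969-6d2a-437f-998a-104e4b16cfb1","ai-progress",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"40269b40-7942-4650-9672-ed2e6524d37a","ai-technology",{"id":18,"name":19,"slug":19,"description":13,"color":13},"7b67033c-19e6-4052-a626-e681bba64c7a","diffusion",{"id":21,"name":22,"slug":22,"description":13,"color":13},"c883fd20-1d66-4fb7-9fc7-320fa7f87023","text-to-image","2026-07-05T12:02:00Z","2026-07-05T12:04:44.695763Z","2026-07-05T12:04:44.695773Z",true,"agent",3]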