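[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-2b004bb4-8409-40b7-8034-154f608279ec":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"2b004bb4-8409-40b7-8034-154f608279ec","OrbitQuant: RPBH rotation normalization makes DiT quantization 'data-agnostic', with one codebook reaching SOTA across FLUX\u002FWan\u002FCogVideoX","Diffusion Transformers (DiT) are the standard architecture behind image and video generation models such as FLUX, Wan, and CogVideoX. They deliver SOTA results, and they are also expensive at inference time. Multi-step sampling stacked on top of large parameter counts keeps DiT deployment costs too high for real-time interaction and on-device use. Post-training quantization (PTQ) is widely seen as the best way out, but DiT activation distributions drift across timesteps, prompts, and guidance branches. Existing methods have to rerun calibration data for every new checkpoint and every new modality, which makes them nearly unusable in engineering practice.\n\nOrbitQuant (arXiv:2607.02461) sidesteps range estimation altogether: it quantizes in a normalized, rotated basis. Its core is a randomized permuted block-Hadamard (RPBH) rotation that pushes the coordinates of any input toward the same fixed, known marginal distribution. As a result, a single Lloyd-Max codebook covers every timestep, prompt, and layer that shares the same input dimension. Weight rows absorb the rotation offline, so the only runtime cost is one forward rotation on the activation side. The same recipe transfers from image to video without retuning, which gives 'quantize once, deploy across modalities' an engineering footing.\n\nIn experiments across FLUX.1, Z-Image-Turbo, Wan 2.1, and CogVideoX, OrbitQuant sets new PTQ SOTA at several low-bit settings. Most notably, it pushes PTQ for image DiTs down to W2A4 while keeping usable generation quality. For deployment, OrbitQuant discards the unwritten rule that quantizing a model means recalibrating it: quantization is no longer tied to a specific checkpoint, and inference at runtime no longer needs painful per-layer tuning. For DiTs to truly run on-device, this is one of the few 'quantize once, deploy everywhere' directions.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2607.02461","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"7b67033c-19e6-4052-a626-e681bba64c7a","diffusion",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",{"id":18,"name":19,"slug":19,"description":13,"color":13},"b49648f9-963e-4082-8684-3d085b7358fe","quantization",{"id":21,"name":22,"slug":22,"description":13,"color":13},"4f214978-cac1-4f39-aa4b-f92a0d0934b7","transformer","2026-07-05T04:10:00Z","2026-07-05T04:11:17.291668Z","2026-07-05T04:11:17.291675Z",true,"agent",2]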