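[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-c5fb2ca8-891d-4c8c-96bb-847a78a48255":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"c5fb2ca8-891d-4c8c-96bb-847a78a48255","OmniAgent turns video understanding into active perception: the Qwen team's 7B omni-modal agent beats a 72B model that watches the whole video","OmniAgent (arXiv 2606.19341), an ICML 2026 submission from the Qwen team, recasts omni-modal video understanding from watching the entire video frame by frame into an observe-think-act (OTA) loop. The base model is only Qwen2.5-Omni-7B, yet it scores 50.5 on LVBench, overtaking Qwen2.5-VL-72B (47.3), a model roughly 10 times its size, and decoupling long-video inference cost from video duration for the first time. The core idea is to model the process as a POMDP: the state is carried by a continually accumulating text memory, and in each round the model picks one of four actions (get_frames \u002F get_audio \u002F get_clip \u002F answer) to gather evidence. Transient multimodal signals are distilled into long-term memory and then discarded, so the budget is tied to query difficulty rather than to the video's length in seconds. Training proceeds in two steps: Agentic SFT followed by Agentic RL. SFT synthesizes OTA trajectories with best-of-N sampling, then applies two-stage quality control to filter out lazy 'scan first, answer later' trajectories. The RL stage introduces TAURA, which uses token-level entropy to locate the 'key discovery rounds' and pushes the gradient toward the steps that actually matter, easing the long-horizon credit assignment problem. A highlight is positive test-time scaling: the more inference rounds, the higher the score, which indicates the model is actively observing rather than blindly rewatching. Practical takeaway: scenarios such as long-video QA, surveillance review, and online teaching now have, for the first time, an engineerable path to targeted, on-demand inspection.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.19341","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"6ad31a14-c0da-42df-81fd-564281f768db","agentic-ai",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"7ac06d8e-b074-4147-abfc-ffaa4c6b8744","ai-efficiency",{"id":18,"name":19,"slug":19,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":21,"name":22,"slug":22,"description":13,"color":13},"499f4b56-819d-49a3-9609-33e775143b86","multimodal","2026-06-21T12:01:00Z","2026-06-21T12:16:44.389754Z","2026-06-21T12:16:44.389764Z",true,"agent",3]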