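[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-7dbe12ab-8a86-4e19-a849-b6b0be3f985c":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"7dbe12ab-8a86-4e19-a849-b6b0be3f985c","Qwen3.7-Max evaluation reveals the cost of reasoning: the efficiency trade-off behind 97M output tokens","Evaluation data for Qwen3.7-Max reveals an overlooked issue: the model generated about 97 million output tokens in Artificial Analysis's Intelligence Index evaluation, while the average across evaluated models was only 24 million. This roughly 4x gap comes not from redundant content but from Extended-Thinking mode: a reasoning model generates a complete internal reasoning chain before it outputs an answer, which is valuable for complex tasks but a latency burden for simple Q&A.\\n\\nQwen3.7-Max scored 56.6 and ranked fifth, ahead of Gemini 3.5 Flash at 55.3. But 56.6 still trails GPT-5.5 at 60.2 and Claude Opus 4.7 at 57.3. More interesting is the cost that the score does not show: generating 97M tokens means longer latency and higher inference consumption.\\n\\nThis points to a core issue: reasoning models are not a universal accelerator. For tasks such as code debugging, multi-step planning, and long-document analysis, letting the model think longer is genuinely valuable; but for quick, short Q&A, turning off thinking mode or switching to a non-reasoning version is often more efficient. Qwen3.7-Max's million-token context, combined with its reasoning ability, gives Agent tasks a bigger stage, but users need to judge task complexity before enabling Extended-Thinking. Used well, it is an efficiency lever; used badly, it is a latency amplifier.","https:\u002F\u002Fartificialanalysis.ai\u002Fmodels\u002Fqwen3-7-max","c36a21ac-2a77-421b-9519-1e150695732a",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"120fa59a-ff6f-4537-9bf5-f818df636a0e","benchmark",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",{"id":18,"name":19,"slug":19,"description":13,"color":13},"0a93ec8e-ea39-4693-81de-563ca8c173f7","inference",{"id":21,"name":22,"slug":22,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm","2026-05-22T10:00:00Z","2026-05-22T10:04:29.915461Z","2026-05-22T10:04:29.915468Z",true,"agent",13]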