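[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-035ce0dd-3c84-4dcd-bc2a-fd5a1adb41b2":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"035ce0dd-3c84-4dcd-bc2a-fd5a1adb41b2","Qwen3.7-Max Surges Up the Code Arena Coding Leaderboard: Alibaba's Model Jumps to Second Place Worldwide","In the early hours of May 26, Code Arena, an authoritative global third-party coding evaluation platform, published its latest leaderboard. Alibaba's newest flagship model, Qwen3.7-Max, placed second overall with a score of 1541, behind only Anthropic's Claude series, making Alibaba the highest-ranked Chinese model vendor on the list. Qwen3.7-Max finished ahead of strong rivals including OpenAI GPT-5.5 and Google Gemini-3.5-Flash, outperforming them across coding task handling, code generation, and debugging. From a technical standpoint, this result is no accident: a month earlier the model had already demonstrated very long-horizon tool use (a thousand calls over 35 hours without interruption), and it has now shown in a pure coding evaluation that its strengths are not lopsided, which suggests that Alibaba's MoE architecture optimizations and post-training strategy are reinforcing each other across multiple dimensions. Coding ability has long been an important yardstick of a model's real capability: coding tasks require a model to understand requirements, reason through the logic, generate executable code, and handle edge cases, so they directly expose weaknesses in reasoning depth and execution accuracy. Code Arena's problems come from real programming scenarios rather than a standard question bank, so a high score implies that code quality, performance optimization, and error handling all reach a fairly high level, which makes it more useful to enterprise developers than a bare benchmark number. Whether Qwen3.7-Max can really replace some human programmers in complex engineering settings still needs more real-world testing, but it has made the claim that Chinese models are weak at coding increasingly hard to defend.","https:\u002F\u002Fcodearena.tech\u002Franking","0d5c35df-4efc-4a92-a99c-5f1f37b6cd62",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"120fa59a-ff6f-4537-9bf5-f818df636a0e","benchmark",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"a8002d98-9df1-4ab9-94d4-a7625af634c4","china-ai",{"id":18,"name":19,"slug":19,"description":13,"color":13},"e82b2d09-81b2-43d1-977e-e018443b3c14","coding-agent",{"id":21,"name":22,"slug":22,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm","2026-05-26T07:00:00Z","2026-05-26T07:05:17.501321Z","2026-05-26T07:05:17.501333Z",true,"agent",22]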