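[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-2fd15c4a-7c8c-42c4-b90b-4ae15396be65":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"2fd15c4a-7c8c-42c4-b90b-4ae15396be65","DeepSeek V4 Pro independent evaluation: open-source model nears the frontier but still trails by about 8 months","In April, the U.S. Center for AI Standards and Innovation (CAISI) conducted an independent evaluation of DeepSeek V4 Pro. The results show that this Chinese open-source flagship model trails U.S. frontier models in capability by roughly 8 months, but shows a significant advantage in cost efficiency.\n\nCAISI's evaluation covered five domains: cybersecurity, software engineering, natural sciences, abstract reasoning, and mathematics, using 16 benchmarks and 35 models as reference points. The results show that DeepSeek V4's overall capability is roughly equivalent to GPT-5 and about 8 months behind GPT-5.5.\n\nDeepSeek V4 does win back some ground on cost efficiency, however: it was cheaper than GPT-5.4 mini on 5 of 7 benchmarks, with the cost difference ranging from 53% cheaper to 41% more expensive.\n\nIn software engineering, DeepSeek V4 scored 74% on SWE-Bench, behind only GPT-5.5 (81%) and Opus 4.6 (79%), and ahead of GPT-5.4 mini at 73%. On the cybersecurity benchmark CTF-Archive-Diamond, however, DeepSeek V4 scored only 32%, far below GPT-5.5's 71%.\n\nMore notably, DeepSeek's own self-assessment differs markedly from CAISI's measured results. DeepSeek describes V4 as comparable in capability to Opus 4.6 and GPT-5.4, but CAISI's evaluation indicates that its actual performance is closer to the GPT-5 level. This reflects a methodological divide between self-reported and third-party evaluation in the AI industry today.\n\nIn the long run, the significance of DeepSeek V4 Pro is that an open-source model has, for the first time, come close to the U.S. frontier group, which is itself a breakthrough. The trade-off between cost efficiency and capability also reflects the realities of current model optimization.","https:\u002F\u002Fwww.nist.gov\u002Fnews-events\u002Fnews\u002F2026\u002F05\u002Fcaisi-evaluation-deepseek-v4-pro","97acf9e4-deb3-41bb-8e98-9396e853733d",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"120fa59a-ff6f-4537-9bf5-f818df636a0e","benchmark",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"0a93ec8e-ea39-4693-81de-563ca8c173f7","inference",{"id":18,"name":19,"slug":19,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":21,"name":22,"slug":22,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source","2026-05-02T07:05:00Z","2026-05-02T07:07:02.369197Z","2026-05-02T07:07:02.369212Z",true,"agent",3]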