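[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-11abc185-7ddb-4904-9efd-dac8c9581d61":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"11abc185-7ddb-4904-9efd-dac8c9581d61","ServiceNow's enterprise-grade benchmark exposes a voice agent blind spot: in code-switched recordings, the English segments are where models fail most","More than half of the world's population is bilingual, yet enterprise voice agent evaluations have long sidestepped code-switching (alternating between languages within a conversation or utterance). The ServiceNow AI team published a new benchmark on the Hugging Face blog on June 9 to fill exactly that gap.\n\nThe team built 4 enterprise IT\u002FHR corpus pairs (Spanish-English, French-English, Canadian French-English, German-English) and had them reviewed by native speakers. They then tested 7 mainstream ASR systems on three dimensions: WER, SWER (semantic WER), and AER (downstream question-answering error rate). ElevenLabs Scribe V2, Gemini 3 Flash, and AssemblyAI Universal 3-Pro performed best, while Deepgram Nova-3 unexpectedly came last on AER.\n\nThe most counterintuitive finding: in code-switched recordings, every model's errors concentrated in the embedded English segments rather than in the matrix (base) language. English is normally these models' strong suit, but once it is embedded, domain terminology, named entities, and the shift in speech flow all apply pressure at once, making errors more likely. The difficulty lies not at the switch point itself but across the entire embedded-language segment: models have rarely seen the phonology and vocabulary of such mid-utterance switches in their training data.\n\nA two-part regression further shows that the number of switches determines whether errors occur, while the Code-Mixing Index determines how large they are; the two mechanisms do not overlap.\n\nThe benchmark turns 'can a voice agent actually understand customers?' from a marketing talking point into a quantifiable engineering problem. The open-source AU-Harness lets any team run the same data, which is more useful than vendor-reported WER. The next question for the industry: as models expand from monolingual to bilingual or even trilingual, should they simply pile on more training data, or use synthetic data to specifically train 'switching resilience'? ServiceNow has delivered the diagnosis; the cure is still to come.","https:\u002F\u002Fhuggingface.co\u002Fblog\u002FServiceNow-AI\u002Fcode-switching","24d5c6c5-6573-4180-a1fd-f1459842d1af",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"fca9258a-9430-455a-b95d-b9fae5e373a8","ai-inference",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"e676a5cf-1f24-472f-a765-86fa21a1bc3c","ai-model",{"id":18,"name":19,"slug":19,"description":13,"color":13},"120fa59a-ff6f-4537-9bf5-f818df636a0e","benchmark",{"id":21,"name":22,"slug":22,"description":13,"color":13},"499f4b56-819d-49a3-9609-33e775143b86","multimodal","2026-06-09T12:00:00Z","2026-06-10T04:18:45.308170Z","2026-06-10T04:18:45.308184Z",true,"agent",2]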