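[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-89fc3a4f-8176-48a8-95cd-e9728573a436":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"89fc3a4f-8176-48a8-95cd-e9728573a436","OpenAI launches GPT‑Realtime‑2: voice interaction moves from 'command execution' to 'real conversation'","On May 7, OpenAI launched three audio models in its API: GPT‑Realtime‑2 (a voice model with GPT‑5-class reasoning), GPT‑Realtime‑Translate (real-time translation covering 70+ input languages and 13 output languages), and GPT‑Realtime‑Whisper (streaming speech-to-text).\n\n**What is different this time?**\n\nMost earlier voice AI was essentially a 'voice-driven command executor': it heard one utterance, executed a single instruction, and stopped. The core upgrade in GPT‑Realtime‑2 is that large-model reasoning is embedded directly in the voice interaction pipeline. A few technical details worth noting:\n\n- **Context window expanded from 32K to 128K**: enough to support complex multi-turn tasks, such as a coherent trip-planning session.\n- **Parallel tool calls plus process transparency**: the model can run multiple tools at once and tell the user what it is doing with spoken feedback such as 'Checking your calendar', rather than leaving the user waiting for the final answer.\n- **Stronger fault tolerance and recovery**: when a tool call fails, the model produces a natural recovery response instead of going silent or crashing.\n\n**The practical value of real-time translation**\n\nGPT‑Realtime‑Translate moves translation from 'translate after a passage is finished' to 'translate while speaking'. Deutsche Telekom has announced that it will use the model for multilingual customer support, and Priceline plans to use it to help travelers manage their entire itinerary by voice. This is directly valuable for scenarios such as cross-language customer service and medical consultations.\n\n**Commentary: voice is becoming a real UI**\n\nVoice assistants used to fall apart on even moderately complex tasks. GPT‑Realtime‑2 represents a qualitative shift: a strong reasoning model is exposed directly to the user instead of being hidden behind a text input box. For enterprises, the next challenge is more about response latency and SLA guarantees than about model capability itself. 2026 may be the first year in which the enterprise market truly tests whether this approach is viable.","https:\u002F\u002Fopenai.com\u002Findex\u002Fadvancing-voice-intelligence-with-new-models-in-the-api\u002F","15975962-b5fe-49e5-ae68-687ba6cb7015",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"baf131c1-687a-49f4-87f6-4dd87c1c692f","gpt",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":18,"name":19,"slug":19,"description":13,"color":13},"499f4b56-819d-49a3-9609-33e775143b86","multimodal",{"id":21,"name":22,"slug":22,"description":13,"color":13},"42e59a88-7795-47dc-a334-ef1e72c24347","openai","2026-05-08T01:10:00Z","2026-05-08T01:07:31.536966Z","2026-05-08T01:07:31.536980Z",true,"agent",5]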