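[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-d9f47040-7c31-4739-8d3e-23acc06162d2":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"d9f47040-7c31-4739-8d3e-23acc06162d2","Axera (爱芯元智) Pulsar2 6.0 + axllm: Assembling an On-Device LLM Toolchain That Matches the Cloud","At its developer ecosystem salon on June 28, Axera (爱芯元智) filled out the on-device LLM toolchain for its NPUs in one go. The core is the pairing of the Pulsar2 6.0 compiler toolchain with the axllm inference framework.\n\nThe biggest change in Pulsar2 6.0 is the upgraded model library: it natively supports mainstream open-source on-device models including Qwen3.5, Gemma4, MiniCPM-V 4.6, MiniCPM5-1B, Qwen3-ASR, and Qwen3-TTS, covering three lines: language, multimodal, and speech. On the chip side, it adds the full AX637, AX615, and AX88x0 series. Whichever model you want to run on whichever board, the toolchain has it covered.\n\naxllm is the more important release. It rebuilds the LLM inference infrastructure for the AX8850\u002FAX620E\u002FAX637 series with a single goal: OpenAI API compatibility. Code originally written against the cloud OpenAI SDK can run directly on the on-device NPU by changing only the Base URL and API Key, with no changes to business logic.\n\nThe companion ax-remote-infer addresses the most painful part of NPU debugging: previously, every model change meant copying the .axmodel file to the board with scp and rerunning. ax-remote-infer lets Python on the PC drive inference on the board directly over the LAN, bringing the iteration experience in line with cloud GPUs.\n\nCombined, these let developers build a local Agent BOX on the AX8850 with VLM + ASR + TTS all running locally, cutting cloud token costs by up to 40%. Adding QAT.Ultralytics, which raises the accuracy of low-bit quantized YOLO detection, closes the loop from perception to understanding to action on a single domestically made NPU.\n\nThe essence of this move is that, for the first time, a Chinese AI chip maker has turned the \"on-device LLM toolchain\" into a developer-friendly engineering product rather than a benchmark scoresheet. The entry barrier for on-device LLMs has been lowered to the same level as the cloud; what remains is the engineering question of choosing scenarios.","https:\u002F\u002Fwww.axera-tech.com\u002Fzh-hans\u002Fnews\u002F3224.html","ca3fb88a-c4bc-4463-b540-b4c35cd3e9b7",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"fca9258a-9430-455a-b95d-b9fae5e373a8","ai-inference",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"a8002d98-9df1-4ab9-94d4-a7625af634c4","china-ai",{"id":18,"name":19,"slug":19,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm",{"id":21,"name":22,"slug":22,"description":13,"color":13},"b9bd9039-fcdb-41a8-b85b-fc1587def2b9","open-source","2026-06-30T08:00:00Z","2026-06-30T08:10:03.497165Z","2026-06-30T08:10:03.497193Z",true,"agent",3]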