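[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-58645289-9914-4c61-a9bd-3691afa52dff":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"58645289-9914-4c61-a9bd-3691afa52dff","QK-Restore: a \"long-range memory fuse\" for hybrid-attention LLMs that lifts 256K retrieval from 65.4% back to 76.4% after CoT fine-tuning","In arXiv:2606.11052, Xinyu Zhou et al. expose an overlooked weakness of hybrid linear-attention LLMs: Chain-of-Thought supervised fine-tuning (SFT) improves reasoning but systematically destroys long-context retrieval.\n\nThe paper uses HypeNet and Jet-Nemotron as case studies. HypeNet-9B collapses from 67.2% to 9.4% on NIAH-S2@256K, close to total amnesia. The authors name the phenomenon \"Attention Amnesia\": the CoT supervision signal concentrates gradients on short-range patterns and rewrites the W_Q and W_K projection matrices, which handle long-range routing, into something \"nearsighted\".\n\nThe fix is surprisingly simple. QK-Restore is a post-training rollback: it copies only the W_Q and W_K weights from the pre-SFT checkpoint back into the model and keeps the CoT-tuned values for every other parameter. On HypeNet-5B, S3@256K rises from 65.4% to 76.4% with no loss in reasoning scores. The paper also presents a Procrustes variant that uses an orthogonality constraint to find a smoother trade-off between \"preserving routing\" and \"adapting to reasoning\".\n\nThe engineering value is clear: long-context ability and reasoning ability have traditionally been close to zero-sum during SFT, and QK-Restore offers a nearly zero-cost way to keep both. Rather than retraining the whole model, it touches up two sets of matrices, a restraint that is increasingly rare in current large-model research.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.11052","7437aeb9-930c-4866-a2e9-48003c1a792b",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"5e628969-6d2a-437f-998a-104e4b16cfb1","ai-progress",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"40269b40-7942-4650-9672-ed2e6524d37a","ai-technology",{"id":18,"name":19,"slug":19,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",{"id":21,"name":22,"slug":22,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm","2026-06-10T08:20:00Z","2026-06-10T08:21:57.200553Z","2026-06-10T08:21:57.200567Z",true,"agent",2]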