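[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"news-748a4486-34e7-4215-b515-7eb56b3258c5":3},{"id":4,"title":5,"summary":6,"original_url":7,"source_id":8,"tags":9,"published_at":23,"created_at":24,"modified_at":25,"is_published":26,"publish_type":27,"image_url":13,"view_count":28},"748a4486-34e7-4215-b515-7eb56b3258c5","TwELL: Sakana AI and NVIDIA Propose Sparse LLM Kernels with 20% Faster Inference, Making Sparsity Practical for Batched GPU Workloads","In modern large language models, the feed-forward layers account for more than two-thirds of the parameters and over 80% of total FLOPs, yet at inference time more than 99% of the hidden activations can be zero for any given token. This natural activation sparsity should yield large efficiency gains, but the GPU's highly optimized dense matrix operations (Tensor Cores) cannot exploit it effectively: the conversion overhead of sparse operations often cancels out the savings from skipping zeros.\n\nEarlier sparse LLM kernels (TurboSparse, ProSparse, Q-Sparse, and others) targeted only the single-token GEMV case, whereas real training and high-throughput inference run GEMM operations over large batches of tokens. In that setting, dense baselines on modern GPUs reach orders-of-magnitude higher FLOP\u002Fs through large tiles and Tensor Cores, so the sparse overhead weighs even more heavily.\n\nSakana AI and NVIDIA jointly propose the TwELL (Tile-wise ELL) sparse format. Its core innovation is to partition the columns into horizontal blocks that match the tile size of the matmul kernel and to pack nonzero values locally within each block, rather than packing globally per row as the traditional ELL format does. TwELL can be constructed directly in the epilogue of the existing gate projection kernel, with no extra kernel launch, no extra global memory reads or writes, and no synchronization overhead. At inference time, a fused kernel executes the up projection and down projection together, so the intermediate hidden state is never written back to global memory, which reduces DRAM traffic on every forward pass.\n\nSparse LLMs using the TwELL kernels achieve a 20.5% speedup of the inference forward pass and a 21.9% training speedup on H100 GPUs, while also lowering energy consumption and memory footprint. The recipe is very simple: replace the SiLU activation function with ReLU and add an L1 regularization term on the hidden feed-forward activations (coefficient 2×10⁻⁵). On a 1.5B model, ReLU accuracy is slightly lower than SiLU (46.4% vs 47.1%), but the efficiency gains fully compensate for the gap. Sparsity stabilizes quickly, within about 1,000 steps (~1B tokens).\n\nTwELL's value lies in solving the problem of moving sparsity from research to production, that is, from single-token GEMV to batched GEMM. For the industry as a whole, it is a directional validation: more than 99% of activations are zero, and TwELL for the first time turns that into real speedup in batched settings. An inference speedup of more than 20% means the same hardware can serve more users, or the same throughput can be delivered with fewer GPUs. The paper has been published at ICML 2026, and the code is open source.","https:\u002F\u002Fwww.marktechpost.com\u002F2026\u002F05\u002F11\u002Fsakana-ai-and-nvidia-introduce-twell-with-cuda-kernels-for-20-5-inference-and-21-9-training-speedup-in-llms\u002F","8382d60c-c2c4-49c5-9638-8518b803f88f",[10,14,17,20],{"id":11,"name":12,"slug":12,"description":13,"color":13},"7ac06d8e-b074-4147-abfc-ffaa4c6b8744","ai-efficiency",null,{"id":15,"name":16,"slug":16,"description":13,"color":13},"0ef8513a-0a26-42f0-b6f9-5b6dadded45c","efficiency",{"id":18,"name":19,"slug":19,"description":13,"color":13},"0a93ec8e-ea39-4693-81de-563ca8c173f7","inference",{"id":21,"name":22,"slug":22,"description":13,"color":13},"01598627-1ea6-4b27-a5d8-874971571a71","llm","2026-05-30T08:20:00Z","2026-05-30T16:15:55.955068Z","2026-05-30T16:15:55.955078Z",true,"agent",8]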