<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Qwen team open-sources FlashQLA linear attention kernels: 2-3× forward and 2× backward speedups over the FLA Triton implementation]]></title><description><![CDATA[<p dir="auto">Alibaba's Qwen team has open-sourced FlashQLA, a high-performance linear attention kernel library built on TileLang, featuring deeply fused and optimized forward and backward operators for Gated DeltaNet (GDN) chunked prefill. On NVIDIA Hopper-class GPUs (SM90 and above), it achieves 2-3× forward and 2× backward speedups over the previously dominant FLA Triton kernels across a range of scenarios, with the gains most pronounced in pretraining and on-device agent inference. Benchmarks cover the head configurations actually used by the Qwen3.5 / Qwen3.6 series (h_k,v ∈ {64, 48, 32, 24, 16, 8}, corresponding to TP1 through TP8), measured against FLA 0.5.0, Triton 3.5.1, FlashInfer 0.6.9, and TileLang 0.1.8 as baselines.</p>
<p dir="auto">技术上 FlashQLA 主打三项优化：一是利用 GDN gate 的指数衰减特性，在 TP、长序列、小 head 数等场景下自动开启卡内 Context Parallel（intra-card CP），提升 GPU SM 利用率；二是对前向与反向做硬件友好的代数重写，在不损失数值精度的前提下显著降低 Tensor Core、CUDA Core 与 SFU 开销；三是采用 TileLang 构建多个融合 warp-specialized 内核，手动实现 warpgroup 特化以重叠数据搬运、Tensor Core 与 CUDA Core 计算——既不像传统实现那样拆分为多个独立 kernel，也不强求把整个流程压进单一 kernel。要求 SM90 及以上、CUDA 12.8、PyTorch 2.8，已采用 MIT 许可证开源。仓库目前 49 star、2 fork。</p>
<p dir="auto"><a href="https://github.com/QwenLM/FlashQLA" target="_blank" rel="noopener noreferrer nofollow ugc">GitHub - QwenLM/FlashQLA</a> | <a href="https://qwen.ai/blog?id=flashqla" target="_blank" rel="noopener noreferrer nofollow ugc">Qwen Blog</a></p>
<p dir="auto"></p><div class="card col-md-9 col-lg-6 position-relative link-preview p-0">



<a href="https://github.com/QwenLM/FlashQLA" title="GitHub - QwenLM/FlashQLA: high-performance linear attention kernel library built on TileLang">
<img src="https://opengraph.githubassets.com/0e70a2cae421137c058e16d4911925983b507f807197657fe1beda6992da2c45/QwenLM/FlashQLA" class="card-img-top not-responsive" style="max-height:15rem" alt="Link Preview Image" />
</a>



<div class="card-body">
<h5 class="card-title">
<a class="text-decoration-none" href="https://github.com/QwenLM/FlashQLA">
GitHub - QwenLM/FlashQLA: high-performance linear attention kernel library built on TileLang
</a>
</h5>
<p class="card-text line-clamp-3">high-performance linear attention kernel library built on TileLang - QwenLM/FlashQLA</p>
</div>
<a href="https://github.com/QwenLM/FlashQLA" class="card-footer text-body-secondary small d-flex gap-2 align-items-center lh-2">



<img src="https://github.githubassets.com/favicons/favicon.svg" alt="favicon" class="not-responsive overflow-hiddden" style="max-width:21px;max-height:21px" />



<p class="d-inline-block text-truncate mb-0">GitHub <span class="text-secondary">(github.com)</span></p>
</a>
</div><p></p>
<p dir="auto"></p><div class="card col-md-9 col-lg-6 position-relative link-preview p-0">



<a href="https://qwen.ai/blog?id=flashqla" title="Qwen Studio">
<img src="https://img.alicdn.com/imgextra/i1/O1CN013ltlI61OTOnTStXfj_!!6000000001706-55-tps-330-327.svg" class="card-img-top not-responsive" style="max-height:15rem" alt="Link Preview Image" />
</a>



<div class="card-body">
<h5 class="card-title">
<a class="text-decoration-none" href="https://qwen.ai/blog?id=flashqla">
Qwen Studio
</a>
</h5>
<p class="card-text line-clamp-3">Qwen Studio offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.</p>
</div>
<a href="https://qwen.ai/blog?id=flashqla" class="card-footer text-body-secondary small d-flex gap-2 align-items-center lh-2">



<img src="https://g.alicdn.com/qwenweb/qwen-ai-fe/0.0.4/favicon.ico" alt="favicon" class="not-responsive overflow-hiddden" style="max-width:21px;max-height:21px" />



<p class="d-inline-block text-truncate mb-0"> <span class="text-secondary">(qwen.ai)</span></p>
</a>
</div><p></p>
]]></description><link>https://welinux.com//topic/88/qwen-团队开源-flashqla-线性注意力内核-对比-fla-triton-实现-2-3-前向-2-反向加速</link><generator>RSS for Node</generator><lastBuildDate>Sat, 02 May 2026 21:04:35 GMT</lastBuildDate><atom:link href="https://welinux.com//topic/88.rss" rel="self" type="application/rss+xml"/><pubDate>Tue, 28 Apr 2026 15:41:48 GMT</pubDate><ttl>60</ttl></channel></rss>