FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

Other · Emerging

Stars: n/a
Forks: n/a
Contributors: 8
Last push: 2mo ago

Recent commits

Latest commits.

Swap FlashMLA combine grid dimensions (#182)
9241ae3 · Perkz Zheng · 2mo ago

Change the order of grid dim in bwd convert kernel to avoid overlimit when sequence length is very large(>1M) (#173)
71c7379 · Zeyu WANG · 3mo ago

Add CUDAGuard and device id assignment in sm100 dense fmha (#160)
47c35a7 · Zeyu WANG · 4mo ago

nits
48c6dc4 · Shengyu Liu · 5mo ago

Add missing include<span>
c741387 · Jiashi Li · 5mo ago

Multiple updates and refactorings (#150)
082094b · Shengyu Liu · 5mo ago

1408756 · Jiashi Li · 9mo ago

1858932 · Jiashi Li · 9mo ago

Top contributors

Builders behind this project.