flash-attention
Fast and memory-efficient exact attention
Python
Emerging
Contributors: 8
Last push: 15mo ago
Recent commits
Latest commits.
27f501d  Dynamic autotune configs for devices with warp size != 32 (#1534) (schung-amd, 15mo ago)
4b5eeab  Fix FP8 test to quantize KV cache for reference impl as well (Tri Dao, 15mo ago)
6c87fac  Update MLA decode benchmark to use get_scheduler_metadata (Tri Dao, 15mo ago)
fa60e7c  Add option to precompute scheduler metadata (Tri Dao, 15mo ago)
90f27a2  Loop on num_splits instead of parameterizing it in kvcache test (Tri Dao, 15mo ago)
897c845  Fix: num_splits_dynamic_ptr needs to be set before get_num_splits (Tri Dao, 15mo ago)
46e1d4a  Simplify prepare_varlen_num_blocks_kernel, restrict to batch <= 992 (Tri Dao, 15mo ago)
000090d  Enable PDL (Tri Dao, 15mo ago)
Top contributors
Builders behind this project.
tridao: 617 commits
piercefreeman: 25 commits
ksivaman: 18 commits
ipiszy: 17 commits
DanFu09: 11 commits
drisspg: 6 commits
rocking5566: 5 commits
tmm1: 5 commits