flash-attention
Fast and memory-efficient exact attention
Other
Emerging
GitHub
Contributors: 8
Last push: 4mo ago
Recent commits
Latest commits.
f2682b6 · Alkaid · 4mo ago · [Cute][Testing] Add fake tensor mode support for compile-only test passes (#2283)
51b6575 · Tri Dao · 4mo ago · [Fwd,Sm100] Compute kv_stage based on hdim instead of hard-coding
72eb5de · Tri Dao · 4mo ago · [Fwd,Sm100] Switch back to poly degree 3
990b510 · Tri Dao · 4mo ago · [Fwd,Sm100] Add polynomials degree 1 - 5
d78c84a · Tri Dao · 4mo ago · [Fwd,Sm100] Use NamedBarrier to signal softmax -> corr warps
be76c60 · bonpyt · 4mo ago · [CuTe] Include broadcast dims in backward compile cache keys (#2298)
ceb1099 · tomflinda · 4mo ago · Fix clang parser error of missing 'typename' prior to dependent type name occurs because `LLVM/Clang` is strictly adhering to C++ standards (#2295)
d146eff · Tri Dao · 4mo ago · [Scheduler] Revert SingleTileScheduler to get block_idx
Top contributors
Builders behind this project.
tridao: 890 commits
drisspg: 55 commits
piercefreeman: 25 commits
jayhshah: 20 commits
guilhermeleobas: 18 commits
ksivaman: 18 commits
ipiszy: 17 commits
henrylhtsang: 15 commits