flash-attention
Fast and memory-efficient exact attention
Other · Emerging
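Here "exact" means the output matches standard softmax attention, with no approximation. The memory saving comes from tiling: keys and values are streamed in blocks and combined with an online (running) softmax, so the full score matrix is never stored. Below is a minimal pure-Python sketch of the online-softmax idea for a single query row, assuming scaled dot-product attention. It is not the library's CUDA kernel. The function names and the `block_size` parameter are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention_row(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: build every score for this query, softmax them, weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    row_max = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - row_max) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention_row(q: Vector, keys: List[Vector], values: List[Vector],
                        block_size: int = 2) -> Vector:
    """Online softmax (hypothetical sketch): visit keys/values one block at a time,
    keeping only a running max, a running denominator and an unnormalized output."""
    scale = 1.0 / math.sqrt(len(q))
    row_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) * scale for k in k_blk]
        new_max = max(row_max, max(scores))
        # Rescale what was accumulated under the old max to the new max
        # (exp(-inf) == 0.0 on the first block, so nothing is carried over).
        correction = math.exp(row_max - new_max)
        p = [math.exp(s - new_max) for s in scores]
        denom = denom * correction + sum(p)
        acc = [a * correction + sum(pi * v[j] for pi, v in zip(p, v_blk))
               for j, a in enumerate(acc)]
        row_max = new_max
    # Normalize once, after every block has been seen.
    return [a / denom for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, -0.2]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(naive_attention_row(q, keys, values))
print(tiled_attention_row(q, keys, values, block_size=2))
```

The two printed rows should agree up to floating-point rounding for any block size. Only the running max, the denominator and one output row are kept between blocks, which is why the tiled form needs less memory.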
GitHub
Stars: n/a
Forks: n/a
Contributors: 8
Last push: 28mo ago
Recent commits
43950dd Bump to v2.5.4 (Tri Dao, 28mo ago)
4d6b794 Update Cutlass to v3.4.1 (Tri Dao, 28mo ago)
b32efb1 Don't need to reduce row_sum during online softmax (Tri Dao, 28mo ago)
f45bbb4 Optimize compile to 1: avoid oom 2: minimize swap usage 3: avoid threads starvation when letting ninja decide how many workers to spawn or manual MAX_JOBS "guesses". Logic is to take the min value of MAX_JOBS auto-calculated by two metrics: 1: cpu cores 2: free memory. This should allow flash-attn to compile close to the most efficient manner under any consumer/server env. (#832) (Qubitium, 28mo ago)
5cdabc2 Bump to v2.5.3 (Tri Dao, 29mo ago)
d9a5cb2 Fix dv = torch::empty_like(k) for mha_bwd_varlen as well (Tri Dao, 29mo ago)
a190df0 Add window_size option to ParallelMHA (Tri Dao, 29mo ago)
2423cca fix backward for when query and key have different contiguity (#818) (Brian Hirsh, 29mo ago)
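Commit f45bbb4 describes its build heuristic in prose. The number of parallel compile jobs is the minimum of two estimates, one based on CPU cores and one based on free memory. Below is a plain-Python illustration of that min-of-two-metrics rule. It is not the project's actual build script. The function names and the 4 GB-per-job budget (`gb_per_job`) are hypothetical. The free-memory probe uses `os.sysconf`, which reports available physical pages reliably only on Linux.

```python
from __future__ import annotations

import os


def suggest_max_jobs(cpu_cores: int, free_mem_gb: float, gb_per_job: float = 4.0) -> int:
    """Return a compile job count bounded by both CPU and memory.

    gb_per_job is a hypothetical per-compiler-process memory budget,
    not a value taken from flash-attention's build script.
    """
    # Metric 1: one job per available CPU core.
    jobs_by_cpu = max(1, cpu_cores)
    # Metric 2: as many jobs as fit in free memory without swapping.
    jobs_by_mem = max(1, int(free_mem_gb // gb_per_job))
    # Take the smaller of the two so neither resource is oversubscribed.
    return min(jobs_by_cpu, jobs_by_mem)


def detect_resources() -> tuple[int, float]:
    """Best-effort probe of CPU count and free memory in GB (memory: Linux only)."""
    cores = os.cpu_count() or 1
    try:
        free_bytes = os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        free_bytes = 0  # unknown: the memory metric then falls back to 1 job
    return cores, free_bytes / 1024**3


if __name__ == "__main__":
    cores, free_gb = detect_resources()
    print(f"MAX_JOBS={suggest_max_jobs(cores, free_gb)}")
```

Taking the minimum means a many-core machine with little RAM is limited by memory, while a machine with plenty of RAM is limited by its cores.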
Top contributors
tridao: 454 commits
piercefreeman: 25 commits
ksivaman: 13 commits
DanFu09: 11 commits
tmm1: 4 commits
lxuechen: 4 commits
robotcator: 4 commits
danthe3rd: 3 commits