flash-attention

Fast and memory-efficient exact attention

Other · Emerging

Stars: n/a
Forks: n/a
Contributors: 8
Last push: 28mo ago

Recent commits

Latest commits.

Bump to v2.5.4
43950dd · Tri Dao · 28mo ago

Update Cutlass to v3.4.1
4d6b794 · Tri Dao · 28mo ago

Don't need to reduce row_sum during online softmax
b32efb1 · Tri Dao · 28mo ago

Optimize compile to 1: avoid oom 2: minimize swap usage 3: avoid threads starvation when letting ninja decide how many workers to spawn or manual MAX_JOBS "guesses". Logic is to take the min value of MAX_JOBS auto-calculated by two metrics: 1: cpu cores 2: free memory. This should allow flash-attn to compile close to the most efficient manner under any consumer/server env. (#832)
f45bbb4 · Qubitium · 28mo ago

Bump to v2.5.3
5cdabc2 · Tri Dao · 29mo ago

Fix dv = torch::empty_like(k) for mha_bwd_varlen as well
d9a5cb2 · Tri Dao · 29mo ago

Add window_size option to ParallelMHA
a190df0 · Tri Dao · 29mo ago

fix backward for when query and key have different contiguity (#818)
2423cca · Brian Hirsh · 29mo ago
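
The compile-optimization commit (#832) describes choosing MAX_JOBS as the minimum of two limits, one from CPU cores and one from free memory. A minimal sketch of that idea follows; it is not the project's actual code. The function names, the `gb_per_job` budget of 8 GiB per compile job, and the use of `os.sysconf` to read free memory are all assumptions made for illustration.

```python
import os


def pick_max_jobs(cpu_cores, free_memory_gb, gb_per_job=8.0):
    """Return a parallel job count bounded by both CPU cores and free memory.

    gb_per_job is a hypothetical per-job memory budget, not a value
    taken from the flash-attention build scripts.
    """
    jobs_by_cores = max(1, cpu_cores)
    jobs_by_memory = max(1, int(free_memory_gb // gb_per_job))
    # The tighter of the two limits wins, so neither CPU nor RAM is oversubscribed.
    return min(jobs_by_cores, jobs_by_memory)


def free_memory_gb():
    """Best-effort free physical memory in GiB (assumes POSIX sysconf names)."""
    try:
        pages = os.sysconf("SC_AVPHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
        return pages * page_size / 1024**3
    except (AttributeError, ValueError, OSError):
        # Platform does not expose these values; fall back to a single job.
        return 0.0


if __name__ == "__main__":
    jobs = pick_max_jobs(os.cpu_count() or 1, free_memory_gb())
    # Keep an explicit user-provided MAX_JOBS if one is already set.
    os.environ.setdefault("MAX_JOBS", str(jobs))
    print(os.environ["MAX_JOBS"])
```

With 16 cores and 32 GiB free, this sketch would pick 4 jobs under the assumed 8 GiB budget, because memory is the tighter limit; with 2 cores and 64 GiB free it would pick 2, because cores are.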

Top contributors

Builders behind this project.