flash-attention

Fast and memory-efficient exact attention

Python · Emerging

Stars: n/a
Forks: n/a
Contributors: 8
Last push: 15 months ago

Recent commits

Latest commits.

Dynamic autotune configs for devices with warp size != 32 (#1534)
27f501d · schung-amd · 15mo ago

Fix FP8 test to quantize KV cache for reference impl as well
4b5eeab · Tri Dao · 15mo ago

Update MLA decode benchmark to use get_scheduler_metadata
6c87fac · Tri Dao · 15mo ago

Add option to precompute scheduler metadata
fa60e7c · Tri Dao · 15mo ago

Loop on num_splits instead of parameterizing it in kvcache test
90f27a2 · Tri Dao · 15mo ago

Fix: num_splits_dynamic_ptr needs to be set before get_num_splits
897c845 · Tri Dao · 15mo ago

Simplify prepare_varlen_num_blocks_kernel, restrict to batch <= 992
46e1d4a · Tri Dao · 15mo ago

000090d · Tri Dao · 15mo ago

Top contributors

Builders behind this project.