Cuda

SageAttention

Quantized attention that achieves speedups of 2-5x and 3-11x over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
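To make "quantized attention" concrete, here is a minimal pure-Python sketch of the general idea: Q and K are rounded to INT8 with one scale per row, the QK^T dot products are taken on integers, and the result is dequantized before the softmax. This is an illustration under simplifying assumptions, not SageAttention's actual kernels. The helper names (`quantize_int8`, `attention`, `max_abs_error`) are hypothetical, and the per-row symmetric scaling is chosen only for simplicity.

```python
import math
import random

def quantize_int8(row):
    """Symmetric per-row INT8 quantization: returns (integer values, scale)."""
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [max(-127, min(127, round(x / scale))) for x in row], scale

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, quantized=False):
    """Single-head attention on lists of rows; optionally INT8-quantize Q and K."""
    d = len(q[0])
    if quantized:
        q_int = [quantize_int8(row) for row in q]
        k_int = [quantize_int8(row) for row in k]
    out = []
    for i, q_row in enumerate(q):
        scores = []
        for j, k_row in enumerate(k):
            if quantized:
                (qi, qs), (ki, ks) = q_int[i], k_int[j]
                # Integer dot product, then dequantize with both scales.
                dot = sum(a * b for a, b in zip(qi, ki)) * qs * ks
            else:
                dot = sum(a * b for a, b in zip(q_row, k_row))
            scores.append(dot / math.sqrt(d))
        p = softmax(scores)
        # The probabilities-times-V product stays in floating point here.
        out.append([sum(p[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out

def max_abs_error(n=8, d=16, seed=0):
    """Largest elementwise gap between float and INT8-QK attention outputs."""
    rng = random.Random(seed)
    def rand():
        return [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    q, k, v = rand(), rand(), rand()
    ref = attention(q, k, v)
    approx = attention(q, k, v, quantized=True)
    return max(abs(a - b) for ra, rb in zip(ref, approx) for a, b in zip(ra, rb))
```

Calling `max_abs_error()` compares the two paths on random inputs. The gap is typically small relative to the output magnitude, which is the intuition behind quantizing the QK^T product while keeping accuracy close to full precision.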

Cuda · Emerging

Stars: not available
Forks: not available
Contributors: 8
Last push: 10mo ago

Recent commits

Latest commits.

Fix setup.py for machines without GPUs
9053b45 · Dmitry Nedospasov · 10mo ago

Merge pull request #231 from BobQC/main
798c791 · whx1003 · 11mo ago

fix bug: no skip when mask_dtype is float
34ea987 · BobQC · 11mo ago

Merge pull request #227 from BobQC/main
3c8f3b1 · whx1003 · 11mo ago

fix qk_mask when mask_dtype=float
f78d412 · BobQC · 11mo ago

add attn_mask to sageattn_qk_int8_pv_fp16_triton
499828b · BobQC · 11mo ago

Merge pull request #224 from thu-ml/fix/attention-mask
f75e969 · whx1003 · 11mo ago

fix mask computation for better precision
3917283 · whx1003 · 11mo ago

Top contributors

Builders behind this project.