Aule-Attention

High-performance FlashAttention-2 for AMD, Intel, and Apple GPUs. Drop-in replacement for PyTorch SDPA. Triton backend for ROCm (MI300X, RDNA3), Vulkan backend for consumer GPUs. No CUDA required.
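
To make the "drop-in replacement" claim concrete, here is a dependency-free sketch of the numerical contract such a replacement has to honor. It compares a naive softmax(QK^T / sqrt(d)) V against a tiled, online-softmax variant in the style of FlashAttention. This is an illustration only, not Aule-Attention's kernel or API; the names `naive_attention`, `tiled_attention`, and the `block` parameter are hypothetical.

```python
import math


def naive_attention(q, k, v):
    """Reference attention for one head: softmax(q @ k^T / sqrt(d)) @ v.

    q, k, v are lists of rows (lists of floats).
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


def tiled_attention(q, k, v, block=2):
    """Same result, but keys/values are visited block by block.

    A running max and running sum (online softmax) are kept per query row,
    so the full row of scores is never materialized at once.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        m = float("-inf")  # running max of the scores seen so far
        l = 0.0            # running sum of exp(score - m)
        acc = [0.0] * dv   # running weighted sum of value rows
        for start in range(0, len(k), block):
            kb = k[start:start + block]
            vb = v[start:start + block]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in kb]
            m_new = max(m, max(scores))
            # Rescale earlier partial results to the new max
            # (exp(-inf) is 0.0, which handles the first block).
            correction = math.exp(m - m_new)
            p = [math.exp(s - m_new) for s in scores]
            l = l * correction + sum(p)
            acc = [
                a * correction + sum(pj * vj[c] for pj, vj in zip(p, vb))
                for c, a in enumerate(acc)
            ]
            m = m_new
        out.append([a / l for a in acc])
    return out
```

Both functions should agree to floating-point tolerance for any block size. In PyTorch the same computation is exposed as `torch.nn.functional.scaled_dot_product_attention`; the listing does not show how Aule-Attention hooks into it, so that step is left out here.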

Other · Emerging

Last push: 6mo ago

Recent commits

fix: AMD kernel autotune key to avoid recompilation during generation (f2b6451, xenn0010, 6mo ago)
docs: Update README with AMD MI300X benchmarks (e50b760, xenn0010, 6mo ago)
feat: Add AMD MI300X optimized FlashAttention-2 kernel (27dbf10, xenn0010, 6mo ago)
chore: Bump version to 0.3.6 with PagedAttention (5bdfd8e, xenn0010, 6mo ago)
docs: Add creator attribution to README (f8583e1, xenn0010, 6mo ago)

Top contributors

xenn0010: 32 commits