GroupedGEMM
PyTorch bindings for CUTLASS and cuBLAS Grouped GEMM.
CUDA
Emerging
Contributors: 5
Last push: 6mo ago
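For readers new to the term in the description above: a grouped GEMM runs several independent matrix multiplications, whose sizes may differ, as one operation. The commit messages on this page (permute, token-drop, dispatch) suggest the typical use is mixture-of-experts layers, where consecutive blocks of token rows are each multiplied by a different weight matrix. The pure-Python sketch below illustrates only those semantics. It is not this project's API: the names `grouped_gemm_reference`, `b_groups`, and `batch_sizes` are hypothetical, and the real bindings run on the GPU through CUTLASS or cuBLAS.

```python
def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both given as nested lists."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def grouped_gemm_reference(a, b_groups, batch_sizes):
    """Reference semantics only (hypothetical helper, not the library's function).

    The rows of `a` are split into consecutive groups of sizes `batch_sizes`;
    group g is multiplied by its own weight matrix `b_groups[g]`, and the
    per-group results are concatenated in the original row order.
    """
    if len(batch_sizes) != len(b_groups):
        raise ValueError("need exactly one weight matrix per group")
    if sum(batch_sizes) != len(a):
        raise ValueError("batch_sizes must sum to the number of rows in a")

    out = []
    start = 0
    for size, b in zip(batch_sizes, b_groups):
        # A group may be empty (size 0); it then contributes no output rows.
        out.extend(matmul(a[start:start + size], b))
        start += size
    return out


# Three token rows: the first goes to group 0 (identity weights),
# the remaining two go to group 1 (weights that double each value).
tokens = [[1, 2], [3, 4], [5, 6]]
weights = [
    [[1, 0], [0, 1]],
    [[2, 0], [0, 2]],
]
result = grouped_gemm_reference(tokens, weights, batch_sizes=[1, 2])
print(result)  # [[1, 2], [6, 8], [10, 12]]
```

A GPU implementation avoids the Python loop by launching the per-group products together, which is the motivation for binding a dedicated grouped kernel instead of calling a plain matrix multiply once per group.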
Recent commits
Latest commits.
3ae3288  [Build] rename url (chenchiyu, 9mo ago)
ca90330  !4 [Feat] add use_cutlass in gmm param to switch both cublas and cutlass gmm (chenchiyu, 12mo ago)
d23c899  !2 [Feat] use cutlass grouped_gemm to avoid cpu and cuda sync (chenchiyu, 13mo ago)
ecc4039  [Fix] illegal memory access due to int32 overflow (chenchiyu, 15mo ago)
6dee718  feat: make permute suitable for deepep dispatch (chenchiyu, 16mo ago)
5c1d831  hotfix/fix_nvtx_pop (#7) (littsk, 18mo ago)
172fada  fix moe_permute_topK for token-drop case. (Shiqing Fan, 24mo ago)
6eef3e3  Fix streams sync behavior of grouped gemm op (Jiang Shao, 25mo ago)
Top contributors
Builders behind this project.
fanshiqing: 16 commits
tgale96: 12 commits
CyCle1024: 5 commits
StudyingShao: 5 commits
littsk: 1 commit