
GroupedGEMM

PyTorch bindings for CUTLASS and cuBLAS grouped GEMM.

Cuda, Emerging

Stars: n/a
Forks: n/a
Contributors: 5
Last push: 6mo ago

Recent commits

[Build] rename url
3ae3288 by chenchiyu, 9mo ago

!4 [Feat] add use_cutlass in gmm param to switch both cublas and cutlass gmm
ca90330 by chenchiyu, 12mo ago

!2 [Feat] use cutlass grouped_gemm to avoid cpu and cuda sync
d23c899 by chenchiyu, 13mo ago

[Fix] illegal memory access due to int32 overflow
ecc4039 by chenchiyu, 15mo ago

feat: make permute suitable for deepep dispatch
6dee718 by chenchiyu, 16mo ago

hotfix/fix_nvtx_pop (#7)
5c1d831 by littsk, 18mo ago

fix moe_permute_topK for token-drop case.
172fada by Shiqing Fan, 24mo ago

Fix streams sync behavior of grouped gemm op
6eef3e3 by Jiang Shao, 25mo ago

Top contributors

Builders behind this project.