High-performance CUDA kernels for LLM inference & training — callable from Python, Rust, Go, and TypeScript through one stable C ABI.