LoreonLabsPlatform

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · Emerging

Stars: n/a
Forks: n/a
Contributors: 8
Last push: 11mo ago

Recent commits

Latest commits.

[BugFix] Fix DP Coordinator incorrect debug log message (#19624)
bd517eb · Nick Hill · 12mo ago

Adding "AMD: Multi-step Tests" to amdproduction. (#19508)
d65668b · Concurrensee · 12mo ago

[torch.compile] Use custom ops when use_inductor=False (#19618)
aafbbd9 · Woosuk Kwon · 12mo ago

[Doc] Add troubleshooting section to k8s deployment (#19377)
0f08745 · Anna Pendleton · 12mo ago

[CUDA] Enable full cudagraph for FlashMLA (#18581)
3597b06 · Luka Govedič · 12mo ago

[doc][mkdocs] fix the duplicate Supported features sections in GPU docs (#19606)
1015296 · Reid · 12mo ago

[Refactor] Remove unused variables in `moe_permute_unpermute_kernel.inl` (#19573)
ce9dc02 · Wentao Ye · 12mo ago

[Model] Fix minimax model cache & lm_head precision (#19592)
a24cb91 · qscqesze · 12mo ago

Top contributors

Builders behind this project.