vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · Emerging

Stars: n/a

Forks: n/a

Contributors: 8

Last push: 8mo ago

Recent commits

Latest commits.

build vllm
db3d113 · HunterChen · 8mo ago
[torch.compile] Unwrap fused_marlin_moe custom op (#26739)
8ae1692 · Varun Sundar Rabindranath · 8mo ago
[build][torch.compile] upgrade depyf version (#26702)
8a0af6a · youkaichao · 8mo ago
[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE (#26742)
cfded80 · Jialin Ouyang · 8mo ago
[compile] Enable sequence parallelism for full cuda graph without specifying compile sizes (#26681)
b59dd19 · Angela Yi · 8mo ago
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732)
3e051bd · Michael Goin · 8mo ago
[Misc][DP] support customized aggregated logger for dp (#24354)
8317f72 · Lucia Fang · 8mo ago
Add tests for chunked prefill and prefix cache with causal pooling models (#26526)
d8bebb0 · Maximilien de Bayser · 8mo ago

Top contributors

Builders behind this project.