vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · Emerging

Stars: n/a
Forks: n/a
Contributors: 8
Last push: 9mo ago

Recent commits

Latest commits.

[Hybrid Allocator] Support full attention with different hidden size (#25101)
9607d5e · Chen Zhang · 9mo ago
[Optimization] Avoid repeated model architecture conversion for pooling models (#25261)
c60e613 · Cyrus Leung · 9mo ago
[Bugfix] fix tool call arguments is empty (#25223)
f91480b · Chauncey · 9mo ago
[BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention (#25298)
6c5f82e · Chendi.Xue · 9mo ago
[BugFix] Exclude self when checking for port collision (#25286)
b7f186b · Nick Hill · 9mo ago
[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) (#25268)
3642909 · JartX · 9mo ago
Improve weight loading for encoder models in Transformers backend (#25289)
c308501 · Harry Mellor · 9mo ago
[Misc] Support more collective_rpc return types (#25294)
535d800 · Nick Hill · 9mo ago

Top contributors

Builders behind this project.