vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · Emerging
Stars: —
Forks: —
Contributors: 8
Last push: 9mo ago

Recent commits

Latest commits.

  • [Hybrid Allocator] Support full attention with different hidden size (#25101)
    9607d5e · Chen Zhang · 9mo ago
  • [Optimization] Avoid repeated model architecture conversion for pooling models (#25261)
    c60e613 · Cyrus Leung · 9mo ago
  • [Bugfix] fix tool call arguments is empty (#25223)
    f91480b · Chauncey · 9mo ago
  • [BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention (#25298)
    6c5f82e · Chendi.Xue · 9mo ago
  • [BugFix] Exclude self when checking for port collision (#25286)
    b7f186b · Nick Hill · 9mo ago
  • [BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) (#25268)
    3642909 · JartX · 9mo ago
  • Improve weight loading for encoder models in Transformers backend (#25289)
    c308501 · Harry Mellor · 9mo ago
  • [Misc] Support more collective_rpc return types (#25294)
    535d800 · Nick Hill · 9mo ago

Top contributors

Builders behind this project.

  • WoosukKwon: 658 commits
  • DarkLight1337: 550 commits
  • youkaichao: 459 commits
  • mgoin: 361 commits
  • hmellor: 273 commits
  • Isotr0py: 260 commits
  • njhill: 208 commits
  • jeejeelee: 206 commits