vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · Emerging
  • Stars: n/a
  • Forks: n/a
  • Contributors: 8
  • Last push: 9h ago

Recent commits


  • [Rust Frontend] Add CORS support (#45753)
    cca3365 · Tahsin Tunan · 15h ago
  • [CI] Fix attention benchmark smoke test (#45728)
    040df8f · Matthew Bonanni · 15h ago
  • [Perf] Add VLLM_TRITON_FORCE_FIRST_CONFIG to skip Triton autotuning (#42425)
    ced32bb · Francesco Fusco · 16h ago
  • [Bugfix][MoE] Restore routed output unpadding before shared expert add (#45707)
    c5e5c33 · Netanel Haber · 16h ago
  • [Quant] Support modelopt_mixed on Ampere (SM80/SM86) (#45306)
    a8c86ee · Mike G · 16h ago
  • [ROCm][CI] Gate incompatible HF references on Transformers v5 (#41532)
    7e179e4 · Andreas Karatzas · 16h ago
  • [ZenCPU] Add zencpu Platform Runtime Logging and Docs (#42726)
    405c7cf · Lalithnarayan C · 17h ago
  • [Refactor] Remove `Fp8OnlineLinearMethod` as scheduled (#45463)
    3f53e21 · Wentao Ye · 17h ago

Top contributors

Builders behind this project.

  • DarkLight1337: 897 commits
  • WoosukKwon: 800 commits
  • mgoin: 537 commits
  • hmellor: 501 commits
  • youkaichao: 471 commits
  • Isotr0py: 415 commits
  • njhill: 382 commits
  • yewentao256: 333 commits