vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · Emerging
Stars: —
Forks: —
Contributors: 8
Last push: 8d ago

Recent commits

Latest commits.

  • Fix a pre-commit error that snuck into main via #13693
    5549bcc · Russell Bryant · 16mo ago
  • [V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980)
    befc402 · afeldman-nm · 16mo ago
  • [Misc][Docs] Raise error when flashinfer is not installed and `VLLM_ATTENTION_BACKEND` is set (#12513)
    444b0f0 · Nicolò Lucchesi · 16mo ago
  • [BugFix] Illegal memory access for MoE On H20 (#13693)
    ccc0051 · Zhonghua Deng · 16mo ago
  • Expert Parallelism (EP) Support for DeepSeek V2 (#12583)
    781096e · Jongseok Park · 16mo ago
  • [CI/Build] add python-json-logger to requirements-common (#12842)
    7940d8a · Roger Meier · 16mo ago
  • [Bugfix] fix(logging): add missing opening square bracket (#13011)
    c0e3ecd · Roger Meier · 16mo ago
  • [model][refactor] remove cuda hard code in models and layers (#13658)
    23eca9c · Mengqing Cao · 16mo ago

Top contributors

Builders behind this project.

  • WoosukKwon: 515 commits
  • youkaichao: 419 commits
  • DarkLight1337: 283 commits
  • mgoin: 146 commits
  • simon-mo: 137 commits
  • Isotr0py: 124 commits
  • ywang96: 116 commits
  • zhuohan123: 111 commits