vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · Emerging
Stars: —
Forks: —
Contributors: 8
Last push: 8mo ago

Recent commits

Latest commits.

  • build vllm
    db3d113 · HunterChen · 8mo ago
  • [torch.compile] Unwrap fused_marlin_moe custom op (#26739)
    8ae1692 · Varun Sundar Rabindranath · 8mo ago
  • [build][torch.compile] upgrade depyf version (#26702)
    8a0af6a · youkaichao · 8mo ago
  • [Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE (#26742)
    cfded80 · Jialin Ouyang · 8mo ago
  • [compile] Enable sequence parallelism for full cuda graph without specifying compile sizes (#26681)
    b59dd19 · Angela Yi · 8mo ago
  • [UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732)
    3e051bd · Michael Goin · 8mo ago
  • [Misc][DP] support customized aggregated logger for dp (#24354)
    8317f72 · Lucia Fang · 8mo ago
  • Add tests for chunked prefill and prefix cache with causal pooling models (#26526)
    d8bebb0 · Maximilien de Bayser · 8mo ago

Top contributors

Builders behind this project.

  • WoosukKwon: 672 commits
  • DarkLight1337: 611 commits
  • youkaichao: 465 commits
  • mgoin: 390 commits
  • hmellor: 306 commits
  • Isotr0py: 281 commits
  • njhill: 218 commits
  • jeejeelee: 216 commits