vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · Emerging

Stars: —
Forks: —
Contributors: 8
Last push: 11mo ago

Recent commits

Latest commits.

  • [BugFix] Fix DP Coordinator incorrect debug log message (#19624)
    bd517eb · Nick Hill · 12mo ago
  • Adding "AMD: Multi-step Tests" to amdproduction. (#19508)
    d65668b · Concurrensee · 12mo ago
  • [torch.compile] Use custom ops when use_inductor=False (#19618)
    aafbbd9 · Woosuk Kwon · 12mo ago
  • [Doc] Add troubleshooting section to k8s deployment (#19377)
    0f08745 · Anna Pendleton · 12mo ago
  • [CUDA] Enable full cudagraph for FlashMLA (#18581)
    3597b06 · Luka Govedič · 12mo ago
  • [doc][mkdocs] fix the duplicate Supported features sections in GPU docs (#19606)
    1015296 · Reid · 12mo ago
  • [Refactor] Remove unused variables in `moe_permute_unpermute_kernel.inl` (#19573)
    ce9dc02 · Wentao Ye · 12mo ago
  • [Model] Fix minimax model cache & lm_head precision (#19592)
    a24cb91 · qscqesze · 12mo ago

Top contributors

Builders behind this project.

  • WoosukKwon · 565 commits
  • DarkLight1337 · 444 commits
  • youkaichao · 443 commits
  • mgoin · 251 commits
  • Isotr0py · 187 commits
  • hmellor · 172 commits
  • simon-mo · 156 commits
  • njhill · 150 commits