LoreonLabsPlatform

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · Emerging

Stars: n/a

Forks: n/a

Contributors: 8

Last push: 8d ago

Recent commits

Latest commits.

Fix a pre-commit error that snuck into main via #13693
5549bcc · Russell Bryant · 16mo ago

[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980)
befc402 · afeldman-nm · 16mo ago

[Misc][Docs] Raise error when flashinfer is not installed and `VLLM_ATTENTION_BACKEND` is set (#12513)
444b0f0 · Nicolò Lucchesi · 16mo ago

[BugFix] Illegal memory access for MoE On H20 (#13693)
ccc0051 · Zhonghua Deng · 16mo ago

Expert Parallelism (EP) Support for DeepSeek V2 (#12583)
781096e · Jongseok Park · 16mo ago

[CI/Build] add python-json-logger to requirements-common (#12842)
7940d8a · Roger Meier · 16mo ago

[Bugfix] fix(logging): add missing opening square bracket (#13011)
c0e3ecd · Roger Meier · 16mo ago

[model][refactor] remove cuda hard code in models and layers (#13658)
23eca9c · Mengqing Cao · 16mo ago

Top contributors

Builders behind this project.