QLLM

A general 2-8 bit quantization toolbox supporting GPTQ, AWQ, HQQ, and VPTQ, with easy export to ONNX/ONNX Runtime.

Python · Emerging
GitHub
Stars: 190
Forks: 19
Contributors: 5
Last push: 3mo ago

Recent commits

Latest commits.

  • Adjust chat loop routing for fastchat/transformers (#178)
    8189c86 · wejoncy · 3mo ago
  • Bump version to 0.2.3.1 (#177)
    2fdebf6 · wejoncy · 3mo ago
  • fix: parallel_download_decorator compat with transformers >= 5, use dtype instead of torch_dtype (#176)
    5f8e1a4 · wejoncy · 3mo ago
  • docs: update README with CUDA 13.0, Python 3.11-3.13, new GPU archs (#175)
    9fafeb4 · wejoncy · 3mo ago
  • ci: add PyPI deploy stage with manual approval (#174)
    014c716 · wejoncy · 3mo ago
  • fix: CI build - ubuntu-22.04, MSVC setup for Windows (#173)
    ef9fbff · wejoncy · 3mo ago
  • Bump version to 0.2.3 (#172)
    58b98df · wejoncy · 3mo ago
  • fix: compatibility with transformers >= 5 and support non-llama models in chat plugin (#171)
    661d976 · wejoncy · 4mo ago

Top contributors

Builders behind this project.

  • wejoncy: 140 commits
  • emphasis10: 1 commit
  • ReinForce-II: 1 commit
  • yufenglee: 1 commit
  • aciddelgado: 1 commit