QLLM

A general-purpose 2-8 bit quantization toolbox supporting GPTQ, AWQ, HQQ, and VPTQ, with easy export to ONNX and ONNX Runtime.
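
To make "2-8 bit quantization" concrete, here is a minimal sketch of plain round-to-nearest uniform quantization in pure Python. It is not QLLM's API and not GPTQ, AWQ, HQQ, or VPTQ, which are more sophisticated methods; the function names `quantize_rtn` and `dequantize` are hypothetical and exist only for this illustration.

```python
def quantize_rtn(weights, bits=4):
    """Map floats to integer codes in [0, 2**bits - 1] by round-to-nearest.

    Illustrative baseline only; not part of QLLM.
    """
    if not 2 <= bits <= 8:
        raise ValueError("bits must be between 2 and 8")
    qmax = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    # Step size between adjacent codes; fall back to 1.0 for constant input.
    scale = (w_max - w_min) / qmax or 1.0
    # Clamp after rounding so floating-point noise cannot leave the code range.
    codes = [min(qmax, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize(codes, scale, zero):
    """Reconstruct approximate floats from integer codes."""
    return [c * scale + zero for c in codes]


if __name__ == "__main__":
    w = [-1.0, -0.5, 0.0, 0.5, 1.0]
    codes, scale, zero = quantize_rtn(w, bits=4)
    print(codes)
    print(dequantize(codes, scale, zero))
```

With fewer bits the step size `scale` grows, so reconstruction error grows too; methods such as GPTQ and AWQ exist to reduce that error beyond what this naive baseline achieves.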

Python · Emerging

Stars: 190
Forks: 19
Contributors: 5
Last push: 3mo ago

Recent commits

Latest commits.

Adjust chat loop routing for fastchat/transformers (#178)
8189c86 · wejoncy · 3mo ago
Bump version to 0.2.3.1 (#177)
2fdebf6 · wejoncy · 3mo ago
fix: parallel_download_decorator compat with transformers >= 5, use dtype instead of torch_dtype (#176)
5f8e1a4 · wejoncy · 3mo ago
docs: update README with CUDA 13.0, Python 3.11-3.13, new GPU archs (#175)
9fafeb4 · wejoncy · 3mo ago
ci: add PyPI deploy stage with manual approval (#174)
014c716 · wejoncy · 3mo ago
fix: CI build - ubuntu-22.04, MSVC setup for Windows (#173)
ef9fbff · wejoncy · 3mo ago
Bump version to 0.2.3 (#172)
58b98df · wejoncy · 3mo ago
fix: compatibility with transformers >= 5 and support non-llama models in chat plugin (#171)
661d976 · wejoncy · 4mo ago

Top contributors

Builders behind this project.