Loreon
Labs
Platform
nvidia-spark
my personal playground for nvidia spark
Shell
Emerging
GitHub
Stars: —
Forks: 1
Contributors: 1
Last push: 11h ago
Recent commits
Latest commits.
achieved 90 t/s for llama.cpp MTP in Qwen3.6, can't achieve 130 t/s for Atlas, see details in https://github.com/Avarok-Cybersecurity/atlas/issues/173
619817a
Slach
11h ago
solved https://github.com/z-lab/paroquant/issues/30 and https://github.com/z-lab/paroquant/issues/30
233d775
Slach
2mo ago
tested Paroquant + gemma4, 2500-4000 t/s prefill, 7-9 t/s generation, applied workaround for solved https://github.com/z-lab/paroquant/issues/30 and https://github.com/z-lab/paroquant/issues/30
c5a25c9
Slach
2mo ago
switch to mradermacher/Nemotron-Cascade-2-30B, 60-70 t/s generation, 2900 t/s context parsing
2260965
Slach
3mo ago
switch coding to unsloth/Qwen3.5-35B
18a2b49
Slach
3mo ago
add b4, try Qwen3.5 NVFP4 (failed, see https://github.com/sgl-project/sglang/issues/20973), FP8 7 t/s, simplify vllm build by adding fastsafetensors
2991854
Slach
3mo ago
llama.cpp works, 1500 t/s prefill, 30 t/s generation
4be1ce3
Slach
3mo ago
try to use vllm instead of llama.cpp for better KV cache and performance
c5f0c3d
Slach
3mo ago
Top contributors
Builders behind this project.
Slach
60 commits