nvidia-spark

My personal playground for NVIDIA Spark

Shell · Emerging

Stars: none listed
Forks: 1
Contributors: 1
Last push: 11h ago

Recent commits

achieved 90 t/s with llama.cpp MTP on Qwen3.6; could not reach 130 t/s with Atlas, see details in https://github.com/Avarok-Cybersecurity/atlas/issues/173
619817a · Slach · 11h ago
solved https://github.com/z-lab/paroquant/issues/30
233d775 · Slach · 2mo ago
tested Paroquant + gemma4: 2500-4000 t/s prefill, 7-9 t/s generation; applied the workaround for the solved https://github.com/z-lab/paroquant/issues/30
c5a25c9 · Slach · 2mo ago
switch to mradermacher/Nemotron-Cascade-2-30B: 60-70 t/s generation, 2900 t/s context parsing
2260965 · Slach · 3mo ago
switch coding to unsloth/Qwen3.5-35B

18a2b49 · Slach · 3mo ago

add b4; tried Qwen3.5 NVFP4 (failed, see https://github.com/sgl-project/sglang/issues/20973); FP8 gives 7 t/s; simplify the vllm build by adding fastsafetensors

2991854 · Slach · 3mo ago

llama.cpp works, 1500 t/s prefill, 30 t/s generation

4be1ce3 · Slach · 3mo ago

try to use vllm instead of llama.cpp for better KV cache and performance

c5f0c3d · Slach · 3mo ago

Top contributors

Builders behind this project.