Tried KIVI on ChatGLM3, only to find that multi-query attention is more efficient.