An optimized MLX (Apple Silicon Metal) Server for running local LLMs with higher performance inference using KV cache, batching, and parallel processing. Easy to configure UI and support for both OpenAI and Anthropic protocols
Latest commits.
Builders behind this project.