LLM inference orchestrator for routing requests across heterogeneous backends (local GPU, cloud APIs) with explicit latency, cost, and failure-isolation trade-offs.
Latest commits.
Builders behind this project.