Specialized-Multi-Domain-LLM-Inference-Pipeline
Developed a specialised, high-performance inference pipeline for Qwen3-1.7B, tailored to algebra, geography, history, and Chinese culture, achieving a Time To First Token (TTFT) of under 150 ms. Optimised the architecture with 4-bit quantisation and Graph RAG to sustain high-concurrency throughput within a 10-minute window.
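
The description quotes a TTFT figure without showing how it is measured. As a minimal sketch, TTFT can be taken as the time between issuing a request and receiving the first streamed token. The names `measure_ttft` and `fake_stream` are hypothetical and not part of this project; `fake_stream` only simulates a streaming generation call with sleeps and stands in for whatever streaming interface the real pipeline exposes.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttft(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Consume a token stream and return (ttft_seconds, total_seconds, token_count).

    The clock is injectable so the function can be checked deterministically.
    """
    start = clock()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # The first token has just arrived: record time to first token.
            ttft = clock() - start
        count += 1
    total = clock() - start
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, total, count


def fake_stream(first_delay: float, per_token_delay: float, n_tokens: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming generation call (assumption, not the real model)."""
    time.sleep(first_delay)  # simulates prefill latency before the first token
    for i in range(n_tokens):
        if i > 0:
            time.sleep(per_token_delay)  # simulates per-token decode latency
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, count = measure_ttft(fake_stream(0.05, 0.01, 5))
    print(f"TTFT: {ttft * 1000:.1f} ms, total: {total * 1000:.1f} ms, tokens: {count}")
```

Because a generator does not start running until it is first iterated, the simulated prefill delay falls inside the timed region. With a real model, the same function could wrap any iterator that yields tokens as they are produced.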