Research Paper
Audience: MLOps engineers, system architects, researchers in distributed systems and AI infrastructure

HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location

📄 Abstract

Large language models (LLMs) have facilitated a wide range of applications with distinct service-level objectives (SLOs), from latency-sensitive online tasks like interactive chatbots to throughput-oriented offline workloads like data synthesis. The existing deployment model, which dedicates machines to each workload, simplifies SLO management but often leads to poor resource utilization. This paper introduces HyGen, an interference-aware LLM serving system that enables efficient co-location of online and offline workloads while preserving SLOs. HyGen incorporates two key innovations: (1) performance control mechanisms, including a latency predictor to estimate batch execution time and an SLO-aware profiler to quantify latency interference, and (2) SLO-aware offline scheduling policies that maximize serving throughput and prevent starvation. Our evaluation on production workloads shows that HyGen achieves up to 3.9-5.8x throughput gains over online and hybrid serving baselines, while ensuring latency SLOs. The code of HyGen is publicly available at https://github.com/UIUC-MLSys/HyGen.
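To make the co-location idea concrete, below is a minimal Python sketch of how a latency predictor can gate the admission of offline requests into an online batch. This is a hedged illustration, not HyGen's actual code: the linear token-count latency model, the class and function names, and all constants are assumptions made for this summary.

```python
# Minimal sketch (not HyGen's actual code) of latency-predictor-gated
# admission: offline requests join an online batch only while the
# predicted batch latency stays within the online SLO.
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int
    decode_tokens: int
    is_online: bool


class LatencyPredictor:
    """Toy linear model: latency grows with the batch's total token count.

    Real serving systems fit such coefficients by profiling the model on
    the target hardware; the constants below are placeholders.
    """

    def __init__(self, per_token_ms: float = 0.05, overhead_ms: float = 5.0):
        self.per_token_ms = per_token_ms
        self.overhead_ms = overhead_ms

    def predict_ms(self, batch: list) -> float:
        tokens = sum(r.prompt_tokens + r.decode_tokens for r in batch)
        return self.overhead_ms + self.per_token_ms * tokens


def admit_offline(batch, offline_queue, predictor, slo_ms):
    """Greedily co-locate offline requests while the SLO prediction holds."""
    admitted = []
    for req in list(offline_queue):
        if predictor.predict_ms(batch + admitted + [req]) <= slo_ms:
            admitted.append(req)
            offline_queue.remove(req)
        else:
            break  # batch saturated w.r.t. the online latency budget
    return batch + admitted
```

In the paper's design, the predictor's estimates are grounded by an SLO-aware profiler that quantifies interference on the target hardware, rather than by fixed constants like those above.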
Authors (3)
Ting Sun
Penghan Wang
Fan Lai
Submitted
January 15, 2025
arXiv Category
cs.DC

Key Contributions

Introduces HyGen, an interference-aware LLM serving system that efficiently co-locates online (latency-sensitive) and offline (throughput-oriented) workloads on shared hardware. It combines a latency predictor and an SLO-aware profiler to quantify and control interference, and applies SLO-aware offline scheduling to maximize serving throughput while preventing starvation of offline requests.
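The starvation-prevention aspect of the scheduling policy can be illustrated with a simple aging queue: a waiting offline request's priority rises over time, so it is eventually scheduled even under sustained online load. This is a hypothetical sketch; the AgingOfflineQueue class, the aging_weight parameter, and the linear aging rule are stand-ins, not the paper's exact algorithm.

```python
import time


class AgingOfflineQueue:
    """Illustrative starvation-avoidance policy (an assumption, not the
    paper's exact algorithm): each waiting offline request accrues an
    aging bonus proportional to its waiting time."""

    def __init__(self, aging_weight: float = 0.1):
        # Priority gained per second of waiting; a tuning knob.
        self.aging_weight = aging_weight
        self._items = []  # (enqueue_time, base_priority, request)

    def push(self, request, base_priority: float = 0.0):
        self._items.append((time.monotonic(), base_priority, request))

    def pop(self):
        """Dequeue the request with the highest effective priority."""
        if not self._items:
            return None
        now = time.monotonic()
        best = max(
            self._items,
            key=lambda item: item[1] + self.aging_weight * (now - item[0]),
        )
        self._items.remove(best)
        return best[2]
```

A higher aging_weight bounds how long any offline request can wait, trading some short-term throughput for fairness.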

Business Value

Significantly reduces the operational costs of deploying LLMs by improving resource utilization and enabling mixed-workload serving, making LLM applications more economically viable.