📄 Abstract
While model serving has unlocked unprecedented capabilities, the high cost of
serving large-scale models continues to be a significant barrier to widespread
accessibility and rapid innovation. Compiler optimizations have long driven
substantial performance improvements, but existing compilers struggle with
neural workloads due to the exponentially large and highly interdependent space
of possible transformations. Although existing stochastic search techniques can
be effective, they are often sample-inefficient and fail to leverage the
structural context underlying compilation decisions. We investigate whether
reasoning with large language models (LLMs), without any retraining, can
exploit the context-aware decision space of compiler optimizations to
significantly improve sample efficiency. To that end,
we introduce a novel compilation framework (dubbed Reasoning Compiler) that
formulates optimization as a sequential, context-aware decision process guided
by a large language model and structured Monte Carlo tree search (MCTS). The
LLM acts as a proposal mechanism, suggesting hardware-informed transformations
that reflect the current program state and accumulated performance feedback.
MCTS incorporates the LLM-generated proposals to balance exploration and
exploitation, facilitating structured, context-sensitive traversal of the
expansive compiler optimization space. By achieving substantial speedups with
markedly fewer samples than leading neural compilers, our approach demonstrates
the potential of LLM-guided reasoning to transform the landscape of compiler
optimization.
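
The abstract does not include an implementation, but the interplay it describes, an LLM proposing hardware-informed transformations and MCTS balancing exploration against exploitation, can be sketched. Below is a minimal Python sketch under stated assumptions: `llm_propose_transformations`, `apply_transformation`, and `measure_speedup` are hypothetical stubs standing in for a real LLM call, a compiler IR rewrite, and a benchmarking harness, and the selection rule is a generic PUCT-style formula for prior-guided MCTS, not necessarily the paper's exact variant.

```python
import math
import random

# --- Hypothetical stubs (not from the paper) ---

def llm_propose_transformations(program_state, feedback):
    """Ask the LLM for transformations given the current program state and
    accumulated performance feedback. Returns (name, prior) pairs. Stubbed."""
    candidates = ["tile_loops", "vectorize", "fuse_ops", "reorder_axes"]
    weights = [random.random() for _ in candidates]
    total = sum(weights)
    return [(c, w / total) for c, w in zip(candidates, weights)]

def apply_transformation(program_state, transformation):
    """Apply a named transformation to the program; stubbed as string concat."""
    return program_state + "|" + transformation

def measure_speedup(program_state):
    """Benchmark the transformed program; stubbed with a random reward."""
    return random.random()

# --- Prior-guided MCTS over the transformation space ---

class Node:
    def __init__(self, state, prior=1.0, parent=None):
        self.state, self.prior, self.parent = state, prior, parent
        self.children = {}           # transformation name -> Node
        self.visits, self.value = 0, 0.0

    def puct_score(self, c_puct=1.4):
        # Exploitation (mean reward) + prior-weighted exploration bonus.
        q = self.value / self.visits if self.visits else 0.0
        u = c_puct * self.prior * math.sqrt(self.parent.visits) / (1 + self.visits)
        return q + u

def mcts_search(root_state, num_samples=64):
    root = Node(root_state)
    for _ in range(num_samples):
        node = root
        # Selection: descend by PUCT score until reaching a leaf.
        while node.children:
            node = max(node.children.values(), key=Node.puct_score)
        # Expansion: the LLM proposes context-aware children; its proposal
        # probabilities become the priors that steer future selection.
        if node.visits > 0 or node is root:
            for t, prior in llm_propose_transformations(node.state, node.value):
                node.children[t] = Node(apply_transformation(node.state, t),
                                        prior=prior, parent=node)
            node = max(node.children.values(), key=Node.puct_score)
        # Evaluation: benchmark the candidate transformation sequence.
        reward = measure_speedup(node.state)
        # Backpropagation: update statistics along the selected path.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited first transformation.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

if __name__ == "__main__":
    print("best first transformation:", mcts_search("conv2d_kernel"))
```

Feeding the LLM's proposal probabilities in as priors on the exploration term is one natural way to bias the search toward context-plausible transformations while still letting visit counts and measured rewards correct poor suggestions.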
Authors (5)
Sujun Tang
Christopher Priebe
Rohan Mahapatra
Lianhui Qin
Hadi Esmaeilzadeh
Key Contributions
Introduces a novel compilation framework ('Reasoning Compiler') that combines LLM reasoning with Monte Carlo tree search to make context-aware compiler-optimization decisions for model serving. The approach substantially improves sample efficiency over traditional stochastic search methods, reducing serving costs.
Business Value
Dramatically reduces the operational costs associated with deploying and serving large AI models, making advanced AI capabilities more accessible and enabling faster innovation cycles across various industries.