📄 Abstract
The "end-to-end" label for LLMs is a misnomer. In practice, they depend on a
non-differentiable decoding process that requires laborious hand-tuning of
hyperparameters such as temperature and top-p. This paper introduces AutoDeco, a
novel architecture that enables truly "end-to-end" generation by learning to
control its own decoding strategy. We augment the standard transformer with
lightweight heads that, at each step, dynamically predict context-specific
temperature and top-p values alongside the next-token logits. This approach
transforms decoding into a parametric, token-level process, allowing the model
to self-regulate its sampling strategy within a single forward pass.
Through extensive experiments on eight benchmarks, we demonstrate that
AutoDeco not only significantly outperforms default decoding strategies but
also achieves performance comparable to an oracle-tuned baseline derived from
"hacking the test set", a practical upper bound for any static method.
Crucially, we uncover an emergent capability for instruction-based decoding
control: the model learns to interpret natural language commands (e.g.,
"generate with low randomness") and adjusts its predicted temperature and top-p
on a token-by-token basis, opening a new paradigm for steerable and interactive
LLM decoding.
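The core idea, as the abstract describes it, is lightweight heads that predict a per-token temperature and top-p alongside the next-token logits, which are then used directly for sampling. Below is a minimal, hypothetical PyTorch sketch of that mechanism; the head shapes, activation choices, and names (`AutoDecoHead`, `sample_token`) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class AutoDecoHead(nn.Module):
    """Sketch of AutoDeco-style heads: alongside the usual LM head, two
    small linear heads map each hidden state to a context-specific
    temperature (positive) and top-p (in (0, 1))."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.lm_head = nn.Linear(hidden_size, vocab_size)
        self.temp_head = nn.Sequential(nn.Linear(hidden_size, 1), nn.Softplus())
        self.top_p_head = nn.Sequential(nn.Linear(hidden_size, 1), nn.Sigmoid())

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, hidden_size) for the current decoding position
        logits = self.lm_head(hidden)
        temperature = self.temp_head(hidden) + 1e-3  # keep strictly positive
        top_p = self.top_p_head(hidden)              # squashed into (0, 1)
        return logits, temperature, top_p


def sample_token(logits, temperature, top_p):
    """Standard temperature + nucleus (top-p) sampling, driven by the
    per-token values the heads predicted."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # drop tokens outside the smallest prefix whose mass exceeds top_p;
    # the top-1 token is always kept
    mask = cumulative - sorted_probs > top_p
    kept = sorted_probs.masked_fill(mask, 0.0)
    kept = kept / kept.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(kept, num_samples=1)
    return sorted_idx.gather(-1, choice)
```

Because the heads read the same hidden state as the LM head, the decoding parameters come out of the single forward pass the abstract describes, rather than from a fixed, externally tuned configuration.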
Authors (9)
Zhichao Wang
Dongyang Ma
Xinting Huang
Deng Cai
Tian Lan
Jiahao Xu
+3 more
Submitted
October 30, 2025
Key Contributions
Introduces AutoDeco, a novel architecture that enables truly end-to-end language generation by learning to dynamically predict and control its own decoding strategy (temperature, top-p) at a token level. This transforms decoding into a parametric process within a single forward pass.
Business Value
Improves the quality and consistency of generated text, reducing the need for manual tuning and enabling more reliable deployment of LLMs for various content generation tasks.