
Text Generation Beyond Discrete Token Sampling

large-language-models › reasoning
📄 Abstract

In standard autoregressive generation, an LLM predicts the next-token distribution, samples a discrete token, and then discards the distribution, passing only the sampled token as new input. To preserve this distribution's rich information, we propose Mixture of Inputs (MoI), a training-free method for autoregressive generation. After generating a token following the standard paradigm, we construct a new input that blends the generated discrete token with the previously discarded token distribution. Specifically, we employ a Bayesian estimation method that treats the token distribution as the prior, the sampled token as the observation, and replaces the conventional one-hot vector with the continuous posterior expectation as the new model input. MoI allows the model to maintain a richer internal representation throughout the generation process, resulting in improved text quality and reasoning capabilities. On mathematical reasoning, code generation, and PhD-level QA tasks, MoI consistently improves performance across multiple models including QwQ-32B, Nemotron-Super-49B, Gemma-3-27B, and DAPO-Qwen-32B, with no additional training and negligible computational overhead.
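The core mechanism can be illustrated with a short sketch. The snippet below is not the paper's implementation: the function name, the concentration weight `beta`, and the simple convex blend used as the "posterior expectation" are illustrative assumptions, standing in for the paper's actual Bayesian estimator. It only shows the shape of the idea: blend the sampled one-hot vector with the full next-token distribution, then feed the model the expected embedding under that blend instead of a single token's embedding.

```python
import numpy as np

def mixture_of_inputs_embedding(probs, sampled_id, embed_matrix, beta=1.0):
    """Illustrative sketch of an MoI-style input construction.

    probs:        next-token distribution (the prior), shape (V,)
    sampled_id:   index of the sampled token (the observation)
    embed_matrix: token embedding table, shape (V, d)
    beta:         hypothetical weight on the observation (not from the paper)
    """
    one_hot = np.zeros_like(probs)
    one_hot[sampled_id] = 1.0
    # Blend the prior distribution with the observed one-hot vector;
    # the result is still a valid probability distribution over tokens.
    posterior = (probs + beta * one_hot) / (1.0 + beta)
    # Continuous input: the expected embedding under the blended
    # distribution, replacing the usual single-token embedding lookup.
    return posterior @ embed_matrix
```

With `beta` large the input approaches the standard one-hot embedding; with `beta` small it approaches the expected embedding under the predicted distribution, which is the trade-off the method's Bayesian view makes explicit.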
Authors (5)
Yufan Zhuang
Liyuan Liu
Chandan Singh
Jingbo Shang
Jianfeng Gao
Submitted
May 20, 2025
arXiv Category
cs.CL
arXiv PDF

Key Contributions

Proposes Mixture of Inputs (MoI), a training-free method for autoregressive text generation that preserves rich information from token distributions. MoI blends the generated discrete token with the discarded distribution using Bayesian estimation, replacing the one-hot vector with a continuous posterior expectation. This leads to improved text quality and reasoning capabilities.

Business Value

Enhances the quality and reasoning ability of LLM-generated text, leading to more useful and reliable AI assistants, content generators, and analytical tools.