📄 Abstract
The "end-to-end" label for LLMs is a misnomer. In practice, they depend on a
non-differentiable decoding process that requires laborious hand-tuning of
hyperparameters such as temperature and top-p. This paper introduces AutoDeco, a
novel architecture that enables truly "end-to-end" generation by learning to
control its own decoding strategy. We augment the standard transformer with
lightweight heads that, at each step, dynamically predict context-specific
temperature and top-p values alongside the next-token logits. This approach
transforms decoding into a parametric, token-level process, allowing the model
to self-regulate its sampling strategy within a single forward pass.
Through extensive experiments on eight benchmarks, we demonstrate that
AutoDeco not only significantly outperforms default decoding strategies but
also achieves performance comparable to an oracle-tuned baseline derived from
"hacking the test set", a practical upper bound for any static method.
Crucially, we uncover an emergent capability for instruction-based decoding
control: the model learns to interpret natural language commands (e.g.,
"generate with low randomness") and adjusts its predicted temperature and top-p
on a token-by-token basis, opening a new paradigm for steerable and interactive
LLM decoding.
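The core idea, as the abstract describes it, is lightweight heads that predict a per-token temperature and top-p alongside the next-token logits, which are then used directly for sampling. Below is a minimal, hypothetical PyTorch sketch of that mechanism; the head shapes, activation choices, and names (`AutoDecoHead`, `sample_token`) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class AutoDecoHead(nn.Module):
    """Sketch of AutoDeco-style heads: alongside the usual LM head, two
    small linear heads map each hidden state to a context-specific
    temperature (positive) and top-p (in (0, 1))."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.lm_head = nn.Linear(hidden_size, vocab_size)
        self.temp_head = nn.Sequential(nn.Linear(hidden_size, 1), nn.Softplus())
        self.top_p_head = nn.Sequential(nn.Linear(hidden_size, 1), nn.Sigmoid())

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, hidden_size) for the current decoding position
        logits = self.lm_head(hidden)
        temperature = self.temp_head(hidden) + 1e-3  # keep strictly positive
        top_p = self.top_p_head(hidden)              # squashed into (0, 1)
        return logits, temperature, top_p


def sample_token(logits, temperature, top_p):
    """Standard temperature + nucleus (top-p) sampling, driven by the
    per-token values the heads predicted."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # drop tokens outside the smallest prefix whose mass exceeds top_p;
    # the top-1 token is always kept
    mask = cumulative - sorted_probs > top_p
    kept = sorted_probs.masked_fill(mask, 0.0)
    kept = kept / kept.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(kept, num_samples=1)
    return sorted_idx.gather(-1, choice)
```

Because the heads read the same hidden state as the LM head, the decoding parameters come out of the single forward pass the abstract describes, rather than from a fixed, externally tuned configuration.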
Authors (9)
Zhichao Wang
Dongyang Ma
Xinting Huang
Deng Cai
Tian Lan
Jiahao Xu
+3 more
Submitted
October 30, 2025
Key Contributions
Introduces AutoDeco, a novel architecture that enables truly end-to-end language generation by learning to dynamically predict and control its own decoding strategy (temperature, top-p) at a token level. This transforms decoding into a parametric process within a single forward pass.
Business Value
Improves the quality and consistency of generated text, reducing the need for manual tuning and enabling more reliable deployment of LLMs for various content generation tasks.