Abstract
Diffusion-based large language models (dLLMs) have exhibited substantial potential for parallel text generation, which may enable more efficient generation than autoregressive models. However, current dLLMs suffer from fixed generation lengths: the generation length must be set before decoding as a hyper-parameter, which hurts both efficiency and flexibility. To solve these problems, we propose to train a diffusion LLM with native variable generation lengths, abbreviated as dLLM-Var. Concretely, we train the model to accurately predict the [EOS] token in the generated text, which enables a dLLM to natively infer in a block diffusion manner while still maintaining global bi-directional (full) attention and high parallelism. Experiments on standard benchmarks demonstrate that our method achieves a 30.1x speedup over traditional dLLM inference paradigms and a 2.4x speedup relative to autoregressive models such as Qwen and Llama. Our method achieves higher accuracy and faster inference, elevating dLLMs beyond mere academic novelty and supporting their practical use in real-world applications. Code and models have been released.
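
To make the decoding idea concrete, below is a minimal sketch of a variable-length block-diffusion loop: masked blocks are denoised with bidirectional attention over the prompt plus all committed tokens, and generation stops once the model commits an [EOS] token. The model interface, token ids (mask_id, eos_id), and the confidence-based unmasking schedule are illustrative assumptions for this sketch, not the authors' released dLLM-Var implementation.

```python
import torch

# Sketch of variable-length block-diffusion decoding (hypothetical interface:
# `model(ids)` returns logits of shape [1, L, vocab]; `mask_id`/`eos_id` are
# the mask and [EOS] token ids). Not the paper's released code.
@torch.no_grad()
def block_diffusion_generate(model, prompt_ids, mask_id, eos_id,
                             block_size=32, max_blocks=32, steps_per_block=8):
    seq = prompt_ids.clone()                        # committed tokens so far
    for _ in range(max_blocks):
        block = torch.full((1, block_size), mask_id, device=seq.device)
        canvas = torch.cat([seq, block], dim=1)     # prompt + committed + masked block
        for _ in range(steps_per_block):
            still_masked = canvas == mask_id
            if not still_masked.any():
                break
            logits = model(canvas)                  # full bidirectional attention over the canvas
            conf, pred = logits.softmax(-1).max(-1)
            # Unmask the most confident masked positions in parallel
            # (a common dLLM heuristic; the paper's exact schedule may differ).
            k = max(1, int(still_masked.sum()) // 2)
            conf = conf.masked_fill(~still_masked, float("-inf"))
            idx = conf.topk(k, dim=-1).indices
            canvas.scatter_(1, idx, pred.gather(1, idx))
        seq = canvas
        eos_pos = (seq[0] == eos_id).nonzero()
        if eos_pos.numel() > 0:                     # native length control: truncate at [EOS]
            return seq[:, : eos_pos[0, 0] + 1]
    return seq
```

The key point of the sketch is the early exit on [EOS]: decoding proceeds block by block rather than over a pre-allocated fixed-length canvas, so the output length is determined by the model itself instead of a decoding hyper-parameter.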
Authors (7)
Yicun Yang
Cong Wang
Shaobo Wang
Zichen Wen
Biqing Qi
Hanlin Xu
+1 more
Submitted
October 28, 2025
Key Contributions
Proposes dLLM-Var, a diffusion-based LLM capable of native variable generation lengths by accurately predicting the [EOS] token. This enables block diffusion inference, achieving significant speedups over traditional dLLM inference and autoregressive models while maintaining parallelism and global attention.
Business Value
Enables faster and more flexible text generation, improving efficiency for applications like content creation, summarization, and dialogue systems.