Non-Markovian Discrete Diffusion with Causal Language Models

📄 Abstract

Discrete diffusion models offer a flexible, controllable approach to structured sequence generation, yet they still lag behind causal language models in expressive power. A key limitation lies in their reliance on the Markovian assumption, which restricts each step to condition only on the current state, leading to potential uncorrectable error accumulation. In this paper, we introduce CaDDi (Causal Discrete Diffusion Model), a discrete diffusion model that conditions on the entire generative trajectory, thereby lifting the Markov constraint and allowing the model to revisit and improve past states. By unifying sequential (causal) and temporal (diffusion) reasoning in a single non-Markovian transformer, CaDDi also treats standard causal language models as a special case and permits the direct reuse of pretrained LLM weights with no architectural changes. Empirically, CaDDi outperforms state-of-the-art discrete diffusion baselines on natural-language benchmarks, substantially narrowing the remaining gap to large autoregressive transformers.
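To make the Markovian-vs-trajectory distinction concrete, here is a minimal PyTorch sketch. The function names, the flattening of the trajectory along the sequence axis, and the shape conventions are illustrative assumptions, not the paper's actual interface:

```python
import torch

def markovian_reverse_step(model, x_t, t):
    """Standard discrete diffusion step: the denoiser sees only x_t."""
    # Any error already baked into x_t cannot be revisited later,
    # because all earlier states x_T, ..., x_{t+1} are discarded.
    logits = model(x_t, t)  # (batch, seq_len, vocab_size)
    return torch.distributions.Categorical(logits=logits).sample()

def trajectory_reverse_step(model, trajectory, t):
    """Non-Markovian step in the spirit of CaDDi (hypothetical API)."""
    # The denoiser conditions on the whole trajectory [x_T, ..., x_t],
    # flattened into one long sequence, so a single causal transformer
    # can attend to every past state and correct earlier mistakes.
    context = torch.cat(trajectory, dim=1)  # (batch, (T - t + 1) * seq_len)
    logits = model(context, t)              # (batch, seq_len, vocab_size)
    x_prev = torch.distributions.Categorical(logits=logits).sample()
    trajectory.append(x_prev)               # keep x_{t-1} for later steps
    return x_prev
```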
Authors (10)
Yangtian Zhang
Sizhuang He
Daniel Levine
Lawrence Zhao
David Zhang
Syed A Rizvi
+4 more
Submitted: February 13, 2025
arXiv Category: cs.LG

Key Contributions

Introduces CaDDi, a non-Markovian discrete diffusion model that conditions on the entire generative trajectory rather than only the current state. By lifting the Markov constraint and unifying sequential (causal) and temporal (diffusion) reasoning in one transformer, CaDDi subsumes standard causal language models as a special case and allows pretrained LLM weights to be reused directly, with no architectural changes. A sketch of the weight-reuse idea follows below.
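Because the trajectory is consumed as one token stream by a causal transformer, a pretrained autoregressive LM can in principle serve as the denoiser backbone unchanged. The sketch below uses the Hugging Face transformers API; the model choice ("gpt2") and the bracketed timestep/mask tokens are illustrative assumptions, not the paper's recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# A pretrained autoregressive LM used directly as the denoiser backbone.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
denoiser = AutoModelForCausalLM.from_pretrained("gpt2")

# One noisy state of the trajectory, flattened to plain text with the
# timestep injected as an ordinary token prefix (an assumption here).
inputs = tokenizer("[t=3] the cat [MASK] on the mat", return_tensors="pt")
logits = denoiser(**inputs).logits  # (1, seq_len, vocab_size)
```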

Business Value

Enables more powerful and controllable generation of structured text, useful for applications like creative writing, code generation, and complex dialogue systems.