Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: Large Language Models (LLMs) demonstrate their reasoning ability through
chain-of-thought (CoT) generation. However, LLM's autoregressive decoding may
limit the ability to revisit and refine earlier tokens in a holistic manner,
which can also lead to inefficient exploration for diverse solutions. In this
paper, we propose LaDiR (Latent Diffusion Reasoner), a novel reasoning
framework that unifies the expressiveness of continuous latent representation
with the iterative refinement capabilities of latent diffusion models for an
existing LLM. We first construct a structured latent reasoning space using a
Variational Autoencoder (VAE) that encodes text reasoning steps into blocks of
thought tokens, preserving semantic information and interpretability while
offering compact but expressive representations. Subsequently, we utilize a
latent diffusion model that learns to denoise a block of latent thought tokens
with a blockwise bidirectional attention mask, enabling longer horizon and
iterative refinement with adaptive test-time compute. This design allows
efficient parallel generation of diverse reasoning trajectories, allowing the
model to plan and revise the reasoning process holistically. We conduct
evaluations on a suite of mathematical reasoning and planning benchmarks.
Empirical results show that LaDiR consistently improves accuracy, diversity,
and interpretability over existing autoregressive, diffusion-based, and latent
reasoning methods, revealing a new paradigm for text reasoning with latent
diffusion.
Key Contributions
LaDiR unifies latent diffusion models with LLMs to enhance text reasoning. It uses a VAE to create a structured latent reasoning space and a latent diffusion model for iterative refinement of thought tokens, overcoming limitations of autoregressive decoding and enabling more holistic reasoning.
Business Value
Could lead to more reliable and sophisticated AI assistants capable of complex problem-solving and nuanced reasoning, improving user experience in applications requiring deep understanding.