Latent Chain-of-Thought for Visual Reasoning

Abstract

Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). However, existing training algorithms such as SFT, PPO, and GRPO may not generalize well across unseen reasoning tasks and heavily rely on a biased reward model. To address this challenge, we reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference. By leveraging diversity-seeking reinforcement learning algorithms, we introduce a novel sparse reward function for token-level learning signals that encourage diverse, high-likelihood latent CoT, overcoming deterministic sampling limitations and avoiding reward hacking. Additionally, we implement a Bayesian inference-scaling strategy that replaces costly Best-of-N and Beam Search with a marginal likelihood to efficiently rank optimal rationales and answers. We empirically demonstrate that the proposed method enhances the state-of-the-art LVLMs on seven reasoning benchmarks, in terms of effectiveness, generalization, and interpretability.
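
The "reasoning as posterior inference" framing can be written out in textbook form. The notation below is an assumption made for illustration and is not taken from the paper: $x$ stands for the image and question, $z$ for a latent chain-of-thought, $y$ for the answer, and $q_\phi$ for a hypothetical amortized sampler. Under those assumptions, a generic latent-variable sketch looks like this:

```latex
% Marginal likelihood of an answer, summing over latent rationales z
p(y \mid x) = \sum_{z} p(z \mid x)\, p(y \mid x, z)

% Posterior over rationales given the observed answer (Bayes' rule)
p(z \mid x, y) = \frac{p(z \mid x)\, p(y \mid x, z)}{p(y \mid x)}

% Generic amortized variational bound with an assumed sampler q_\phi
\log p(y \mid x) \;\ge\;
  \mathbb{E}_{z \sim q_\phi(z \mid x, y)}
  \big[ \log p(z \mid x) + \log p(y \mid x, z) - \log q_\phi(z \mid x, y) \big]
```

The bound is tight when $q_\phi(z \mid x, y)$ equals the posterior $p(z \mid x, y)$, which is why training a sampler toward the posterior is a natural target. The paper's actual objective and conditioning may differ from this generic form.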
Authors (8): Guohao Sun, Hang Hua, Jian Wang, Jiebo Luo, Sohail Dianat, Majid Rabbani, +2 more
Submitted: October 27, 2025
arXiv Category: cs.AI

Key Contributions

This paper proposes Latent Chain-of-Thought (Latent CoT) for Large Vision-Language Models (LVLMs). It reformulates reasoning as posterior inference and trains the model with a scalable algorithm based on amortized variational inference. Training uses diversity-seeking reinforcement learning with a sparse reward function that supplies token-level learning signals and encourages diverse, high-likelihood latent CoTs, which the authors report improves generalization and avoids reward hacking. At inference time, a Bayesian inference-scaling strategy ranks rationales and answers by marginal likelihood instead of relying on costly Best-of-N or Beam Search.
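
To make the idea of ranking by marginal likelihood concrete, here is a minimal Python sketch. It is not the paper's procedure: it assumes a hypothetical input format in which each candidate answer has already been scored under N sampled rationales, and it uses a plain Monte Carlo average of p(answer | input, rationale) as the marginal-likelihood estimate. The function names `log_mean_exp` and `rank_answers` and the toy numbers are invented for illustration.

```python
import math
from typing import Dict, List, Tuple


def log_mean_exp(log_values: List[float]) -> float:
    """Numerically stable log of the mean of exp(log_values)."""
    peak = max(log_values)
    total = sum(math.exp(v - peak) for v in log_values)
    return peak + math.log(total) - math.log(len(log_values))


def rank_answers(answer_logliks: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank answers by estimated log marginal likelihood, highest first.

    answer_logliks[y][i] is assumed to hold log p(y | x, z_i) for the i-th
    sampled rationale z_i (a hypothetical input format, not the paper's API).
    """
    scored = [(y, log_mean_exp(lls)) for y, lls in answer_logliks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy likelihoods, invented purely for illustration.
    toy = {
        "cat": [math.log(0.9), math.log(0.2), math.log(0.7)],
        "dog": [math.log(0.1), math.log(0.8), math.log(0.3)],
    }
    for answer, score in rank_answers(toy):
        print(f"{answer}: estimated p(y | x) = {math.exp(score):.3f}")
```

Averaging over rationales rewards answers that are supported by many plausible reasoning paths, rather than the single highest-scoring sample that a Best-of-N selection would pick. Working in log space avoids underflow when sequence likelihoods are very small.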

Business Value

The method aims to make AI systems that interpret and reason about visual information more trustworthy and capable, which could lead to more reliable applications in areas such as autonomous driving, medical image analysis, and content moderation.