Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes

Abstract

Existing research on 3D Large Language Models (LLMs) still struggles to achieve grounded question answering, primarily because the mechanism of human-like scene-object grounded reasoning remains under-explored. This paper bridges the gap by presenting a novel framework. We first introduce a grounded Chain-of-Thought reasoning method in 3D scenes (SCENECOT), which decomposes a complex reasoning task into simpler, manageable subproblems and builds corresponding visual clues using multimodal expert modules. To enable this method, we develop SCENECOT-185K, the first large-scale grounded CoT reasoning dataset, consisting of 185K high-quality instances. Extensive experiments across various complex 3D scene reasoning benchmarks demonstrate that our framework achieves strong performance with high grounding-QA coherence. To the best of our knowledge, this is the first successful application of CoT reasoning to 3D scene understanding, enabling step-by-step, human-like reasoning and showing potential for extension to broader 3D scene understanding scenarios.
Authors (5)
Xiongkun Linghu
Jiangyong Huang
Ziyu Zhu
Baoxiong Jia
Siyuan Huang
Submitted
October 19, 2025
arXiv Category
cs.CV

Key Contributions

Presents SCENECOT, a framework for grounded Chain-of-Thought (CoT) reasoning in 3D scenes that decomposes complex tasks into simpler steps and uses multimodal expert modules to generate visual clues for each step. Introduces SCENECOT-185K, the first large-scale dataset for grounded CoT reasoning in 3D. Across several complex 3D scene reasoning benchmarks, the framework achieves strong performance with high coherence between grounding and answers.
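The page describes the method only at a high level. The minimal sketch below illustrates the general idea of a grounded chain of thought on a toy scene, under stated assumptions: each reasoning step records the object IDs it is grounded in, and the answer is read off the object grounded in the final step. Everything here is hypothetical and is not the SCENECOT implementation: the `SceneObject`/`ReasoningStep`/`GroundedTrace` types, the exact-label-matching `ground_by_label` function (a stand-in for a learned multimodal expert module), and the `is_coherent` check (a toy proxy for grounding-QA coherence, not the paper's metric).

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SceneObject:
    # Hypothetical toy representation of one object in a 3D scene.
    obj_id: int
    label: str
    color: str
    position: tuple[float, float, float]


@dataclass
class ReasoningStep:
    # One sub-problem plus the object IDs it is grounded in (its "visual clue").
    description: str
    grounded_ids: list[int]


@dataclass
class GroundedTrace:
    steps: list[ReasoningStep] = field(default_factory=list)
    answer: Optional[str] = None


def ground_by_label(scene: list[SceneObject], label: str) -> list[SceneObject]:
    """Hypothetical stand-in for a grounding expert: exact label match."""
    return [o for o in scene if o.label == label]


def answer_color_of_nearest(scene: list[SceneObject], target_label: str,
                            anchor_label: str) -> GroundedTrace:
    """Answer 'What color is the <target> closest to the <anchor>?' step by step."""
    trace = GroundedTrace()
    # Step 1: ground the anchor object.
    anchors = ground_by_label(scene, anchor_label)
    trace.steps.append(ReasoningStep(f"locate the {anchor_label}",
                                     [o.obj_id for o in anchors]))
    # Step 2: ground all candidate target objects.
    candidates = ground_by_label(scene, target_label)
    trace.steps.append(ReasoningStep(f"find candidate {target_label}s",
                                     [o.obj_id for o in candidates]))
    if len(anchors) != 1 or not candidates:
        trace.answer = "unanswerable"
        return trace
    # Step 3: spatial reasoning, pick the candidate nearest to the anchor.
    anchor = anchors[0]
    target = min(candidates, key=lambda o: math.dist(o.position, anchor.position))
    trace.steps.append(ReasoningStep(
        f"pick the {target_label} closest to the {anchor_label}", [target.obj_id]))
    # Step 4: read the answer off the grounded object.
    trace.answer = target.color
    return trace


def is_coherent(trace: GroundedTrace, scene: list[SceneObject]) -> bool:
    """Toy check: the answer must match the single object grounded in the last step."""
    if not trace.steps or len(trace.steps[-1].grounded_ids) != 1:
        return False
    by_id = {o.obj_id: o for o in scene}
    return by_id[trace.steps[-1].grounded_ids[0]].color == trace.answer


if __name__ == "__main__":
    scene = [
        SceneObject(1, "table", "brown", (0.0, 0.0, 0.0)),
        SceneObject(2, "chair", "red", (1.0, 0.0, 0.0)),
        SceneObject(3, "chair", "blue", (3.0, 0.0, 0.0)),
    ]
    trace = answer_color_of_nearest(scene, "chair", "table")
    for step in trace.steps:
        print(step.description, step.grounded_ids)
    print("answer:", trace.answer, "coherent:", is_coherent(trace, scene))
```

Keeping the grounded IDs on every step makes the intermediate reasoning inspectable: a wrong answer can be traced back to the step whose grounding failed, which is the kind of step-by-step, grounded behavior the abstract describes.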

Business Value

Could enable AI systems to understand and reason about complex 3D environments more reliably, which is relevant to advanced robotics, AR/VR applications, and intelligent spatial assistants.