SaLon3R: Structure-aware Long-term Generalizable 3D Reconstruction from Unposed Images

Abstract

Recent advances in 3D Gaussian Splatting (3DGS) have enabled generalizable, on-the-fly reconstruction of sequential input views. However, existing methods often predict per-pixel Gaussians and combine Gaussians from all views as the scene representation, leading to substantial redundancies and geometric inconsistencies in long-duration video sequences. To address this, we propose SaLon3R, a novel framework for Structure-aware, Long-term 3DGS Reconstruction. To the best of our knowledge, SaLon3R is the first online generalizable GS method capable of reconstructing over 50 views at over 10 FPS, with 50% to 90% redundancy removal. Our method introduces compact anchor primitives to eliminate redundancy through differentiable saliency-aware Gaussian quantization, coupled with a 3D Point Transformer that refines anchor attributes and saliency to resolve cross-frame geometric and photometric inconsistencies. Specifically, we first leverage a 3D reconstruction backbone to predict dense per-pixel Gaussians and a saliency map encoding regional geometric complexity. Redundant Gaussians are compressed into compact anchors by prioritizing high-complexity regions. The 3D Point Transformer then learns spatial structural priors in 3D space from training data to refine anchor attributes and saliency, enabling regionally adaptive Gaussian decoding for geometric fidelity. Without known camera parameters or test-time optimization, our approach effectively resolves artifacts and prunes the redundant 3DGS in a single feed-forward pass. Experiments on multiple datasets demonstrate state-of-the-art performance on both novel view synthesis and depth estimation, with superior efficiency, robustness, and generalization ability for long-term generalizable 3D reconstruction. Project Page: https://wrld.github.io/SaLon3R/.

Authors (8)

Jiaxin Guo

Tongfan Guan

Wenzhen Dong

Wenzhao Zheng

Wenting Wang

Yue Wang

+2 more

Submitted

October 16, 2025

arXiv Category

cs.CV

Key Contributions

SaLon3R introduces a novel framework for Structure-aware, Long-term 3D Gaussian Splatting (3DGS) reconstruction that significantly reduces redundancy and improves geometric and photometric consistency over long video sequences. It achieves this through saliency-aware Gaussian quantization and a 3D Point Transformer, enabling online reconstruction of over 50 views at over 10 FPS.
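
To make the idea of saliency-aware quantization more concrete, here is a minimal sketch in plain Python. It is not the paper's method: the paper describes a differentiable quantization followed by a learned 3D Point Transformer, while this sketch uses a hard voxel grid with two hypothetical cell sizes and a hypothetical saliency threshold. The function names, parameters, and sample points are all assumptions made for illustration.

```python
from collections import defaultdict
from math import floor


def quantize_gaussians(points, coarse_voxel=0.2, fine_voxel=0.05,
                       saliency_threshold=0.5):
    """Merge per-pixel Gaussian centers into a smaller set of anchors.

    points: list of (x, y, z, saliency) tuples, saliency in [0, 1].
    All parameter names and default values are illustrative assumptions.
    """
    cells = defaultdict(list)
    for x, y, z, s in points:
        # High-saliency (geometrically complex) points use a finer grid,
        # so fewer of them collapse into the same anchor.
        size = fine_voxel if s >= saliency_threshold else coarse_voxel
        key = (size, floor(x / size), floor(y / size), floor(z / size))
        cells[key].append((x, y, z, s))

    anchors = []
    for members in cells.values():
        # Saliency-weighted mean position; epsilon avoids division by zero.
        weights = [m[3] + 1e-6 for m in members]
        total = sum(weights)
        ax = sum(w * m[0] for w, m in zip(weights, members)) / total
        ay = sum(w * m[1] for w, m in zip(weights, members)) / total
        az = sum(w * m[2] for w, m in zip(weights, members)) / total
        anchors.append((ax, ay, az, max(m[3] for m in members)))
    return anchors


def redundancy_removed(num_input, num_anchors):
    """Fraction of input primitives eliminated by quantization."""
    return 1.0 - num_anchors / num_input


# Hypothetical sample: three flat-region points, two detailed-region points,
# and one isolated flat-region point.
points = [
    (0.01, 0.01, 0.01, 0.1),
    (0.02, 0.03, 0.01, 0.1),
    (0.10, 0.12, 0.15, 0.2),
    (0.01, 0.01, 0.01, 0.9),
    (0.08, 0.01, 0.01, 0.9),
    (0.50, 0.50, 0.50, 0.1),
]
anchors = quantize_gaussians(points)
print(len(points), "->", len(anchors), "primitives")
print("redundancy removed:", redundancy_removed(len(points), len(anchors)))
```

In this sketch the low-saliency points that fall into the same coarse cell collapse into one anchor, while the two high-saliency points stay separate because they land in different fine cells. A learned, differentiable variant would replace the hard cell assignment and the fixed threshold.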

Business Value

Enables more efficient and consistent creation of detailed 3D environments from video, which is crucial for immersive applications like VR/AR, virtual production, and autonomous driving simulation.

Paper Metadata

Innovation Type

Algorithmic Improvement

Deployment Feasibility

High, as it focuses on improving existing efficient methods (3DGS) for real-time applications.

Limitations Addressed

Substantial redundancies and geometric/photometric inconsistencies in existing 3DGS methods when applied to long-duration video sequences, and the inability of previous methods to perform online generalizable reconstruction at high frame rates.

Performance Gains

50% to 90% redundancy removal compared to prior methods.
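
As a rough illustration of what this range means in primitive counts, the sketch below compares a per-pixel baseline with the pruned representation. The 256x256 input resolution is an assumption chosen for the arithmetic, not a figure from the paper, and the function name is hypothetical; only the 50-view count and the 50% to 90% range come from the summary above.

```python
def gaussians_after_pruning(num_views, height, width, removal_percent):
    """Return (per-pixel baseline count, count kept after pruning).

    Assumes one Gaussian per pixel per view in the baseline, which is how
    the abstract characterizes existing per-pixel methods.
    """
    baseline = num_views * height * width
    kept = baseline * (100 - removal_percent) // 100
    return baseline, kept


# 50 views at an assumed 256x256 resolution.
baseline, kept_at_50 = gaussians_after_pruning(50, 256, 256, 50)
_, kept_at_90 = gaussians_after_pruning(50, 256, 256, 90)
print("baseline:", baseline)
print("kept at 50% removal:", kept_at_50)
print("kept at 90% removal:", kept_at_90)
```

Under these assumptions the gap between the two ends of the range is a factor of five in the number of primitives that must be stored and rendered.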

Technical Tags

3D Gaussian Splatting, generalizable reconstruction, long-term video, redundancy removal, saliency-aware quantization, 3D Point Transformer, online reconstruction, geometric consistency, photometric consistency

Research Topics

3D Computer Vision, NeRF and Implicit Representations, Video Processing, Generative Models, Scene Reconstruction

Methods & Architectures

3D Gaussian Splatting, Transformer Networks, Differentiable Quantization, Online Learning, 3D Point Transformer

Applications & Tasks

Virtual Reality, Augmented Reality, Robotics, Autonomous Driving, Film and Animation, 3D Reconstruction, Scene Synthesis, Video Understanding, Data Compression, 3D scene reconstruction from video, Real-time rendering, Long-term video processing

Datasets & Benchmarks

Benchmarks

50% to 90% redundancy removal • over 50 views • over 10 FPS

Reconstruction quality, Temporal consistency, Redundancy reduction, FPS

Related Fields

Computer Vision, 3D Graphics, Machine Learning, Robotics, Virtual Reality

Keywords

3D reconstruction, Gaussian Splatting, video generation, long-term consistency, redundancy reduction, 3D Point Transformer, online reconstruction, geometric consistency, photometric consistency, real-time rendering

Academic Context

#3D Computer Vision, #NeRF and Implicit Representations, #Video Processing, #Generative Models, #Scene Reconstruction

Commercial Potential

Potential Products

Real-time 3D environment generators, AR/VR content creation tools, Simulation platforms for autonomous systems

Target Industries

Gaming, Film and Entertainment, Automotive, Robotics, Architecture

Use Case Examples

Generating interactive 3D environments from drone footage; creating realistic virtual sets for film production; building high-fidelity simulations for autonomous vehicle testing

Competitive Edge

Improves upon existing 3D Gaussian Splatting methods by addressing long-term consistency and redundancy for video sequences, enabling higher FPS and more efficient reconstruction.

Market Opportunity

Growing markets for AR/VR, simulation, and virtual production.

Revenue Models

Licensing of technology; SaaS for content creation platforms.

Resource Requirements

Compute Needs

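Likely requires significant GPU resources for training and for real-time inference, although the method is designed for efficiency: it runs in a single feed-forward pass without test-time optimization.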

Data Requirements

Sequential unposed images (video)

Deployment Constraints

Requires sufficient computational power for real-time processing.

Scalability

Designed for long-term video sequences and high FPS, indicating good scalability in the temporal dimension.

Production Readiness

Maturity Level

Research Prototype

Time to Market

1-3 years
