Research Paper. Audience: Computer graphics researchers, Computer vision engineers, Robotics developers, VR/AR content creators

JOGS: Joint Optimization of Pose Estimation and 3D Gaussian Splatting


📄 Abstract

Traditional novel view synthesis methods heavily rely on external camera pose estimation tools such as COLMAP, which often introduce computational bottlenecks and propagate errors. To address these challenges, we propose a unified framework that jointly optimizes 3D Gaussian points and camera poses without requiring pre-calibrated inputs. Our approach iteratively refines 3D Gaussian parameters and updates camera poses through a novel co-optimization strategy, ensuring simultaneous improvements in scene reconstruction fidelity and pose accuracy. The key innovation lies in decoupling the joint optimization into two interleaved phases: first, updating 3D Gaussian parameters via differentiable rendering with fixed poses, and second, refining camera poses using a customized 3D optical flow algorithm that incorporates geometric and photometric constraints. This formulation progressively reduces projection errors, particularly in challenging scenarios with large viewpoint variations and sparse feature distributions, where traditional methods struggle. Extensive evaluations on multiple datasets demonstrate that our approach significantly outperforms existing COLMAP-free techniques in reconstruction quality, and also surpasses the standard COLMAP-based baseline in general.

Authors (3)

Yuxuan Li

Tao Wang

Xianben Yang

Submitted

October 30, 2025

arXiv Category

cs.CV


Key Contributions

This paper proposes JOGS, a unified framework that jointly optimizes 3D Gaussian parameters and camera poses for novel view synthesis, eliminating reliance on external pose estimation tools like COLMAP. It uses a novel co-optimization strategy with interleaved phases for Gaussian updates and pose refinement via 3D optical flow.
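The interleaved schedule described above can be illustrated with a deliberately tiny stand-in. This is a sketch of the alternation pattern only, not the paper's method: there is no differentiable renderer and no 3D optical flow here. The toy model (2D points seen by translation-only "cameras"), the function name `alternating_fit`, and all parameters are assumptions made for illustration.

```python
import numpy as np

def alternating_fit(obs, n_outer=200, step=1.0):
    """Toy two-phase alternation (hypothetical stand-in, not JOGS itself).

    obs[i, j] is the 2D observation of point j in camera i under the toy
    model obs[i, j] = points[j] - trans[i]. Camera 0 is pinned at the
    origin to remove the global-shift ambiguity.
    """
    n_cams = obs.shape[0]
    points = obs[0].copy()           # initial scene guess from the first view
    trans = np.zeros((n_cams, 2))    # all poses start at the origin
    history = []

    for _ in range(n_outer):
        # Phase 1: poses fixed, update the scene parameters.
        resid = points[None, :, :] - trans[:, None, :] - obs
        points = points - step * resid.mean(axis=0)

        # Phase 2: scene fixed, update the poses (camera 0 stays pinned).
        resid = points[None, :, :] - trans[:, None, :] - obs
        trans[1:] = trans[1:] + step * resid[1:].mean(axis=1)

        # Track the mean squared residual after both phases.
        resid = points[None, :, :] - trans[:, None, :] - obs
        history.append(float((resid ** 2).mean()))

    return points, trans, history
```

With `step=1.0` each phase is an exact least-squares solve for its own block of variables, so the residual cannot increase from one outer iteration to the next. In a real pipeline the two phases would be replaced by a rendering-loss update and a pose-refinement step respectively.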

Business Value

Enables faster and more accurate creation of 3D assets and virtual environments, crucial for industries like gaming, VR/AR content creation, and digital twins.

Paper Metadata

Innovation Type

Unified Optimization Framework

Deployment Feasibility

Feasible, particularly for applications where real-time or near-real-time 3D scene reconstruction and rendering are required. Requires significant GPU resources.

Limitations Addressed

Computational bottlenecks and error propagation from external pose estimation tools (e.g., COLMAP); inaccurate camera poses limiting scene reconstruction fidelity; challenges in handling large viewpoint variations

Performance Gains

Simultaneous improvements in scene reconstruction fidelity and pose accuracy, progressively reducing projection errors, especially in challenging scenarios.

Technical Tags

novel view synthesis, 3D Gaussian Splatting, pose estimation, joint optimization, differentiable rendering, 3D optical flow, scene reconstruction, photometric constraints, geometric constraints, COLMAP

Research Topics

Computer Graphics, Computer Vision, 3D Reconstruction, Novel View Synthesis, Machine Learning, Robotics

Methods & Architectures

Joint optimization of 3D Gaussians and camera poses, Differentiable rendering, Customized 3D optical flow, Iterative refinement, 3D Gaussian Splatting

Applications & Tasks

Virtual Reality (VR), Augmented Reality (AR), 3D Modeling, Robotics, Visual Effects (VFX)

Camera Pose Estimation Errors, Computational Bottlenecks, Scene Reconstruction Accuracy, Novel View Synthesis Quality

Jointly optimizing 3D scene representation and camera poses, Generating high-fidelity novel views of a scene

Related Fields

Computer Graphics, Computer Vision, 3D Reconstruction, Robotics, Virtual Reality

Keywords

3D Gaussian Splatting, novel view synthesis, pose estimation, joint optimization, differentiable rendering, scene reconstruction, 3D, computer graphics, computer vision, COLMAP alternative

Academic Context

#Computer Graphics, #Computer Vision, #3D Reconstruction, #Novel View Synthesis, #Machine Learning, #Robotics

Commercial Potential

Potential Products

3D scanning and reconstruction software, Tools for creating virtual environments, Real-time rendering engines, Plugins for 3D modeling software

Target Industries

Gaming, Virtual Reality / Augmented Reality, Film and VFX, Architecture, Robotics

Use Case Examples

Creating realistic 3D models of real-world scenes from images or videos; generating immersive virtual environments for training or entertainment; enabling robots to build accurate 3D maps of their surroundings

Competitive Edge

Offers a significant improvement over traditional methods by integrating pose estimation directly into the 3D representation optimization, eliminating external dependencies and improving accuracy.

Market Opportunity

Large and growing market for 3D content creation, VR/AR, and simulation.

Revenue Models

Software licensing, API access, cloud rendering services.

Resource Requirements

Compute Needs

High computational requirements, particularly GPU memory and processing power, for training and rendering.
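As a rough illustration of the GPU memory side of this, the parameter storage for a Gaussian scene can be estimated with simple arithmetic. The sketch below assumes the per-Gaussian layout of the reference 3D Gaussian Splatting implementation (59 floats with degree-3 spherical harmonics) and an Adam-style optimizer; JOGS's actual configuration is not stated here, and the function name is hypothetical.

```python
# Assumed per-Gaussian layout (reference 3DGS, not confirmed for JOGS):
# position (3) + scale (3) + rotation quaternion (4) + opacity (1)
# + spherical-harmonic colour, degree 3 (16 coefficients x 3 channels = 48)
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48

def gaussian_memory_bytes(n_gaussians, bytes_per_float=4, training=True):
    """Estimate bytes needed for Gaussian parameters alone.

    When training, count four copies: parameters, gradients, and the two
    moment buffers an Adam-style optimizer keeps. Rasterizer buffers,
    input images, and pose variables are NOT included.
    """
    params = n_gaussians * FLOATS_PER_GAUSSIAN * bytes_per_float
    return params * 4 if training else params
```

Under these assumptions, one million Gaussians take about 236 MB for the parameters alone and about 944 MB once gradients and optimizer state are counted, which is a lower bound on actual usage.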

Data Requirements

Requires a set of images or a video sequence of a scene. According to the abstract, pre-calibrated camera poses are not required as input.

Deployment Constraints

Computational cost for rendering; need for sufficient input data (images/video)

Production Readiness

Maturity Level

Research

Time to Market

Medium-term, for integration into graphics and robotics pipelines.
