FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

📄 Abstract

We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images (i.e., as few as 2-8 inputs), a challenging yet practical setting in real-world applications. Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes. Concretely, FLARE starts with camera pose estimation, whose results condition the subsequent learning of geometric structure and appearance, optimized through the objectives of geometry reconstruction and novel-view synthesis. Trained on large-scale public datasets, our method delivers state-of-the-art performance in pose estimation, geometry reconstruction, and novel-view synthesis while maintaining inference efficiency (i.e., less than 0.5 seconds). The project page and code can be found at: https://zhanghe3z.github.io/FLARE/
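The abstract describes camera pose as the bridge that maps 3D structure onto 2D image planes. As a rough illustration of that mapping only, the sketch below projects a world-space point through a standard pinhole camera. It is not taken from the FLARE code: the function name `project_point`, the single shared focal length, and the example values are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# 3x3 identity rotation, used in the usage example below (illustrative value).
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def project_point(
    point_world: Sequence[float],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    focal: float,
    principal: Tuple[float, float],
) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    Hypothetical helper, not part of FLARE. The pose (rotation, translation)
    maps world coordinates to camera coordinates; focal and principal are
    the assumed intrinsics.
    """
    # World -> camera coordinates: X_cam = R @ X_world + t
    cam = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera or on the image plane")
    # Perspective divide, then apply intrinsics.
    u = focal * cam[0] / cam[2] + principal[0]
    v = focal * cam[1] / cam[2] + principal[1]
    return (u, v)


if __name__ == "__main__":
    pixel = project_point((1.0, 2.0, 4.0), IDENTITY, (0.0, 0.0, 0.0), 100.0, (50.0, 50.0))
    print(pixel)  # (75.0, 100.0)
```

With an estimated pose in hand, predicted 3D points can be checked against the input images this way, which is one reason a pose estimate is a natural conditioning signal for later geometry learning.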

Authors (8)

Shangzhan Zhang

Jianyuan Wang

Yinghao Xu

Nan Xue

Christian Rupprecht

Xiaowei Zhou

+2 more

Submitted

February 17, 2025

arXiv Category

cs.CV

Key Contributions

FLARE is a feed-forward model that infers high-quality camera poses and 3D geometry from uncalibrated sparse views (2-8 images). It employs a cascaded learning paradigm where camera pose estimation guides subsequent geometry and appearance learning, achieving state-of-the-art performance in pose estimation, reconstruction, and novel view synthesis with efficient inference.

Business Value

Enables more accessible and efficient 3D scene understanding and reconstruction, crucial for AR/VR applications, robotics, and digital content creation.

Paper Metadata

Innovation Type

Algorithmic

Deployment Feasibility

High, due to its feed-forward nature and fast inference (<0.5 seconds).

Limitations Addressed

Difficulty of 3D reconstruction from sparse, uncalibrated views

Need for efficient inference in real-world applications

Performance Gains

State-of-the-art performance in pose estimation, geometry reconstruction, and novel view synthesis.

Technical Tags

3D reconstruction, camera pose estimation, geometry estimation, appearance estimation, sparse views, uncalibrated cameras, feed-forward network, novel view synthesis, structure from motion

Research Topics

3D Computer Vision, Geometric Deep Learning, Structure from Motion, Scene Reconstruction, Deep Learning for Geometry

Methods & Architectures

FLARE framework, Cascaded learning paradigm, Feed-forward model, Camera pose estimation, Geometry reconstruction, Novel-view synthesis

Applications & Tasks

Augmented Reality (AR), Virtual Reality (VR), Robotics, 3D Modeling, Autonomous Driving

Inferring 3D geometry and camera poses from sparse, uncalibrated views

Handling challenging real-world scenarios with limited input

Camera pose estimation, 3D geometry reconstruction, Appearance estimation, Novel view synthesis

Datasets & Benchmarks

Datasets

large-scale public datasets

Related Fields

Computer Vision, 3D Geometry, Robotics, Augmented Reality, Virtual Reality

Keywords

3D reconstruction, camera pose, geometry, sparse views, uncalibrated, feed-forward, FLARE, novel view synthesis, structure from motion, deep learning, computer vision, AR/VR

Academic Context

#3D Computer Vision, #Geometric Deep Learning, #Structure from Motion, #Scene Reconstruction, #Deep Learning for Geometry

Commercial Potential

Potential Products

3D scanning apps, AR/VR content creation tools, Robotic navigation systems, 3D modeling software

Target Industries

AR/VR, Gaming, Robotics, Architecture, Manufacturing

Use Case Examples

Creating 3D models of real-world objects from a few photos

Enabling robots to understand their 3D environment

Generating realistic virtual environments for AR/VR

Competitive Edge

Outperforms existing methods in efficiency and accuracy for sparse, uncalibrated view scenarios, offering a practical solution where traditional SfM/MVS methods struggle.

Market Opportunity

Large and growing markets for AR/VR, robotics, and 3D content creation.

Revenue Models

Licensing the technology, offering it as a cloud API, or integrating it into existing software products.

Resource Requirements

Compute Needs

Moderate; the abstract reports inference in less than 0.5 seconds.

Data Requirements

Large-scale datasets with ground truth camera poses and 3D geometry.

Deployment Constraints

Performance may degrade with extreme lighting changes or lack of texture.

Scalability

Inference time is constant regardless of scene complexity, making it scalable.

Production Readiness

Maturity Level

Research

Time to Market

1-2 years for integration into AR/VR platforms or robotics.

Licensing

Likely permissive (e.g., MIT) given code availability.

Patent Potential

Moderate, for the cascaded learning paradigm and specific network components.
