arxiv_cv · Research Paper · 3D Artists, Game Developers, AR/VR Content Creators, Computer Vision Researchers, Machine Learning Engineers

Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image

Abstract

In this work, we introduce Wonder3D++, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inference, but their results are often of low quality and lack geometric details. To holistically improve the quality, consistency, and efficiency of single-view reconstruction tasks, we propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. To ensure the consistency of generation, we employ a multi-view cross-domain attention mechanism that facilitates information exchange across views and modalities. Lastly, we introduce a cascaded 3D mesh extraction algorithm that derives high-quality surfaces from the multi-view 2D representations in a coarse-to-fine manner in only about 3 minutes. Our extensive evaluations demonstrate that our method achieves high-quality reconstruction results, robust generalization, and good efficiency compared to prior works. Code available at https://github.com/xxlong0/Wonder3D/tree/Wonder3D_Plus.
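The abstract does not spell out how the multi-view cross-domain attention is wired. As a rough illustration only, the sketch below assumes the simplest possible arrangement: tokens from every view and from both domains (normal and color) are concatenated into one sequence, and a single-head self-attention with identity projections runs over it, so any token can draw on any other view or modality. The function names, the tensor layout, and the absence of learned weights are all assumptions made for this example; it is not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (assumption).

    tokens: list of feature vectors, all of the same dimension.
    Returns one mixed vector per input token.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        mixed.append(
            [sum(w * tok[c] for w, tok in zip(weights, tokens)) for c in range(dim)]
        )
    return mixed


def cross_domain_multiview_attention(normal_feats, color_feats):
    """Hypothetical helper: attend jointly over all views and both domains.

    normal_feats, color_feats: nested lists shaped [views][tokens][dim].
    Returns (normal_out, color_out) with the same shapes as the inputs.
    """
    n_views = len(normal_feats)
    n_tokens = len(normal_feats[0])
    # Flatten domain -> view -> token into one sequence so attention spans everything.
    flat = [tok for domain in (normal_feats, color_feats) for view in domain for tok in view]
    mixed = joint_attention(flat)

    def unflatten(seq):
        return [seq[v * n_tokens:(v + 1) * n_tokens] for v in range(n_views)]

    per_domain = n_views * n_tokens
    return unflatten(mixed[:per_domain]), unflatten(mixed[per_domain:])
```

In this toy setup, changing a color token in one view changes the normal-map features of every other view, which is the kind of cross-view, cross-modality information exchange the abstract describes. A real model would add learned projections, multiple heads, and positional or camera conditioning.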

Authors (10)

Yuxiao Yang

Xiao-Xiao Long

Zhiyang Dou

Cheng Lin

Yuan Liu

Qingsong Yan

+4 more

Submitted

November 3, 2025

arXiv Category

cs.CV

Key Contributions

Wonder3D++ proposes a cross-domain diffusion model for high-fidelity textured mesh generation from single images, addressing the slow per-shape optimization and inconsistent geometry of earlier approaches. It generates multi-view normal maps together with the corresponding color images, uses multi-view cross-domain attention to keep views and modalities consistent, and applies a cascaded, coarse-to-fine mesh extraction algorithm that produces detailed surfaces in about 3 minutes.
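For readers unfamiliar with normal maps: a normal map stores a surface direction per pixel as a color. The sketch below shows one widely used convention, mapping each component of a unit normal from [-1, 1] to an 8-bit channel in [0, 255]. Whether Wonder3D++ uses exactly this encoding is not stated in the summary, so treat the convention and the helper names `encode_normal` and `decode_normal` as assumptions for illustration.

```python
import math


def encode_normal(normal):
    """Map a 3D surface normal to an 8-bit RGB triple (assumed convention)."""
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0:
        raise ValueError("zero-length normal")
    # Normalize, then shift [-1, 1] to [0, 1] and scale to [0, 255].
    return tuple(round((c / length * 0.5 + 0.5) * 255) for c in normal)


def decode_normal(rgb):
    """Recover an approximate unit normal from an 8-bit RGB triple."""
    vec = [c / 255 * 2 - 1 for c in rgb]
    length = math.sqrt(sum(c * c for c in vec))
    return tuple(c / length for c in vec)
```

Because each channel is quantized to 8 bits, decoding recovers the direction only approximately, which is one reason geometry fused from several views' normal maps benefits from consistency across those views.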

Business Value

Enables rapid and high-quality creation of 3D assets from readily available single images, significantly reducing the cost and time for 3D modeling in industries like gaming, AR/VR, and e-commerce.

Paper Metadata

Innovation Type

Algorithmic Improvement

Deployment Feasibility

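Moderate to High. Training diffusion models is resource-intensive, but inference is comparatively fast: the abstract reports about 3 minutes for mesh extraction. The output is a textured 3D mesh, a standard representation that existing 3D tools can consume.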

Limitations Addressed

Time-consuming per-shape optimization in SDS methods, Inconsistent geometry in single-view reconstruction, Low quality and lack of geometric details in fast inference methods

Technical Tags

3D Generation, Single-Image Reconstruction, Diffusion Models, Score Distillation Sampling (SDS), Textured Meshes, Multi-view Normal Maps, Cross-domain Attention, Cascaded Mesh Extraction, High-fidelity, Geometric Consistency

Research Topics

3D Reconstruction, Generative Models, Diffusion Models, Computer Vision, Computational Geometry

Methods & Architectures

Diffusion Models, Multi-view Cross-domain Attention, Cascaded 3D Mesh Extraction, Transformer-style attention (implied by the attention mechanism), Score Distillation Sampling (SDS; the prior approach the paper contrasts itself with)

Applications & Tasks

Application areas: 3D Content Creation, Virtual Reality, Augmented Reality, Gaming, 3D Modeling
Tasks: Single-view 3D reconstruction, Generating textured 3D meshes from single images, Improving quality and consistency of 3D generation
Problems addressed: Time-consuming optimization, Inconsistent geometry, Low-quality 3D generation

Related Fields

Computer Vision, Generative AI, 3D Graphics, Machine Learning, Computational Geometry

Keywords

3D Generation, Diffusion Models, Single Image Reconstruction, Textured Meshes, Score Distillation Sampling, Multi-view, Cross-domain Attention, 3D Modeling, Generative AI, Computer Vision, High-fidelity, Mesh Generation

Academic Context

#3D Reconstruction, #Generative Models, #Diffusion Models, #Computer Vision, #Computational Geometry

Commercial Potential

Potential Products

Automated 3D asset generation tools, Plugins for 3D modeling software, Services for creating 3D models from photos

Target Industries

Gaming, Virtual Reality, Augmented Reality, E-commerce, Film and Animation, Architecture

Use Case Examples

Generating 3D models of products for online catalogs, Creating virtual environments from single images, Rapid prototyping of 3D game assets

Competitive Edge

Offers a more efficient and higher-fidelity alternative to existing single-view 3D reconstruction methods, particularly those relying on slow optimization or producing lower-quality results.

Market Opportunity

Significant growth in the metaverse, gaming, and digital content creation markets.

Revenue Models

SaaS for 3D generation, licensing of the model, API access for developers.

Resource Requirements

Compute Needs

High (for training diffusion models), Moderate (for inference)

Data Requirements

Large datasets of images paired with corresponding 3D models or multi-view representations.

Deployment Constraints

Performance depends on the quality and viewpoint of the input image; complex or highly occluded objects may be challenging.

Scalability

Diffusion models can be computationally intensive, but architectural improvements and efficient sampling techniques can improve scalability.

Production Readiness

Maturity Level

Research

Time to Market

1-2 years

Patent Potential

Moderate (novel diffusion process and attention mechanisms)
