
MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion


📄 Abstract

Multi-view generation with camera pose control and prompt-based customization are both essential elements for achieving controllable generative models. However, existing multi-view generation models do not support customization with geometric consistency, whereas customization models lack explicit viewpoint control, making them challenging to unify. Motivated by these gaps, we introduce a novel task, multi-view customization, which aims to jointly achieve multi-view camera pose control and customization. Due to the scarcity of training data in customization, existing multi-view generation models, which inherently rely on large-scale datasets, struggle to generalize to diverse prompts. To address this, we propose MVCustom, a novel diffusion-based framework explicitly designed to achieve both multi-view consistency and customization fidelity. In the training stage, MVCustom learns the subject's identity and geometry using a feature-field representation, incorporating the text-to-video diffusion backbone enhanced with dense spatio-temporal attention, which leverages temporal coherence for multi-view consistency. In the inference stage, we introduce two novel techniques: depth-aware feature rendering explicitly enforces geometric consistency, and consistent-aware latent completion ensures accurate perspective alignment of the customized subject and surrounding backgrounds. Extensive experiments demonstrate that MVCustom is the only framework that simultaneously achieves faithful multi-view generation and customization.
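
The abstract names depth-aware feature rendering without describing how it works. As a rough intuition only, the sketch below shows generic depth-based reprojection: per-pixel features from one view are lifted to 3D with a depth map, moved by a relative camera pose, and projected into a second view with a z-buffer. This is not the authors' implementation. The function names, the pinhole camera model, and the nearest-pixel splatting are all assumptions made for illustration.

```python
import math

def reproject_pixel(u, v, depth, fx, fy, cx, cy, R, t):
    """Map a source pixel with known depth into a target view (pinhole model).

    R (3x3 nested lists) and t (length 3) are a hypothetical relative pose
    taking source-camera coordinates to target-camera coordinates.
    """
    # Unproject the pixel to a 3D point in the source camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth
    # Rigid transform into the target camera frame: p' = R p + t.
    xt = R[0][0] * x + R[0][1] * y + R[0][2] * z + t[0]
    yt = R[1][0] * x + R[1][1] * y + R[1][2] * z + t[1]
    zt = R[2][0] * x + R[2][1] * y + R[2][2] * z + t[2]
    if zt <= 0:
        return None  # point lies behind the target camera
    # Project back to pixel coordinates, keeping the new depth.
    return (fx * xt / zt + cx, fy * yt / zt + cy, zt)

def render_features(features, depths, fx, fy, cx, cy, R, t):
    """Forward-warp a per-pixel feature grid into the target view.

    Uses nearest-pixel splatting with a z-buffer so nearer points win.
    Target pixels that receive nothing stay None (holes).
    """
    h, w = len(features), len(features[0])
    out = [[None] * w for _ in range(h)]
    zbuf = [[math.inf] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            hit = reproject_pixel(u, v, depths[v][u], fx, fy, cx, cy, R, t)
            if hit is None:
                continue
            ut, vt, zt = hit
            ui, vi = round(ut), round(vt)
            if 0 <= ui < w and 0 <= vi < h and zt < zbuf[vi][ui]:
                zbuf[vi][ui] = zt
                out[vi][ui] = features[v][u]
    return out

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

In a warp like this, the target pixels left as `None` are regions the source view never saw. A pipeline built this way would need a separate step to fill them, which is plausibly the kind of gap a completion stage addresses, though the abstract does not give those details.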

Authors (5)

Minjung Shin

Hyunin Cho

Sooyeon Go

Jin-Hwa Kim

Youngjung Uh

Submitted

October 15, 2025

arXiv Category

cs.CV


Key Contributions

This paper introduces MVCustom, a diffusion-based framework for multi-view customization that jointly achieves camera pose control and prompt-based customization with geometric consistency. It learns the subject's identity and geometry using a feature-field representation during training, enabling generalization to diverse prompts despite data scarcity.

Business Value

Enables controllable, customizable generation of 3D assets and scenes, with potential uses in virtual try-on, product visualization, game development, and metaverse content creation, by letting users generate specific views of customized objects.

Paper Metadata

Innovation Type

Novel Framework and Training Methodology

Deployment Feasibility

Moderate. Requires significant computational resources and specialized data, but the framework offers a unified solution.

Limitations Addressed

The inability of existing multi-view generation models to support customization with geometric consistency, and the lack of explicit viewpoint control in customization models, hindering their unification.

Performance Gains

Joint control of view and customization, improved geometric consistency, better generalization to diverse prompts

Technical Tags

Multi-view Generation, Diffusion Models, Customization, Camera Pose Control, Geometric Consistency, Latent Rendering, Feature Fields, Controllable Generation

Research Topics

Generative Models, Computer Vision, 3D Vision, Diffusion Models, Controllable AI

Methods & Architectures

MVCustom Framework, Geometric Latent Rendering, Feature-Field Representation, Diffusion-based Generation, Multi-view Consistency Learning, Diffusion Models

Applications & Tasks

Applications: 3D Content Creation, Virtual Reality, Augmented Reality, E-commerce, Generative AI

Problems addressed: Lack of geometric consistency in customization; difficulty unifying viewpoint control and customization; poor generalization of multi-view models to diverse prompts; scarcity of training data for customization

Tasks: Generating consistent multi-view images; customizing generated content with text prompts; controlling camera poses for generated views

Related Fields

Computer Vision, 3D Graphics, Generative Models, Deep Learning, Diffusion Models

Keywords

Multi-view Generation, Diffusion Models, Customization, Camera Pose, Geometric Consistency, 3D Generation, Controllable AI, Latent Rendering, Feature Fields, MVCustom, Generative AI, Prompt-based Generation

Academic Context

Generative Models, Computer Vision, 3D Vision, Diffusion Models, Controllable AI

Technology Stack

Frameworks & Libraries

MVCustom

Commercial Potential

Potential Products

3D asset generation platforms, customizable virtual environment tools, AI-powered product configurators

Target Industries

E-commerce, Gaming, Virtual Reality, Augmented Reality, Design & Manufacturing

Use Case Examples

Generating multiple views of a customized product for online catalogs; creating personalized 3D avatars with specific poses; designing virtual scenes based on textual descriptions and camera angles

Competitive Edge

Provides a unified framework (MVCustom) that effectively combines multi-view generation with prompt-based customization and explicit camera pose control, addressing key limitations in existing controllable generative models.

Market Opportunity

Rapid growth of the 3D content market, increasing demand for personalized and controllable generative AI

Revenue Models

Licensing of MVCustom technology, SaaS for 3D content generation

Resource Requirements

Data Requirements

Multi-view image datasets, text-image pairs

Deployment Constraints

High computational cost for training and inference. Requires careful alignment of text prompts with geometric properties.

Scalability

Aims to scale controllable multi-view generation capabilities.

Production Readiness

Maturity Level

Research Prototype

Time to Market

2-3 years

Patent Potential

Moderate, for the feature-field representation and geometric latent rendering techniques.
