Abstract
We present a novel, zero-shot pipeline for creating hyperrealistic,
identity-preserving 3D avatars from a few unstructured phone images. Existing
methods face several challenges: single-view approaches suffer from geometric
inconsistencies and hallucinations, degrading identity preservation, while
models trained on synthetic data fail to capture high-frequency details like
skin wrinkles and fine hair, limiting realism. Our method introduces two key
contributions: (1) a generative canonicalization module that processes multiple
unstructured views into a standardized, consistent representation, and (2) a
transformer-based model trained on a new, large-scale dataset of high-fidelity
Gaussian splatting avatars derived from dome captures of real people. This
"Capture, Canonicalize, Splat" pipeline produces static quarter-body avatars
with compelling realism and robust identity preservation from unstructured
photos.
Authors (17)
Emanuel Garbin
Guy Adam
Oded Krams
Zohar Barzelay
Eran Guendelman
Michael Schwarz
+11 more
Submitted
October 15, 2025
Key Contributions
Introduces a novel, zero-shot pipeline for creating hyperrealistic, identity-preserving 3D avatars from unstructured phone images. It features a generative canonicalization module for consistent view processing and a transformer model trained on a new large-scale dataset of Gaussian splatting avatars, overcoming the limitations of single-view methods and of training on synthetic data.
Business Value
Enables the creation of highly realistic and personalized 3D avatars from readily available phone images, significantly lowering the barrier to entry for applications in the metaverse, gaming, and virtual communication.