📄 Abstract
Despite remarkable progress on 3D human pose and shape
estimation (HPS), current state-of-the-art methods rely heavily on
either confined indoor mocap datasets or datasets generated by a rendering
engine using computer graphics (CG). Both categories of datasets lack
sufficiently diverse human identities and authentic in-the-wild
background scenes, which are crucial for accurately simulating real-world
distributions. In this work, we show that synthetic data created by generative
models is complementary to CG-rendered data for achieving remarkable
generalization performance on diverse real-world scenes. We propose an
effective data generation pipeline based on recent diffusion models, termed
HumanWild, which can effortlessly generate human images and corresponding 3D
mesh annotations. Specifically, we first collect a large-scale human-centric
dataset with comprehensive annotations, e.g., text captions, depth maps, and
surface normal maps. To generate a wide variety of human images with initial
labels, we train a customized, multi-condition ControlNet model. The key to
this process is using a 3D parametric model, e.g., SMPL-X, to create various
condition inputs easily. Our data generation pipeline is both flexible and
customizable, making it adaptable to multiple real-world tasks, such as human
interaction in complex scenes and humans captured by wide-angle lenses. By
relying solely on generative models, we can produce large-scale, in-the-wild
human images with high-quality annotations, significantly reducing the need for
manual image collection and annotation. The generated dataset encompasses a
wide range of viewpoints, environments, and human identities, ensuring its
versatility across different scenarios. We hope that our work will pave the
way for scaling up 3D human recovery to in-the-wild scenes.
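The sketch below illustrates the general idea of conditioning a diffusion model on geometry rendered from a 3D parametric body. It is not the authors' pipeline: HumanWild trains its own customized, multi-condition ControlNet, whereas this example substitutes publicly available depth and surface-normal ControlNets from the diffusers library, and the SMPL-X condition maps are assumed to have been rendered beforehand (the file names are hypothetical).

```python
# Minimal sketch: generate a human image conditioned on depth and normal maps
# rendered from a posed SMPL-X body. Stock pretrained ControlNets stand in for
# the paper's customized multi-condition model (an assumption, not the method).
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnets = [
    ControlNetModel.from_pretrained(
        "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
    ),
    ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_normalbae", torch_dtype=torch.float16
    ),
]

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

# Condition maps rendered from the SMPL-X mesh (hypothetical file paths).
depth_map = Image.open("smplx_depth.png").convert("RGB")
normal_map = Image.open("smplx_normal.png").convert("RGB")

# Text caption plus geometric conditions -> an in-the-wild human image whose
# pose stays consistent with the SMPL-X parameters behind the condition maps,
# so those parameters can serve as the 3D mesh annotation.
image = pipe(
    prompt="a person hiking on a rocky mountain trail, golden hour, photo",
    image=[depth_map, normal_map],
    controlnet_conditioning_scale=[1.0, 0.7],
    num_inference_steps=30,
).images[0]
image.save("humanwild_sample.png")
```

Because the condition maps are rendered from known SMPL-X parameters, each generated image comes paired with its 3D mesh label for free, which is the property the abstract relies on.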
Authors (7)
Yongtao Ge
Wenjia Wang
Yongfan Chen
Fanzhou Wang
Lei Yang
Hao Chen
+1 more
Key Contributions
Demonstrates that diffusion models can efficiently generate high-quality synthetic data (HumanWild) that complements CG-rendered data for human mesh recovery. This synthetic data improves the generalization performance of 3D HPS models on diverse real-world scenes by providing realistic human identities and backgrounds.
Business Value
Enables the development of more robust and accurate 3D human understanding systems for applications like animation, gaming, robotics, and virtual try-on, by providing high-quality, diverse training data.