arxiv_cv | Research Paper | 3D Artists, Game Developers, VR/AR Developers, AI Researchers in Graphics, Computer Vision Engineers

DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision

📄 Abstract

While text-to-3D generation has attracted growing interest, existing methods often struggle to produce 3D assets that align well with human preferences. Current preference alignment techniques for 3D content typically rely on hard-to-collect, preference-paired multi-view 2D images to train 2D reward models, which then guide 3D generation, leading to geometric artifacts due to their inherent 2D bias. To address these limitations, we construct 3D-MeshPref, the first large-scale unpaired 3D preference dataset, featuring diverse 3D meshes annotated by a large language model and refined by human evaluators. We then develop RewardCS, the first reward model trained directly on unpaired 3D-MeshPref data using a novel Cauchy-Schwarz divergence objective, enabling effective learning of human-aligned 3D geometric preferences without requiring paired comparisons. Building on this, we propose DreamCS, a unified framework that integrates RewardCS into text-to-3D pipelines, enhancing both implicit and explicit 3D generation with human preference feedback. Extensive experiments show DreamCS outperforms prior methods, producing 3D assets that are both geometrically faithful and human-preferred. Code and models will be released publicly.

Key Contributions

Introduces DreamCS, a text-to-3D generation framework that uses RewardCS, a novel reward model trained on unpaired 3D data (3D-MeshPref dataset) using a Cauchy-Schwarz divergence objective. This enables learning human-aligned 3D geometric preferences without paired comparisons, leading to improved 3D asset quality and reduced geometric artifacts.
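To make the Cauchy-Schwarz divergence concrete, here is a minimal sketch of its standard kernel-based sample estimate. For two densities p and q it is defined as D_CS(p, q) = -log( (∫ p q)^2 / (∫ p^2 ∫ q^2) ). It compares two sample sets as wholes, so no item in one set needs a partner in the other, which is why it suits unpaired data. This is not the RewardCS training objective, whose exact form is not given in this summary. The Gaussian kernel, the bandwidth `sigma`, and the "preferred"/"dispreferred" embedding arrays are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian kernel matrix between rows of a and rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    sq = np.maximum(sq, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq / (2.0 * sigma**2))

def cs_divergence(x, y, sigma=1.0):
    """Empirical Cauchy-Schwarz divergence between sample sets x and y.

    The kernel means estimate the integrals of p*p, q*q and p*q.
    The two sets may have different sizes and need no pairing.
    """
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return float(np.log(kxx) + np.log(kyy) - 2.0 * np.log(kxy))

# Hypothetical stand-ins for embeddings of preferred / dispreferred meshes.
rng = np.random.default_rng(0)
preferred = rng.normal(loc=0.0, size=(50, 4))
dispreferred = rng.normal(loc=3.0, size=(60, 4))

print("same set:      ", cs_divergence(preferred, preferred))
print("different sets:", cs_divergence(preferred, dispreferred))
```

The estimate is zero when both sets are identical and grows as the two sample clouds separate.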

Business Value

Accelerates 3D content creation for industries like gaming, VR/AR, and product design by enabling users to generate high-quality, preference-aligned 3D assets from text descriptions, reducing manual effort and cost.

Paper Metadata

Innovation Type

Novel Framework, Reward Model & Dataset

Deployment Feasibility

Moderate. Requires integration into existing 3D generation pipelines. The training of the reward model and generation process can be computationally intensive.
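As a rough picture of what integrating a reward model into a generation pipeline can involve, the toy sketch below adds a reward gradient to a base optimization step. This summary does not describe how DreamCS performs that integration, so everything here is an assumption: `guided_step`, `reward_weight`, and the two quadratic stand-in functions are hypothetical, and a real pipeline would replace them with the generator's loss gradient and the reward model's gradient.

```python
import numpy as np

def guided_step(params, base_grad_fn, reward_grad_fn, lr=0.1, reward_weight=0.5):
    """One update: descend the base loss while ascending the reward."""
    grad = base_grad_fn(params) - reward_weight * reward_grad_fn(params)
    return params - lr * grad

# Toy stand-ins (hypothetical):
#   base loss = ||p - a||^2   -> gradient  2 (p - a)
#   reward    = -||p - b||^2  -> gradient -2 (p - b)
a = np.array([0.0, 0.0])
b = np.array([2.0, 0.0])
base_grad = lambda p: 2.0 * (p - a)
reward_grad = lambda p: -2.0 * (p - b)

p = np.array([5.0, 5.0])
for _ in range(200):
    p = guided_step(p, base_grad, reward_grad)

# The iterate settles between a and b, weighted by reward_weight.
print(p)
```

In this toy setting `reward_weight` controls how far the result is pulled from the base optimum toward the reward optimum. Each step needs one extra gradient evaluation, which is one reason the extra cost depends on reward model inference.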

Limitations Addressed

Existing text-to-3D methods often produce 3D assets with geometric artifacts due to reliance on 2D preference models or difficulty in collecting paired 3D preference data.

Performance Gains

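No quantitative results are given in this summary. The abstract reports that DreamCS outperforms prior methods, producing 3D assets that are more geometrically faithful and better aligned with human preferences.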

Technical Tags

text-to-3D, 3D generation, geometry-aware, reward supervision, unpaired 3D data, preference alignment, 3D meshes, Cauchy-Schwarz divergence, DreamCS, RewardCS

Research Topics

3D Content Generation, Generative Models, Computer Graphics, Human-AI Interaction, Machine Learning for Design

Methods & Architectures

DreamCS framework, RewardCS reward model, 3D-MeshPref dataset, Cauchy-Schwarz divergence objective, Text-to-3D pipelines, Unpaired 3D reward supervision
Diffusion Models, Large Language Models (LLMs)

Applications & Tasks

3D Modeling, Game Development, Virtual Reality (VR), Augmented Reality (AR), Computer Graphics
Generating 3D assets that align with human preferences, Geometric artifacts in 3D generation, Difficulty in collecting paired 3D preference data
Text-to-3D generation, Learning human-aligned 3D geometric preferences, Improving 3D asset quality

Datasets & Benchmarks

Datasets

3D-MeshPref dataset

Alignment with human preferences, Geometric quality of 3D meshes, Reduction in artifacts

Related Fields

3D Computer Vision, Generative Adversarial Networks (GANs), Deep Learning, Computer Graphics, Human-Computer Interaction

Keywords

text-to-3D, 3D generation, generative AI, diffusion models, reward model, preference learning, 3D meshes, geometry-aware, unpaired data, Cauchy-Schwarz, DreamCS, RewardCS

Academic Context

#3D Content Generation, #Generative Models, #Computer Graphics, #Human-AI Interaction, #Machine Learning for Design

Commercial Potential

Potential Products

Text-to-3D asset generation tools, AI-powered 3D modeling software plugins, Platforms for generating VR/AR environments

Target Industries

Gaming, Virtual Reality, Augmented Reality, 3D Animation, Product Design, Metaverse

Use Case Examples

Generating 3D character models from descriptions, Creating 3D environments for virtual worlds, Designing 3D product prototypes based on text specifications

Competitive Edge

Addresses the critical challenge of aligning 3D generation with human preferences by developing a novel reward model trained on unpaired 3D data, overcoming limitations of 2D-based or paired-data approaches.

Resource Requirements

Compute Needs

Likely high, especially for training the reward model and for the text-to-3D generation process itself, which often involves diffusion models.

Data Requirements

Introduces the 3D-MeshPref dataset, specifically designed for training reward models on unpaired 3D data.

Deployment Constraints

The complexity of 3D generation and the computational cost can be barriers to widespread real-time deployment. Ensuring geometric consistency and detail can be challenging.

Scalability

Scalability depends on the efficiency of the diffusion process and the reward model inference. Handling diverse and complex 3D assets is a key challenge.
