📄 Abstract
Personalized image generative models are highly proficient at synthesizing images from text or a single image, yet they lack explicit control for composing objects from specific parts of multiple source images without user-specified masks or annotations. To address this, we introduce Chimera, a personalized image generation model that generates novel objects by combining specified parts from different source images according to textual instructions. To train our model, we first construct a dataset from a taxonomy built on 464 unique (part, subject) pairs, which we term semantic atoms. From this, we generate 37k prompts and synthesize the corresponding images with a high-fidelity text-to-image model. We train a custom diffusion prior model with part-conditional guidance, which steers the image-conditioning features to enforce both semantic identity and spatial layout. We also introduce PartEval, an objective metric for assessing the fidelity and compositional accuracy of generation pipelines. Human evaluations and our proposed metric show that Chimera outperforms other baselines by 14% in part alignment and compositional accuracy and by 21% in visual quality.
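The paper does not include its data-generation code on this page, but the construction it describes, combining (part, subject) semantic atoms into compositional text prompts, is straightforward to sketch. The snippet below is a minimal illustration under that assumption; the `SemanticAtom` class, `build_prompts` helper, and the prompt template are hypothetical names introduced here, not the authors' implementation.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class SemanticAtom:
    """A (part, subject) pair from the taxonomy, e.g. ('wings', 'eagle')."""
    part: str
    subject: str

def build_prompts(atoms, template="a {target} with the {part} of a {source}"):
    """Combine pairs of semantic atoms into compositional text prompts.

    Each prompt asks for one subject (the target) carrying a part
    borrowed from a different subject (the source).
    """
    prompts = []
    for donor, target in permutations(atoms, 2):
        if donor.subject == target.subject:
            continue  # skip trivial compositions within the same subject
        prompts.append(template.format(
            target=target.subject, part=donor.part, source=donor.subject))
    return prompts

# Even a handful of atoms yields many compositional prompts,
# which is how a few hundred atoms can scale to tens of thousands of prompts.
atoms = [
    SemanticAtom("wings", "eagle"),
    SemanticAtom("shell", "turtle"),
    SemanticAtom("handle", "teapot"),
]
for prompt in build_prompts(atoms)[:3]:
    print(prompt)
```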
Authors (7)
Shivam Singh
Yiming Chen
Agneet Chatterjee
Amit Raj
James Hays
Yezhou Yang
+1 more
Submitted
October 20, 2025
Key Contributions
Chimera introduces a personalized image generation model that composes novel objects by combining specified parts from different source images based on textual instructions. It utilizes a custom diffusion prior with part-conditional guidance and introduces the PartEval metric for evaluating compositional accuracy.
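The exact formulation of the part-conditional guidance is not given on this page; the sketch below only illustrates the general idea of steering a diffusion prior with an extra part-conditioning signal, using a classifier-free-guidance-style mix. The `prior` callable, `part_emb` tensor, and `guided_prior_step` function are assumptions for illustration, not the authors' API.

```python
import torch

def guided_prior_step(prior, z_t, t, text_emb, part_emb, guidance_scale=3.0):
    """One denoising step of a diffusion prior with part-conditional guidance.

    `prior(z_t, t, text_emb, part_emb)` is assumed to predict the denoised
    image embedding; `part_emb` is an extra conditioning vector summarizing
    the source-image part features.
    """
    # Branch without part conditioning: zero out the part embedding.
    pred_uncond = prior(z_t, t, text_emb, torch.zeros_like(part_emb))
    # Part-conditioned branch.
    pred_cond = prior(z_t, t, text_emb, part_emb)
    # Mix the two predictions; a larger scale pushes the output harder
    # toward the specified part identity and layout.
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)
```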
Business Value
Enables users to create highly customized and complex images by combining elements from multiple sources, with direct applications in digital art, design, and personalized content creation.