Chimera: Compositional Image Generation using Part-based Concepting

Abstract

Personalized image generative models are highly proficient at synthesizing images from text or a single image, yet they lack explicit control for composing objects from specific parts of multiple source images without user-specified masks or annotations. To address this, we introduce Chimera, a personalized image generation model that generates novel objects by combining specified parts from different source images according to textual instructions. To train our model, we first construct a dataset from a taxonomy built on 464 unique (part, subject) pairs, which we term semantic atoms. From this, we generate 37k prompts and synthesize the corresponding images with a high-fidelity text-to-image model. We train a custom diffusion prior model with part-conditional guidance, which steers the image-conditioning features to enforce both semantic identity and spatial layout. We also introduce an objective metric, PartEval, to assess the fidelity and compositional accuracy of generation pipelines. Human evaluations and our proposed metric show that Chimera outperforms other baselines by 14% in part alignment and compositional accuracy and by 21% in visual quality.
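
The dataset step described in the abstract (semantic atoms combined into text prompts) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `SemanticAtom` class, the prompt template, the filtering rule (distinct parts taken from distinct subjects), and the sample atoms are hypothetical and are not taken from the paper, which does not specify them in the abstract.

```python
from dataclasses import dataclass
from itertools import combinations
import random


@dataclass(frozen=True)
class SemanticAtom:
    """A (part, subject) pair, e.g. ("wings", "eagle"). Hypothetical type."""
    part: str
    subject: str


def _article(noun: str) -> str:
    # Naive English article choice; good enough for a sketch.
    return "an" if noun[0].lower() in "aeiou" else "a"


def compose_prompt(atoms) -> str:
    # Hypothetical template; the paper's actual prompt wording is not given.
    phrases = [f"the {a.part} of {_article(a.subject)} {a.subject}" for a in atoms]
    return "a creature with " + " and ".join(phrases)


def build_prompts(atoms, parts_per_prompt=2, limit=None, seed=0):
    """Enumerate atom combinations and turn each valid one into a prompt."""
    prompts = []
    for combo in combinations(atoms, parts_per_prompt):
        parts = {a.part for a in combo}
        subjects = {a.subject for a in combo}
        # Assumed rule: each part appears once and each comes from a different subject.
        if len(parts) == parts_per_prompt and len(subjects) == parts_per_prompt:
            prompts.append(compose_prompt(combo))
    # Optionally subsample to a fixed budget with a reproducible seed.
    if limit is not None and len(prompts) > limit:
        prompts = random.Random(seed).sample(prompts, limit)
    return prompts


# Illustrative atoms only; the real taxonomy has 464 pairs.
atoms = [
    SemanticAtom("head", "lion"),
    SemanticAtom("wings", "eagle"),
    SemanticAtom("tail", "fish"),
    SemanticAtom("head", "eagle"),
]
prompts = build_prompts(atoms)
for p in prompts:
    print(p)
```

In this sketch, pairs that reuse a part or a subject are dropped, so four atoms yield four prompts rather than six. A real pipeline would then pass each prompt to a text-to-image model to synthesize the training image, a step that is omitted here.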
Authors (7)
Shivam Singh
Yiming Chen
Agneet Chatterjee
Amit Raj
James Hays
Yezhou Yang
+1 more
Submitted
October 20, 2025
arXiv Category
cs.CV

Key Contributions

Chimera introduces a personalized image generation model that composes novel objects by combining specified parts from different source images based on textual instructions. It utilizes a custom diffusion prior with part-conditional guidance and introduces the PartEval metric for evaluating compositional accuracy.

Business Value

Lets users create customized images by combining parts from several source images through text instructions, without drawing masks or annotations. Potential applications include digital art, design, and personalized content creation.