Abstract
Subject-driven image generation aims to synthesize novel scenes that
faithfully preserve subject identity from reference images while adhering to
textual guidance. However, existing methods struggle with a critical trade-off
between fidelity and efficiency. Tuning-based approaches rely on time-consuming
and resource-intensive, subject-specific optimization, while zero-shot methods
often fail to maintain adequate subject consistency. In this work, we propose
FreeGraftor, a training-free framework that addresses these limitations through
cross-image feature grafting. Specifically, FreeGraftor leverages semantic
matching and position-constrained attention fusion to transfer visual details
from reference subjects to the generated images. Additionally, our framework
introduces a novel noise initialization strategy to preserve the geometry
priors of reference subjects, facilitating robust feature matching. Extensive
qualitative and quantitative experiments demonstrate that our method enables
precise subject identity transfer while maintaining text-aligned scene
synthesis. Without requiring model fine-tuning or additional training,
FreeGraftor significantly outperforms existing zero-shot and training-free
approaches in both subject fidelity and text alignment. Furthermore, our
framework can seamlessly extend to multi-subject generation, making it
practical for real-world deployment. Our code is available at
https://github.com/Nihukat/FreeGraftor.
Authors (7)
Zebin Yao
Lei Ren
Huixing Jiang
Chen Wei
Xiaojie Wang
Ruifan Li
+1 more
Key Contributions
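FreeGraftor is a training-free framework for subject-driven image generation that addresses the trade-off between fidelity and efficiency. It grafts features across images: semantic matching and position-constrained attention fusion transfer visual details from the reference subject to the generated image, while a novel noise initialization strategy preserves the reference subject's geometry priors so that feature matching stays robust. Without fine-tuning or additional training, it outperforms existing zero-shot and training-free approaches in both subject fidelity and text alignment, and it extends to multi-subject generation.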
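To make the idea of grafting features through semantic matching more concrete, here is a minimal NumPy sketch. It is not the FreeGraftor implementation: the function name `graft_features`, the cosine-similarity matching, the hard replacement of matched features, and the `threshold` parameter are all assumptions chosen for illustration. The actual method fuses features inside the model's attention layers under position constraints, which this toy example does not attempt to reproduce.

```python
import numpy as np


def graft_features(ref_feats, gen_feats, ref_mask, threshold=0.8):
    """Toy feature grafting (hypothetical helper, not the paper's code).

    ref_feats: (N_ref, D) float array of reference-image token features.
    gen_feats: (N_gen, D) float array of generated-image token features.
    ref_mask:  (N_ref,) bool array, True where a reference token lies on the subject.
    threshold: assumed minimum cosine similarity for a match to count.
    """
    # L2-normalize so the dot product equals cosine similarity.
    ref_n = ref_feats / (np.linalg.norm(ref_feats, axis=1, keepdims=True) + 1e-8)
    gen_n = gen_feats / (np.linalg.norm(gen_feats, axis=1, keepdims=True) + 1e-8)

    # Similarity of every generated token to every reference token.
    sim = gen_n @ ref_n.T  # shape (N_gen, N_ref)

    # Only subject tokens are eligible as match targets.
    sim[:, ~ref_mask] = -np.inf

    # Best reference match per generated token, and its similarity.
    best = sim.argmax(axis=1)
    best_sim = sim[np.arange(gen_feats.shape[0]), best]
    matched = best_sim >= threshold

    # Graft: overwrite confidently matched tokens with the reference feature.
    out = gen_feats.copy()
    out[matched] = ref_feats[best[matched]]
    return out, matched
```

The hard threshold keeps background tokens of the generated image untouched, which loosely mirrors the goal of transferring subject detail while leaving the text-driven scene intact. A soft blend weighted by similarity would be an equally plausible design for a sketch like this.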
Business Value
Enables faster creation of personalized images for marketing, design, and entertainment, reducing the need for extensive manual editing or computationally expensive subject-specific training.