Research Paper. Audience: Researchers in generative AI, Developers of text-to-image models, Artists using AI tools

Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models

Abstract

Text-to-image diffusion models rely on text embeddings from a pre-trained text encoder, but these embeddings remain fixed across all diffusion timesteps, limiting their adaptability to the generative process. We propose Diffusion Adaptive Text Embedding (DATE), which dynamically updates text embeddings at each diffusion timestep based on intermediate perturbed data. We formulate an optimization problem and derive an update rule that refines the text embeddings at each sampling step to improve alignment and preference between the mean predicted image and the text. This allows DATE to dynamically adapt the text conditions to the reverse-diffused images throughout diffusion sampling without requiring additional model training. Through theoretical analysis and empirical results, we show that DATE maintains the generative capability of the model while providing superior text-image alignment over fixed text embeddings across various tasks, including multi-concept generation and text-guided image editing. Our code is available at https://github.com/aailab-kaist/DATE.
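For orientation, the quantities the abstract refers to can be written in standard variance-preserving (DDPM-style) notation. The "mean predicted image" is presumably the usual posterior-mean estimate of the clean image from the noise prediction. The constraint set (an L2 ball of radius $\rho$ around the fixed encoder embedding $c_0$), the differentiable alignment/preference score $r$, the prompt symbol $y$, and the linearized solution below are illustrative assumptions, not the paper's derivation or its exact update rule:

```latex
% Forward noising and the mean predicted image (posterior mean via the noise prediction)
x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
\qquad
\hat{x}_0(x_t, c) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t, c)}{\sqrt{\bar\alpha_t}}

% Hypothetical per-step objective, kept close to the fixed text-encoder embedding c_0
c_t^\star = \arg\max_{\lVert c - c_0 \rVert_2 \le \rho} \; r\big(\hat{x}_0(x_t, c),\, y\big)

% Linearizing r around c_0; by Cauchy-Schwarz the maximizer over the ball is a normalized-gradient step
g_t = \nabla_c\, r\big(\hat{x}_0(x_t, c),\, y\big)\Big|_{c = c_0},
\qquad
c_t \approx c_0 + \rho\,\frac{g_t}{\lVert g_t \rVert_2}
```

Under this reading, $c_t$ replaces $c_0$ in $\epsilon_\theta(x_t, t, \cdot)$ for the sampling step at timestep $t$, and no model weights change.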
Authors (9)
Byeonghu Na
Minsang Park
Gyuwon Sim
Donghyeok Shin
HeeSun Bae
Mina Kang
and 3 others
Submitted
October 28, 2025
arXiv Category
cs.LG

Key Contributions

Proposes Diffusion Adaptive Text Embedding (DATE), a training-free method that updates the text embedding at every diffusion timestep instead of keeping it fixed. DATE formulates an optimization problem over the embedding based on the intermediate perturbed data and derives an update rule applied at each sampling step. This improves text-image alignment while preserving the model's generative capability, on tasks such as multi-concept generation and text-guided image editing.
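The toy, self-contained simulation below illustrates per-step embedding adaptation in general; it is not the authors' algorithm or the code in the aailab-kaist/DATE repository. Every name and modeling choice here is hypothetical: a 2-D linear-Gaussian stand-in for the denoiser (`W`, `S2`), a negative squared distance to a target `y` standing in for a text-image alignment or preference score, an L2 ball of radius `rho` around the original embedding `c0`, a finite-difference gradient, and a deterministic DDIM-style sampler. At each step it recomputes the mean predicted image from the current noisy sample, moves the embedding one normalized-gradient step away from `c0` to raise the score, and uses that adapted embedding for the denoising update.

```python
import math
import random

# Hypothetical toy setup (not from the paper): 2-D "images" and 2-D "text embeddings".
W = [[1.0, 0.5], [0.0, 1.0]]  # maps an embedding c to the conditional data mean W @ c
S2 = 0.25                     # variance of the toy conditional data distribution


def alpha_bar(t, T):
    """Cosine-style schedule: alpha_bar(0) = 1, decreasing to near 0 at t = T."""
    return math.cos(0.5 * math.pi * t / (T + 1)) ** 2


def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def mean_predicted_image(x_t, ab, c):
    """E[x0 | x_t, c] for x0 ~ N(W c, S2 I) and x_t = sqrt(ab) x0 + sqrt(1 - ab) eps."""
    mu = matvec(W, c)
    k = math.sqrt(ab) * S2 / (ab * S2 + 1.0 - ab)
    return [m + k * (x - math.sqrt(ab) * m) for x, m in zip(x_t, mu)]


def score(x0_hat, y):
    """Hypothetical alignment score: negative squared distance to a 'text target' y."""
    return -sum((a - b) ** 2 for a, b in zip(x0_hat, y))


def grad_fd(f, c, h=1e-5):
    """Central finite-difference gradient of a scalar function f at c."""
    g = []
    for i in range(len(c)):
        cp, cm = list(c), list(c)
        cp[i] += h
        cm[i] -= h
        g.append((f(cp) - f(cm)) / (2 * h))
    return g


def adapt_embedding(c0, x_t, ab, y, rho):
    """Maximizer of the linearized score over the ball ||c - c0|| <= rho."""
    g = grad_fd(lambda c: score(mean_predicted_image(x_t, ab, c), y), c0)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(c0)
    return [ci + rho * gi / norm for ci, gi in zip(c0, g)]


def sample(c0, y, rho, T=50, seed=0):
    """Deterministic DDIM-style sampling; rho = 0 keeps the text embedding fixed."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(2)]
    for t in range(T, 0, -1):
        ab_t, ab_prev = alpha_bar(t, T), alpha_bar(t - 1, T)
        # Re-adapt from the original embedding c0 at every step, using the current x_t.
        c = adapt_embedding(c0, x, ab_t, y, rho) if rho > 0 else list(c0)
        x0 = mean_predicted_image(x, ab_t, c)
        eps = [(xi - math.sqrt(ab_t) * x0i) / math.sqrt(1.0 - ab_t) for xi, x0i in zip(x, x0)]
        x = [math.sqrt(ab_prev) * x0i + math.sqrt(1.0 - ab_prev) * ei for x0i, ei in zip(x0, eps)]
    return x


if __name__ == "__main__":
    c0, y = [0.0, 0.0], [1.0, 1.0]
    for rho in (0.0, 1.0):
        dists = [math.dist(sample(c0, y, rho, seed=s), y) for s in range(20)]
        print(f"rho={rho}: mean distance to text target = {sum(dists) / len(dists):.3f}")
```

Setting `rho = 0.0` recovers sampling with a fixed embedding, so the script compares both variants on the same noise seeds. Re-adapting from `c0` at every step, rather than accumulating updates, is one simple way to keep the condition near the encoder's output; whether this resembles how DATE preserves generative capability is an assumption.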

Business Value

May enable text-to-image tools that follow prompts more faithfully, especially multi-concept prompts and text-guided edits, without retraining the underlying model. This could support more precise visual content creation in marketing, design, and entertainment.