Consistent text-to-image generation via scene de-contextualization

📄 Abstract

Consistent text-to-image (T2I) generation seeks to produce identity-preserving images of the same subject across diverse scenes, yet it often fails due to a phenomenon called identity (ID) shift. Previous methods have tackled this issue, but typically rely on the unrealistic assumption of knowing all target scenes in advance. This paper reveals that a key source of ID shift is the native correlation between subject and scene context, called scene contextualization, which arises naturally as T2I models fit the training distribution of vast natural images. We formally prove the near-universality of this scene-ID correlation and derive theoretical bounds on its strength. On this basis, we propose a novel, efficient, training-free prompt embedding editing approach, called Scene De-Contextualization (SDeC), that imposes an inversion process of T2I's built-in scene contextualization. Specifically, it identifies and suppresses the latent scene-ID correlation within the ID prompt's embedding by quantifying the SVD directional stability to adaptively re-weight the corresponding eigenvalues. Critically, SDeC allows for per-scene use (one scene per prompt) without requiring prior access to all target scenes. This makes it a highly flexible and general solution well-suited to real-world applications where such prior knowledge is often unavailable or varies over time. Experiments demonstrate that SDeC significantly enhances identity preservation while maintaining scene diversity.
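The abstract describes the editing step only at a high level, so the following is a rough illustration of the general idea rather than the authors' algorithm. It decomposes a stand-in ID prompt embedding with an SVD, scores how stable each singular direction stays when a scene embedding is mixed in, and shrinks the unstable directions. The function name `decontextualize`, the mean-shift perturbation, the rank-order matching of directions, and the use of the stability score itself as the weight are all assumptions made for this sketch. The abstract speaks of eigenvalues; the sketch re-weights singular values of the embedding matrix, which are the square roots of the eigenvalues of its Gram matrix.

```python
import numpy as np

def decontextualize(id_emb, scene_emb):
    """Hypothetical sketch, not the authors' algorithm.

    id_emb:    (tokens, dim) embedding of the ID prompt
    scene_emb: (scene_tokens, dim) embedding of one scene prompt
    Returns the edited ID embedding and a per-direction stability score.
    """
    # Decompose the ID embedding into singular directions.
    U, S, Vt = np.linalg.svd(id_emb, full_matrices=False)

    # Assumption: model scene influence as a shift by the mean scene token.
    perturbed = id_emb + scene_emb.mean(axis=0, keepdims=True)
    _, _, Vt_p = np.linalg.svd(perturbed, full_matrices=False)

    # Stability: |cosine| between each direction and its perturbed
    # counterpart, matched by rank order (a crude assumption).
    stability = np.clip(np.abs(np.sum(Vt * Vt_p, axis=1)), 0.0, 1.0)

    # Shrink the singular values of unstable directions, then rebuild.
    edited = (U * (S * stability)) @ Vt
    return edited, stability

# Random stand-ins for real text-encoder outputs.
rng = np.random.default_rng(0)
id_emb = rng.normal(size=(8, 32))
scene_emb = rng.normal(size=(5, 32))
edited, stability = decontextualize(id_emb, scene_emb)
print(edited.shape, stability.round(2))
```

Because only one scene embedding enters the function, the sketch mirrors the per-scene usage the abstract emphasizes: each prompt can be edited on its own, with no knowledge of other target scenes.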

Authors (8)

Song Tang

Peihao Gong

Kunyu Li

Kai Guo

Boyu Wang

Mao Ye

+2 more

Submitted

October 16, 2025

arXiv Category

cs.CV

Key Contributions

This paper introduces Scene De-Contextualization (SDeC), a training-free prompt embedding editing approach for consistent text-to-image generation. It targets a key source of identity shift, the native correlation between subject and scene context, by inverting the model's built-in scene contextualization.

Business Value

Enables the creation of consistent visual assets for marketing, gaming, and virtual reality, where maintaining subject identity across different scenes is crucial.

Paper Metadata

Innovation Type

Algorithmic

Deployment Feasibility

High, as it is a training-free prompt editing technique applicable to existing T2I models.

Limitations Addressed

Existing methods for consistent T2I generation typically assume that all target scenes are known in advance. SDeC is training-free and works per scene (one scene per prompt), without prior access to the full set of target scenes.

Technical Tags

text-to-image generation, identity preservation, ID shift, scene contextualization, prompt embedding editing, training-free, latent space manipulation, generative models

Research Topics

Generative Models, Image Synthesis, Computer Vision, Natural Language Processing

Methods & Architectures

Scene De-Contextualization (SDeC), Prompt embedding editing, Inversion process, Text-to-Image models

Applications & Tasks

Digital Art, Content Creation, Virtual Environments, Identity Shift, Scene Consistency, Prompt Engineering, Consistent text-to-image generation, Identity-preserving image synthesis

Related Fields

Generative AI, Computer Vision, Natural Language Processing, Deep Learning

Keywords

text-to-image, consistency, identity preservation, scene context, prompt editing, training-free, generative models, ID shift, latent space, diffusion models

Academic Context

#Generative Models, #Image Synthesis, #Computer Vision, #Natural Language Processing

Commercial Potential

Potential Products

Tools for consistent character generation in games, AI-powered design assistants for marketing materials

Target Industries

Gaming, Advertising, Media and Entertainment

Use Case Examples

Generating multiple images of the same character in different settings, Creating consistent visual elements for animated series

Competitive Edge

Provides a training-free solution for identity-preserving T2I generation, overcoming limitations of methods requiring scene pre-definition or extensive retraining.

Market Opportunity

Large and rapidly growing market for generative AI and image synthesis tools.

Revenue Models

Integration into existing generative AI platforms, API access

Resource Requirements

Compute Needs

Low during inference, as it's a prompt editing technique.

Data Requirements

No additional training data, since the method is training-free; requires access to a pre-trained text-to-image model.

Scalability

Scalable, as it is a prompt embedding editing technique applied to existing generative models without retraining.

Production Readiness

Maturity Level

Research

Time to Market

Short
