Abstract
Recent text-to-image models have revolutionized image generation, yet they still struggle to maintain concept consistency across generated images. Existing works focus on character consistency but often overlook the crucial role of scenes in storytelling, which restricts their creativity in practice. This paper introduces scene-oriented story generation, addressing two key challenges: (i) scene planning, where current methods, relying solely on text descriptions, fail to ensure scene-level narrative coherence, and (ii) scene consistency, i.e., maintaining consistent scenes across multiple stories, which remains largely unexplored. We propose SceneDecorator, a training-free framework that employs VLM-Guided Scene Planning to ensure narrative coherence across different scenes in a "global-to-local" manner, and Long-Term Scene-Sharing Attention to maintain long-term scene consistency and subject diversity across generated stories. Extensive experiments demonstrate the superior performance of SceneDecorator and highlight its potential to unleash creativity in the fields of arts, films, and games.
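The "global-to-local" planning described above can be pictured as two chained prompts: a global pass outlines the whole story as a fixed set of scenes, then a local pass expands each scene into a concrete text-to-image prompt. The sketch below assumes a generic chat-style VLM endpoint; `vlm_chat`, `plan_scenes`, and the prompt wording are illustrative stand-ins, not the paper's actual implementation.

```python
# A minimal sketch of global-to-local scene planning, assuming access to
# any chat-style VLM/LLM endpoint. `vlm_chat` is a hypothetical stand-in;
# the prompts and model used by SceneDecorator are not given in the abstract.
from typing import Callable

def plan_scenes(story: str, num_scenes: int,
                vlm_chat: Callable[[str], str]) -> list[str]:
    # Global step: outline the whole story as a fixed number of scenes,
    # so scene-level narrative coherence is decided before any image prompt.
    outline = vlm_chat(
        f"Split this story into {num_scenes} scenes, one per line, "
        f"each naming its setting:\n{story}"
    )
    scenes = [line.strip() for line in outline.splitlines() if line.strip()]
    # Local step: expand every scene into a concrete text-to-image prompt
    # that restates the shared setting, keeping prompts mutually consistent.
    return [
        vlm_chat(
            "Write one text-to-image prompt for this scene, describing the "
            f"setting, characters, and action in detail:\n{scene}"
        )
        for scene in scenes
    ]

if __name__ == "__main__":
    # Toy stand-in that echoes its prompt, just to exercise the control flow.
    prompts = plan_scenes("A fox and a crow in an old forest.", 3,
                          vlm_chat=lambda p: p)
    print(len(prompts))
```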
Authors (8)
Quanjian Song
Donghao Zhou
Jingyu Lin
Fei Shen
Jiaze Wang
Xiaowei Hu
+2 more
Submitted
October 27, 2025
Key Contributions
SceneDecorator introduces scene-oriented story generation, addressing the scene-planning and scene-consistency challenges overlooked by prior character-focused methods. It employs VLM-Guided Scene Planning for narrative coherence and Long-Term Scene-Sharing Attention for consistent scenes and diverse subjects across stories, enabling more creative and coherent visual storytelling.
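One plausible reading of the Long-Term Scene-Sharing Attention mentioned above is self-attention whose keys and values are augmented with cached scene features, so every generated image attends to the same scene while its queries (and hence subjects) remain free to vary. The sketch below is a minimal PyTorch illustration of that idea under those assumptions; `scene_k`, `scene_v`, and `share_scale` are hypothetical names, not the paper's actual formulation.

```python
# A minimal sketch of scene-sharing attention, assuming a standard
# scaled-dot-product self-attention layer in a diffusion U-Net.
import torch
import torch.nn.functional as F

def scene_sharing_attention(q, k, v, scene_k, scene_v, share_scale=1.0):
    """Self-attention whose keys/values are augmented with cached
    scene features, so each generated image attends to the same scene.

    q, k, v:          (batch, tokens, dim) features of the current image
    scene_k, scene_v: (batch, scene_tokens, dim) cached scene features
    share_scale:      weight on the shared scene tokens (hypothetical knob)
    """
    # Concatenate the shared scene tokens into this image's key/value set.
    k_all = torch.cat([k, share_scale * scene_k], dim=1)
    v_all = torch.cat([v, scene_v], dim=1)
    # Queries still come only from the current image, which preserves
    # per-image subject diversity while anchoring the background scene.
    return F.scaled_dot_product_attention(q, k_all, v_all)

# Toy usage: 2 images of 64 tokens each share a 64-token scene cache.
q = torch.randn(2, 64, 128)
k = torch.randn(2, 64, 128)
v = torch.randn(2, 64, 128)
scene_k = torch.randn(2, 64, 128)
scene_v = torch.randn(2, 64, 128)
out = scene_sharing_attention(q, k, v, scene_k, scene_v)
print(out.shape)  # torch.Size([2, 64, 128])
```

Keeping queries local while sharing keys/values is what lets one cached scene anchor arbitrarily many stories, which matches the "long-term" framing in the abstract.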
Business Value
Empowers creators to generate more coherent and visually consistent stories, streamlining the process of creating narrative content for various media platforms.