📄 Abstract
Multimodal large language models (MLLMs) exhibit remarkable capabilities but
remain susceptible to jailbreak attacks exploiting cross-modal vulnerabilities.
In this work, we introduce a novel method that leverages sequential comic-style
visual narratives to circumvent safety alignments in state-of-the-art MLLMs.
Our method decomposes malicious queries into visually innocuous storytelling
elements using an auxiliary LLM, generates corresponding image sequences
through diffusion models, and exploits the models' reliance on narrative
coherence to elicit harmful outputs. Extensive experiments on harmful textual
queries from established safety benchmarks show that our approach achieves an
average attack success rate of 83.5%, surpassing the prior state of the art by
46%. Compared with existing visual jailbreak methods, our sequential narrative
strategy demonstrates superior effectiveness across diverse categories of
harmful content. We further analyze attack patterns, uncover key vulnerability
factors in multimodal safety mechanisms, and evaluate the limitations of
current defense strategies against narrative-driven attacks, revealing
significant gaps in existing protections.
Authors (9)
Deyue Zhang
Dongdong Yang
Junjie Mu
Quancheng Zou
Zonghao Ying
Wenzhuo Xu
+3 more
Submitted
October 16, 2025
Key Contributions
Introduces a novel method that uses sequential comic-style visual narratives to jailbreak multimodal LLMs. It decomposes malicious queries into innocuous storytelling elements, generates corresponding image sequences via diffusion models, and exploits narrative coherence to elicit harmful outputs, achieving a high attack success rate.
Business Value
Highlights critical security vulnerabilities in multimodal AI systems, driving the development of more robust safety mechanisms and responsible AI practices.