📄 Abstract
Multimodal large language models (MLLMs) exhibit remarkable capabilities but
remain susceptible to jailbreak attacks exploiting cross-modal vulnerabilities.
In this work, we introduce a novel method that leverages sequential comic-style
visual narratives to circumvent safety alignments in state-of-the-art MLLMs.
Our method decomposes malicious queries into visually innocuous storytelling
elements using an auxiliary LLM, generates corresponding image sequences
through diffusion models, and exploits the models' reliance on narrative
coherence to elicit harmful outputs. Extensive experiments on harmful textual
queries from established safety benchmarks show that our approach achieves an
average attack success rate of 83.5%, surpassing the prior state of the art by
46%. Compared with existing visual jailbreak methods, our sequential narrative
strategy demonstrates superior effectiveness across diverse categories of
harmful content. We further analyze attack patterns, uncover key vulnerability
factors in multimodal safety mechanisms, and evaluate the limitations of
current defense strategies against narrative-driven attacks, revealing
significant gaps in existing protections.
Authors (9)
Deyue Zhang
Dongdong Yang
Junjie Mu
Quancheng Zou
Zonghao Ying
Wenzhuo Xu
+3 more
Submitted
October 16, 2025
Key Contributions
Introduces a novel method that uses sequential comic-style visual narratives to jailbreak multimodal LLMs. It decomposes malicious queries into innocuous storytelling elements, generates corresponding image sequences via diffusion models, and exploits narrative coherence to elicit harmful outputs, achieving a high attack success rate.
Business Value
Highlights critical security vulnerabilities in multimodal AI systems, driving the development of more robust safety mechanisms and responsible AI practices.