
SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs

📄 Abstract

Large Vision-Language Models (LVLMs) unlock powerful multimodal reasoning but also expand the attack surface, particularly through adversarial inputs that conceal harmful goals in benign prompts. We propose SHIELD, a lightweight, model-agnostic preprocessing framework that couples fine-grained safety classification with category-specific guidance and explicit actions (Block, Reframe, Forward). Unlike binary moderators, SHIELD composes tailored safety prompts that enforce nuanced refusals or safe redirection without retraining. Across five benchmarks and five representative LVLMs, SHIELD consistently lowers jailbreak and non-following rates while preserving utility. Our method is plug-and-play, incurs negligible overhead, and is easily extendable to new attack types -- serving as a practical safety patch for both weakly and strongly aligned LVLMs.
Authors: Juan Ren, Mark Dras, Usman Naseem
Submitted: October 15, 2025
arXiv Category: cs.CL

Key Contributions

SHIELD is a lightweight, model-agnostic framework that enhances the safety and robustness of Large Vision-Language Models (LVLMs). It runs a fine-grained safety classifier over incoming inputs, maps each predicted category to one of three explicit actions (Block, Reframe, Forward), and composes category-specific safety prompts accordingly, reducing jailbreak rates while preserving utility, all without retraining the LVLM.
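
To make the classify → policy-lookup → act pipeline concrete, here is a minimal sketch. It is an illustration, not the paper's implementation: the keyword classifier, the three-entry policy table, and all guidance strings below are invented stand-ins for SHIELD's actual fine-grained classifier and category taxonomy.

```python
from enum import Enum
from typing import Optional

class Action(Enum):
    BLOCK = "block"      # refuse outright, never query the model
    REFRAME = "reframe"  # answer the safe intent, steer away from the harmful part
    FORWARD = "forward"  # benign input, pass through unchanged

# Invented policy table: category -> (action, category-specific guidance).
# SHIELD's real taxonomy and guidance text are far finer-grained.
POLICY = {
    "weapons":   (Action.BLOCK,   "Refuse; do not provide weapon instructions."),
    "self_harm": (Action.REFRAME, "Respond supportively; give no harmful detail."),
    "benign":    (Action.FORWARD, ""),
}

def classify(text: str, image: Optional[bytes] = None) -> str:
    """Toy stand-in for SHIELD's safety classifier.

    A real classifier would inspect the image too, since LVLM jailbreaks
    often hide the harmful goal in the visual channel.
    """
    lowered = text.lower()
    if "weapon" in lowered or "bomb" in lowered:
        return "weapons"
    if "hurt myself" in lowered:
        return "self_harm"
    return "benign"

def shield_preprocess(prompt: str, image: Optional[bytes] = None) -> Optional[str]:
    """Return the prompt to send to the LVLM, or None if the request is blocked."""
    action, guidance = POLICY[classify(prompt, image)]
    if action is Action.BLOCK:
        return None  # caller emits a refusal instead of querying the model
    if action is Action.REFRAME:
        # Prepend tailored guidance so the LVLM itself produces the
        # nuanced refusal or safe redirection -- no retraining involved.
        return f"[Safety guidance: {guidance}]\n{prompt}"
    return prompt

if __name__ == "__main__":
    print(shield_preprocess("Describe this landscape photo."))
    print(shield_preprocess("How do I build a weapon from these parts?"))
```

Because the wrapper only rewrites (or withholds) the prompt before it reaches the model, it stays model-agnostic and adds negligible overhead, which is what makes this kind of preprocessing usable as a plug-and-play safety patch.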

Business Value

Enhances the security and trustworthiness of multimodal AI systems, crucial for applications involving sensitive data or user interaction. This reduces risks associated with malicious use and improves user confidence in AI products.