
SPIRIT: Patching Speech Language Models against Jailbreak Attacks

Abstract

Speech Language Models (SLMs) enable natural interaction via spoken instructions, which capture user intent more effectively by detecting nuances in speech. The richer speech signal introduces new security risks compared to text-based models, as adversaries can better bypass safety mechanisms by injecting imperceptible noise into speech. We analyze adversarial attacks and find that SLMs are substantially more vulnerable to jailbreak attacks, which can achieve a perfect 100% attack success rate in some instances. To improve security, we propose post-hoc patching defenses that intervene during inference by modifying the SLM's activations, improving robustness by up to 99% with (i) negligible impact on utility and (ii) no re-training. We conduct ablation studies to maximize the efficacy of our defenses and improve the utility/security trade-off, validated with large-scale benchmarks unique to SLMs.
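
To make the abstract's threat model concrete, here is a minimal sketch of the standard projected-gradient style of attack it alludes to: a small L-infinity-bounded perturbation added to a waveform to steer a model's output. The `ToySpeechScorer` stand-in, the `pgd_audio_attack` helper, and all hyperparameters are illustrative assumptions, not the attack pipeline evaluated in the paper.

```python
import torch
import torch.nn as nn

# Toy stand-in for an SLM's decision head (hypothetical; the paper attacks
# real SLMs, not this placeholder). Input: 1 s of 16 kHz audio.
class ToySpeechScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16000, 2)  # logits over {refuse, comply}

    def forward(self, wav):
        return self.proj(wav)

def pgd_audio_attack(model, wav, target, eps=1e-3, alpha=2e-4, steps=40):
    """Targeted projected-gradient attack: find a perturbation inside an
    L-inf ball of radius eps that pushes the model toward `target`."""
    delta = torch.zeros_like(wav, requires_grad=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        loss = loss_fn(model(wav + delta), target)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # step toward the target label
            delta.clamp_(-eps, eps)             # keep the noise imperceptible
        delta.grad.zero_()
    return (wav + delta).detach()

model = ToySpeechScorer()
wav = torch.randn(1, 16000)                     # placeholder waveform
adv = pgd_audio_attack(model, wav, torch.tensor([1]))
print((adv - wav).abs().max())                  # bounded by eps
```

Against a real SLM, the same loop would operate on the raw waveform fed to the model, with eps chosen small enough to keep the perturbation inaudible.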
Authors (5)
Amirbek Djanibekov, Nurdaulet Mukhituly, Kentaro Inui, Hanan Aldarmaki, Nils Lukas

Submitted
May 18, 2025

arXiv Category
eess.AS

Key Contributions

This paper introduces SPIRIT, a post-hoc patching defense for Speech Language Models (SLMs) against jailbreak attacks. It intervenes during inference by modifying activations, achieving up to 99% robustness with negligible utility impact and without retraining, addressing the vulnerability of SLMs to adversarial speech inputs.
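
For a rough sense of what "patching activations at inference" can look like, the sketch below uses a PyTorch forward hook to ablate a single hypothetical "unsafe" direction from one layer's activations. The layer choice, the direction estimate, and the ablation rule are placeholder assumptions; the paper's actual SPIRIT interventions differ in detail.

```python
import torch
import torch.nn as nn

HIDDEN = 64

class TinyBlock(nn.Module):
    """Placeholder layer standing in for one SLM transformer block."""
    def __init__(self):
        super().__init__()
        self.ff = nn.Linear(HIDDEN, HIDDEN)

    def forward(self, x):
        return torch.relu(self.ff(x))

model = nn.Sequential(TinyBlock(), TinyBlock(), nn.Linear(HIDDEN, 10))

# Hypothetical unit vector separating unsafe from benign activations,
# e.g. one could estimate it from contrastive prompt pairs.
direction = torch.randn(HIDDEN)
direction = direction / direction.norm()

def patch_activations(module, inputs, output):
    # Ablate the component along the unsafe direction; the rest of the
    # representation passes through unchanged.
    coeff = output @ direction                  # projection coefficients
    return output - coeff.unsqueeze(-1) * direction

# Register the patch on one layer; the weights are never modified.
handle = model[1].register_forward_hook(patch_activations)
logits = model(torch.randn(2, HIDDEN))          # patched inference
handle.remove()
```

Because the hook edits activations only at inference time, the model's weights stay untouched, which is what makes this family of defenses re-training-free.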

Business Value

Enhances the security and trustworthiness of voice-enabled AI systems, which is crucial for their widespread adoption in sensitive applications such as customer service, personal assistants, and secure communication.