
SPIRIT: Patching Speech Language Models against Jailbreak Attacks

Abstract

Speech Language Models (SLMs) enable natural interaction via spoken instructions, which capture user intent more effectively by detecting nuances in speech. The richer speech signal introduces new security risks compared to text-based models, as adversaries can better bypass safety mechanisms by injecting imperceptible noise into speech. We analyze adversarial attacks and find that SLMs are substantially more vulnerable to jailbreak attacks, which can achieve a perfect 100% attack success rate in some instances. To improve security, we propose post-hoc patching defenses that intervene during inference by modifying the SLM's activations, improving robustness by up to 99% with (i) negligible impact on utility and (ii) no re-training. We conduct ablation studies to maximize the efficacy of our defenses and improve the utility/security trade-off, validated with large-scale benchmarks unique to SLMs.
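
To make the abstract's threat model concrete, here is a minimal sketch of the standard projected-gradient style of attack it alludes to: a small L-infinity-bounded perturbation added to a waveform to steer a model's output. The `ToySpeechScorer` stand-in, the `pgd_audio_attack` helper, and all hyperparameters are illustrative assumptions, not the attack pipeline evaluated in the paper.

```python
import torch
import torch.nn as nn

# Toy stand-in for an SLM's decision head (hypothetical; the paper attacks
# real SLMs, not this placeholder). Input: 1 s of 16 kHz audio.
class ToySpeechScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16000, 2)  # logits over {refuse, comply}

    def forward(self, wav):
        return self.proj(wav)

def pgd_audio_attack(model, wav, target, eps=1e-3, alpha=2e-4, steps=40):
    """Targeted projected-gradient attack: find a perturbation inside an
    L-inf ball of radius eps that pushes the model toward `target`."""
    delta = torch.zeros_like(wav, requires_grad=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        loss = loss_fn(model(wav + delta), target)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # step toward the target label
            delta.clamp_(-eps, eps)             # keep the noise imperceptible
        delta.grad.zero_()
    return (wav + delta).detach()

model = ToySpeechScorer()
wav = torch.randn(1, 16000)                     # placeholder waveform
adv = pgd_audio_attack(model, wav, torch.tensor([1]))
print((adv - wav).abs().max())                  # bounded by eps
```

Against a real SLM, the same loop would operate on the raw waveform fed to the model, with eps chosen small enough to keep the perturbation inaudible.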
Authors (5)
Amirbek Djanibekov, Nurdaulet Mukhituly, Kentaro Inui, Hanan Aldarmaki, Nils Lukas

Submitted
May 18, 2025

arXiv Category
eess.AS

Key Contributions

This paper introduces SPIRIT, a post-hoc patching defense for Speech Language Models (SLMs) against jailbreak attacks. It intervenes during inference by modifying activations, achieving up to 99% robustness with negligible utility impact and without retraining, addressing the vulnerability of SLMs to adversarial speech inputs.
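
For a rough sense of what "patching activations at inference" can look like, the sketch below uses a PyTorch forward hook to ablate a single hypothetical "unsafe" direction from one layer's activations. The layer choice, the direction estimate, and the ablation rule are placeholder assumptions; the paper's actual SPIRIT interventions differ in detail.

```python
import torch
import torch.nn as nn

HIDDEN = 64

class TinyBlock(nn.Module):
    """Placeholder layer standing in for one SLM transformer block."""
    def __init__(self):
        super().__init__()
        self.ff = nn.Linear(HIDDEN, HIDDEN)

    def forward(self, x):
        return torch.relu(self.ff(x))

model = nn.Sequential(TinyBlock(), TinyBlock(), nn.Linear(HIDDEN, 10))

# Hypothetical unit vector separating unsafe from benign activations,
# e.g. one could estimate it from contrastive prompt pairs.
direction = torch.randn(HIDDEN)
direction = direction / direction.norm()

def patch_activations(module, inputs, output):
    # Ablate the component along the unsafe direction; the rest of the
    # representation passes through unchanged.
    coeff = output @ direction                  # projection coefficients
    return output - coeff.unsqueeze(-1) * direction

# Register the patch on one layer; the weights are never modified.
handle = model[1].register_forward_hook(patch_activations)
logits = model(torch.randn(2, HIDDEN))          # patched inference
handle.remove()
```

Because the hook edits activations only at inference time, the model's weights stay untouched, which is what makes this family of defenses re-training-free.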

Business Value

Enhances the security and trustworthiness of voice-enabled AI systems, which is crucial for their widespread adoption in sensitive applications such as customer service, personal assistants, and secure communication.