
A Few Large Shifts: Layer-Inconsistency Based Minimal Overhead Adversarial Example Detection

📄 Abstract

Deep neural networks (DNNs) are highly susceptible to adversarial examples--subtle, imperceptible perturbations that can lead to incorrect predictions. While detection-based defenses offer a practical alternative to adversarial training, many existing methods depend on external models, complex architectures, or adversarial data, limiting their efficiency and generalizability. We introduce a lightweight, plug-in detection framework that leverages internal layer-wise inconsistencies within the target model itself, requiring only benign data for calibration. Our approach is grounded in the A Few Large Shifts Assumption, which posits that adversarial perturbations induce large, localized violations of layer-wise Lipschitz continuity in a small subset of layers. Building on this, we propose two complementary strategies--Recovery Testing (RT) and Logit-layer Testing (LT)--to empirically measure these violations and expose internal disruptions caused by adversaries. Evaluated on CIFAR-10, CIFAR-100, and ImageNet under both standard and adaptive threat models, our method achieves state-of-the-art detection performance with negligible computational overhead. Furthermore, our system-level analysis provides a practical method for selecting a detection threshold with a formal lower-bound guarantee on accuracy. The code is available here: https://github.com/c0510gy/AFLS-AED.
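
A rough way to picture the pipeline described in the abstract is: score each input with some per-layer inconsistency measure taken from the target model itself, calibrate a detection threshold on benign data at a chosen false-positive rate, and flag inputs whose largest layer score exceeds that threshold. The PyTorch sketch below illustrates only this skeleton; the placeholder activation-norm score, the `LayerShiftDetector` name, and the 5% target FPR are illustrative assumptions, not the paper's actual RT/LT statistics or its threshold-selection rule.

```python
# Minimal sketch of layer-inconsistency detection with benign-only calibration.
# The per-layer score used here is a stand-in; the paper instead measures
# recovery (RT) and logit-layer (LT) inconsistencies.
import torch
import torch.nn as nn


class LayerShiftDetector:
    """Flags inputs whose largest per-layer 'shift' score exceeds a threshold
    calibrated on benign data only (no adversarial examples required)."""

    def __init__(self, model: nn.Module, layers: list[nn.Module]):
        self.model = model
        self.activations: list[torch.Tensor] = []
        for layer in layers:
            # Capture intermediate activations of the target model via hooks.
            layer.register_forward_hook(
                lambda _m, _inp, out: self.activations.append(out.detach())
            )
        self.threshold: float | None = None

    def _layer_scores(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder score: per-sample norm of each captured activation.
        self.activations.clear()
        with torch.no_grad():
            self.model(x)
        return torch.stack(
            [a.flatten(1).norm(dim=1) for a in self.activations], dim=1
        )  # shape: (batch, num_layers)

    def calibrate(self, benign_loader, target_fpr: float = 0.05) -> None:
        # Threshold = (1 - target_fpr) quantile of the maximum per-layer
        # score over benign inputs (assumes loader yields (inputs, labels)).
        scores = torch.cat(
            [self._layer_scores(x).max(dim=1).values for x, _ in benign_loader]
        )
        self.threshold = torch.quantile(scores, 1.0 - target_fpr).item()

    def is_adversarial(self, x: torch.Tensor) -> torch.Tensor:
        assert self.threshold is not None, "call calibrate() first"
        return self._layer_scores(x).max(dim=1).values > self.threshold
```

In the paper's setting, the scores come from RT and LT rather than activation norms, and the system-level analysis supplies a threshold with a formal lower-bound accuracy guarantee; the simple benign-quantile rule above is only a stand-in for that step.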

Key Contributions

Introduces a lightweight, plug-in adversarial example detection framework that uses internal layer-wise inconsistencies within the target model, grounded in the 'A Few Large Shifts' assumption, and proposes Recovery Testing (RT) and Logit-layer Testing (LT) to measure these inconsistencies empirically.

Business Value

Enhances the security and trustworthiness of AI systems by providing a practical and efficient way to detect adversarial attacks, which is crucial for security-sensitive applications.