Abstract: Recent advances in image synthesis, particularly with the advent of GAN and diffusion models, have amplified public concerns about the spread of disinformation. To address these concerns, numerous AI-generated image (AIGI) detectors have been proposed and have achieved promising performance in identifying fake images. However, a systematic understanding of the adversarial robustness of AIGI detectors is still lacking. In this paper, we examine the vulnerability of state-of-the-art AIGI detectors to adversarial attacks under both white-box and black-box settings, which has rarely been investigated so far. To this end, we propose a new method to attack AIGI detectors. First, motivated by the pronounced difference between real and fake images in the frequency domain, we add perturbations in the frequency domain to push an image away from its original frequency distribution.
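As an illustration only (the paper's code is not yet released), the following PyTorch sketch shows one way such a frequency-domain perturbation could look: signed-gradient steps that enlarge the distance between the adversarial image's magnitude spectrum and the original one, under an L_inf budget. The step size, budget, and random start are illustrative assumptions, and the actual FPBA objective also involves the detector's loss.

```python
# Minimal sketch of a frequency-domain perturbation (assumed parameters, not the authors' code).
import torch

def frequency_attack(image, steps=10, alpha=0.005, eps=0.03):
    """image: float tensor in [0, 1] with shape (C, H, W)."""
    orig_mag = torch.abs(torch.fft.fft2(image))             # reference magnitude spectrum
    # Random start inside the budget so the spectral loss has a nonzero gradient at step 0.
    adv = (image + torch.empty_like(image).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        adv = adv.detach().requires_grad_(True)
        mag = torch.abs(torch.fft.fft2(adv))
        # Negative spectral distance: minimizing it pushes the spectrum
        # away from the image's original frequency distribution.
        loss = -torch.mean(torch.abs(mag - orig_mag))
        loss.backward()
        with torch.no_grad():
            adv = adv - alpha * adv.grad.sign()              # signed gradient step
            adv = image + torch.clamp(adv - image, -eps, eps)  # project back into the L_inf ball
            adv = adv.clamp(0.0, 1.0)
    return adv.detach()

# Example usage: adv = frequency_attack(torch.rand(3, 224, 224))
```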
Second, we explore the full posterior distribution of the surrogate model to further narrow the gap between heterogeneous AIGI detectors, e.g., transferring adversarial examples across CNNs and ViTs. This is achieved by a novel post-train Bayesian strategy that turns a single pre-trained surrogate into a Bayesian one capable of simulating diverse victim models, without any re-training. We name our method the Frequency-based Post-train Bayesian Attack (FPBA).
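The post-train Bayesian idea can be pictured with the sketch below, again an assumption-laden illustration rather than the authors' implementation: the posterior over surrogate weights is approximated by a simple Gaussian centred at the pre-trained weights, and attack gradients are averaged over several sampled "virtual" surrogates so the perturbation does not overfit a single architecture. The names `model`, `image`, `label`, `sigma`, and the cross-entropy loss are placeholders.

```python
# Minimal sketch of a post-train Bayesian surrogate (assumed Gaussian posterior over weights).
import copy
import torch
import torch.nn.functional as F

def sampled_gradient(model, image, label, n_samples=5, sigma=0.01):
    """Average input gradients over weight-perturbed copies of one pre-trained surrogate.

    image: float tensor (C, H, W); label: scalar long tensor with the detector's true class.
    """
    image = image.detach().requires_grad_(True)
    grad_sum = torch.zeros_like(image)
    for _ in range(n_samples):
        sampled = copy.deepcopy(model)
        with torch.no_grad():
            for p in sampled.parameters():
                p.add_(sigma * torch.randn_like(p))          # draw one weight sample
        loss = F.cross_entropy(sampled(image.unsqueeze(0)), label.unsqueeze(0))
        grad_sum += torch.autograd.grad(loss, image)[0]
    # Plug the averaged gradient into any gradient-ascent attack step,
    # e.g. image + alpha * sampled_gradient(...).sign().
    return grad_sum / n_samples
```

In such a scheme, the averaged gradient would replace the single-surrogate gradient in the frequency-domain update sketched above.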
Through FPBA, we show that adversarial attacks are a genuine threat to AIGI detectors: FPBA delivers successful black-box attacks across models, generators, and defense methods, and can even evade cross-generator detection, a crucial real-world detection scenario. The code will be shared upon acceptance.