Abstract: Deepfakes have recently raised significant trust issues and security concerns
among the public. Compared to CNN face forgery detectors, ViT-based methods
take advantage of the expressivity of transformers, achieving superior
detection performance. However, these approaches still exhibit the following
limitations: (1) Fully fine-tuning ViT-based models from ImageNet weights
demands substantial computational and storage resources; (2) ViT-based methods
struggle to capture local forgery clues, leading to model bias; (3) These
methods limit their scope to only one or a few face forgery features, resulting
in limited generalizability. To tackle these challenges, this work introduces
Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized
yet parameter-efficient ViT-based approach. MoE-FFD only updates lightweight
Low-Rank Adaptation (LoRA) and Adapter layers while keeping the ViT backbone
frozen, thereby achieving parameter-efficient training. Moreover, MoE-FFD
leverages the expressivity of transformers and local priors of CNNs to
simultaneously extract global and local forgery clues. Additionally, novel MoE
modules are designed to scale the model's capacity and smartly select optimal
forgery experts, further enhancing forgery detection performance. Our proposed
learning scheme can be seamlessly adapted to various transformer backbones in a
plug-and-play manner. Extensive experimental results demonstrate that the
proposed method achieves state-of-the-art face forgery detection performance
with significantly reduced parameter overhead. The code is released at:
https://github.com/LoveSiameseCat/MoE-FFD.
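
The abstract describes keeping the ViT backbone frozen while training only lightweight LoRA/Adapter experts that are selected by Mixture-of-Experts gating. The PyTorch sketch below illustrates that general idea under our own assumptions; the module names (`MoELoRALinear`, `LoRAExpert`), the rank, the number of experts, and the top-k routing are illustrative choices, not the paper's actual design (see the released code for the authors' implementation).

```python
# Minimal sketch (not the authors' implementation): a frozen pretrained linear
# projection from a ViT block, augmented with a gated mixture of low-rank
# (LoRA-style) adapter experts. Only the experts and the gate are trainable.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank adapter expert: x -> B(A(x)) with rank r << dim."""
    def __init__(self, in_dim: int, out_dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(in_dim, rank, bias=False)   # A: in_dim -> r
        self.up = nn.Linear(rank, out_dim, bias=False)    # B: r -> out_dim
        nn.init.zeros_(self.up.weight)                    # start as a zero residual

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class MoELoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a gated mixture of LoRA experts."""
    def __init__(self, base: nn.Linear, num_experts: int = 4,
                 rank: int = 8, top_k: int = 2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                  # keep backbone weights frozen
            p.requires_grad = False
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features, rank)
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(base.in_features, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)                                # frozen pretrained path
        scores = F.softmax(self.gate(x), dim=-1)          # per-token expert weights
        topk_w, topk_idx = scores.topk(self.top_k, dim=-1)
        for slot in range(self.top_k):
            idx = topk_idx[..., slot]                     # (B, N) selected expert ids
            w = topk_w[..., slot].unsqueeze(-1)           # (B, N, 1) gate weights
            for e, expert in enumerate(self.experts):
                mask = (idx == e).unsqueeze(-1).to(x.dtype)  # tokens routed to expert e
                out = out + mask * w * expert(x)
            # The dense loop is for clarity; a real implementation would
            # dispatch tokens to experts sparsely.
        return out


if __name__ == "__main__":
    layer = MoELoRALinear(nn.Linear(768, 768), num_experts=4, rank=8, top_k=2)
    tokens = torch.randn(2, 197, 768)                     # (batch, tokens, dim) ViT features
    print(layer(tokens).shape)                            # torch.Size([2, 197, 768])
```

Because only the expert and gate parameters receive gradients, such a layer can be dropped into different transformer backbones in a plug-and-play manner, which is the parameter-efficient training property the abstract highlights.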