📄 Abstract
This paper presents a systematic study of scaling laws for the deepfake
detection task. Specifically, we analyze model performance against the
number of real image domains, deepfake generation methods, and training images.
Since no existing dataset meets the scale requirements for this research, we
construct ScaleDF, the largest dataset to date in this field, which contains
over 5.8 million real images from 51 different datasets (domains) and more than
8.8 million fake images generated by 102 deepfake methods. Using ScaleDF, we
observe power-law scaling similar to that observed in large language models
(LLMs): the average detection error follows a predictable
power-law decay as either the number of real domains or the number of deepfake
methods increases. This key observation not only allows us to forecast the
number of additional real domains or deepfake methods required to reach a
target performance, but also inspires us to counter evolving deepfake
technology in a data-centric manner. Beyond this, we examine the role of
pre-training and data augmentation in deepfake detection under scaling, as
well as the limitations of scaling itself.
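To make the forecasting idea concrete, the sketch below fits a power-law curve of the assumed form error(N) = a·N^(-b) + c to hypothetical (number of domains, average error) pairs and then inverts the fit to estimate how many domains would be needed to reach a target error. The functional form, variable names, and all numbers are illustrative assumptions, not values reported in the paper or measured on ScaleDF.

```python
# Minimal sketch, assuming a power-law form error(N) = a * N^(-b) + c.
# All data points and constants below are hypothetical placeholders.
import numpy as np
from scipy.optimize import curve_fit


def power_law(n, a, b, c):
    """Average detection error as a function of the number of real domains
    (or deepfake methods) n, with an irreducible error floor c."""
    return a * np.power(n, -b) + c


# Hypothetical measurements: number of training domains vs. average error.
n_domains = np.array([2, 4, 8, 16, 32, 51], dtype=float)
avg_error = np.array([0.31, 0.24, 0.19, 0.15, 0.12, 0.10])

# Fit the assumed power law to the (illustrative) observations.
(a, b, c), _ = curve_fit(power_law, n_domains, avg_error, p0=(0.4, 0.5, 0.05))
print(f"fitted: error(N) = {a:.3f} * N^(-{b:.3f}) + {c:.3f}")

# Forecast the number of domains needed to reach a target error by
# inverting the fitted curve: N = ((error - c) / a)^(-1/b).
target = 0.08
if target > c:
    n_needed = ((target - c) / a) ** (-1.0 / b)
    print(f"~{int(np.ceil(n_needed))} domains needed for error <= {target}")
else:
    print(f"target {target} lies below the fitted error floor c = {c:.3f}")
```

The same procedure applies when the x-axis is the number of deepfake generation methods instead of real domains; only the input arrays change.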
Authors (5)
Wenhao Wang
Longqi Cai
Taihong Xiao
Yuxiao Wang
Ming-Hsuan Yang
Submitted
October 18, 2025
Key Contributions
Presents a systematic study of scaling laws for deepfake detection, revealing power-law relationships between detection performance and the number of real image domains or deepfake generation methods. Introduces ScaleDF, the largest dataset to date in this field (over 5.8M real and 8.8M fake images), enabling prediction of detection performance and informing data-centric strategies against evolving deepfakes.
Business Value
Provides insights for building more robust and future-proof deepfake detection systems, which are essential for combating misinformation and maintaining digital trust.