📄 Abstract
This paper presents a systematic study of scaling laws for the deepfake
detection task. Specifically, we analyze model performance against the
number of real image domains, deepfake generation methods, and training images.
Since no existing dataset meets the scale requirements for this research, we
construct ScaleDF, the largest dataset to date in this field, which contains
over 5.8 million real images from 51 different datasets (domains) and more than
8.8 million fake images generated by 102 deepfake methods. Using ScaleDF, we
observe power-law scaling similar to that observed in large language models
(LLMs): the average detection error follows a predictable
power-law decay as either the number of real domains or the number of deepfake
methods increases. This key observation not only allows us to forecast the
number of additional real domains or deepfake methods required to reach a
target performance, but also inspires us to counter evolving deepfake
technology in a data-centric manner. Beyond this, we examine the role of
pre-training and data augmentation in deepfake detection under scaling, as
well as the limitations of scaling itself.
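To make the forecasting idea concrete, the sketch below fits a power-law curve of the assumed form error(N) = a·N^(-b) + c to hypothetical (number of domains, average error) pairs and then inverts the fit to estimate how many domains would be needed to reach a target error. The functional form, variable names, and all numbers are illustrative assumptions, not values reported in the paper or measured on ScaleDF.

```python
# Minimal sketch, assuming a power-law form error(N) = a * N^(-b) + c.
# All data points and constants below are hypothetical placeholders.
import numpy as np
from scipy.optimize import curve_fit


def power_law(n, a, b, c):
    """Average detection error as a function of the number of real domains
    (or deepfake methods) n, with an irreducible error floor c."""
    return a * np.power(n, -b) + c


# Hypothetical measurements: number of training domains vs. average error.
n_domains = np.array([2, 4, 8, 16, 32, 51], dtype=float)
avg_error = np.array([0.31, 0.24, 0.19, 0.15, 0.12, 0.10])

# Fit the assumed power law to the (illustrative) observations.
(a, b, c), _ = curve_fit(power_law, n_domains, avg_error, p0=(0.4, 0.5, 0.05))
print(f"fitted: error(N) = {a:.3f} * N^(-{b:.3f}) + {c:.3f}")

# Forecast the number of domains needed to reach a target error by
# inverting the fitted curve: N = ((error - c) / a)^(-1/b).
target = 0.08
if target > c:
    n_needed = ((target - c) / a) ** (-1.0 / b)
    print(f"~{int(np.ceil(n_needed))} domains needed for error <= {target}")
else:
    print(f"target {target} lies below the fitted error floor c = {c:.3f}")
```

The same procedure applies when the x-axis is the number of deepfake generation methods instead of real domains; only the input arrays change.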
Authors (5)
Wenhao Wang
Longqi Cai
Taihong Xiao
Yuxiao Wang
Ming-Hsuan Yang
Submitted
October 18, 2025
Key Contributions
Presents a systematic study of scaling laws for deepfake detection, revealing power-law relationships between detection performance and the number of real image domains or deepfake generation methods. Introduces ScaleDF, the largest dataset to date in this field (over 5.8M real and 8.8M fake images), enabling prediction of detection performance and informing data-centric strategies against evolving deepfakes.
Business Value
Provides insights for building more robust and future-proof deepfake detection systems, which are essential for combating misinformation and maintaining digital trust.