
PIA: Deepfake Detection Using Phoneme-Temporal and Identity-Dynamic Analysis

Abstract

The rise of manipulated media has made deepfakes a particularly insidious threat. They span several kinds of generative manipulation, including lip-sync modifications, face-swaps, and avatar-driven facial synthesis. Conventional detection methods predominantly rely on manually designed phoneme-viseme alignment thresholds, basic frame-level consistency checks, or a unimodal detection strategy, and they fail to reliably identify modern deepfakes produced by advanced generative models such as GANs, diffusion models, and neural rendering techniques. These models generate nearly perfect individual frames yet inadvertently introduce minor temporal discrepancies that traditional detectors frequently overlook. We present a novel multimodal audio-visual framework, Phoneme-Temporal and Identity-Dynamic Analysis (PIA), which incorporates language, dynamic face motion, and facial identification cues to address these limitations. It uses phoneme sequences, lip geometry data, and advanced facial identity embeddings. This integrated method significantly improves the detection of subtle deepfake alterations by identifying inconsistencies across multiple complementary modalities. Code is available at https://github.com/skrantidatta/PIA
Authors: Soumyya Kanti Datta, Tanvi Ranga, Chengzhe Sun, Siwei Lyu
Submitted: October 16, 2025
arXiv Category: cs.CV

Key Contributions

The paper presents PIA, a novel multimodal audio-visual framework for deepfake detection that analyzes phoneme-temporal and identity-dynamic cues. It addresses the limitations of conventional methods by combining three sources of evidence: language (phoneme sequences), dynamic face motion (lip geometry), and facial identity embeddings. The goal is to detect advanced deepfakes whose individual frames look nearly perfect but which exhibit subtle temporal inconsistencies.
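
To make the three cue types concrete, here is a minimal Python sketch of how per-frame phoneme labels, a lip-geometry measurement, and identity embeddings might be aligned into one feature sequence. It is an illustration only and not the authors' implementation: the function names, the scalar "lip aperture" measure, the frame-difference features, and the toy inputs are all assumptions made for this example, and the summary does not describe how PIA actually fuses its modalities.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identity_drift(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical identity-dynamic cue: 1 - cosine similarity between each
    frame's identity embedding and the previous frame's (0.0 for frame 0)."""
    if not embeddings:
        return []
    drift = [0.0]
    for prev, cur in zip(embeddings, embeddings[1:]):
        drift.append(1.0 - cosine_similarity(prev, cur))
    return drift


def lip_velocity(apertures: Sequence[float]) -> List[float]:
    """Hypothetical lip-motion cue: frame-to-frame change in lip aperture
    (0.0 for frame 0)."""
    if not apertures:
        return []
    velocity = [0.0]
    for prev, cur in zip(apertures, apertures[1:]):
        velocity.append(cur - prev)
    return velocity


def fuse_frame_features(
    phonemes: Sequence[str],
    apertures: Sequence[float],
    embeddings: Sequence[Sequence[float]],
) -> List[Tuple[str, float, float, float]]:
    """Align the three modalities frame by frame.

    Returns one (phoneme, lip aperture, lip velocity, identity drift) tuple
    per frame. A learned classifier would consume such a sequence; none is
    included here.
    """
    if not (len(phonemes) == len(apertures) == len(embeddings)):
        raise ValueError("all modalities must cover the same number of frames")
    return list(
        zip(phonemes, apertures, lip_velocity(apertures), identity_drift(embeddings))
    )


# Toy input (made up): three frames; the identity embedding changes sharply at frame 2.
phonemes = ["M", "AA", "AA"]
apertures = [0.0, 0.6, 0.5]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = fuse_frame_features(phonemes, apertures, embeddings)
for frame in features:
    print(frame)
```

In this toy input the identity drift is zero between the first two frames and jumps at the third. That is the kind of frame-to-frame discrepancy a temporal detector would look for, even when each frame looks plausible on its own.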

Business Value

PIA aims to strengthen trust in digital media by giving news organizations, social platforms, and legal investigators a more robust tool for identifying manipulated videos.