Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: While AI-generated text (AIGT) detectors achieve over 90\% accuracy on direct
LLM outputs, they fail catastrophically against iteratively-paraphrased
content. We investigate why iteratively-paraphrased text -- itself AI-generated
-- evades detection systems designed for AIGT identification. Through intrinsic
mechanism analysis, we reveal that iterative paraphrasing creates an
intermediate laundering region characterized by semantic displacement with
preserved generation patterns, which brings up two attack categories:
paraphrasing human-authored text (authorship obfuscation) and paraphrasing
LLM-generated text (plagiarism evasion). To address these vulnerabilities, we
introduce PADBen, the first benchmark systematically evaluating detector
robustness against both paraphrase attack scenarios. PADBen comprises a
five-type text taxonomy capturing the full trajectory from original content to
deeply laundered text, and five progressive detection tasks across
sentence-pair and single-sentence challenges. We evaluate 11 state-of-the-art
detectors, revealing critical asymmetry: detectors successfully identify the
plagiarism evasion problem but fail for the case of authorship obfuscation. Our
findings demonstrate that current detection approaches cannot effectively
handle the intermediate laundering region, necessitating fundamental advances
in detection architectures beyond existing semantic and stylistic
discrimination methods. For detailed code implementation, please see
https://github.com/JonathanZha47/PadBen-Paraphrase-Attack-Benchmark.
Key Contributions
Introduces PADBen, the first benchmark to systematically evaluate AI text detector robustness against paraphrase attacks. It addresses the catastrophic failure of detectors against iteratively-paraphrased content by analyzing the mechanism (semantic displacement with preserved generation patterns) and proposing a five-type text taxonomy and five detection tasks.
Business Value
Crucial for maintaining academic integrity, verifying content authenticity, and preventing misuse of AI-generated text in sensitive applications, thereby protecting educational institutions and businesses from fraud and plagiarism.