arxiv_cl · Research Paper · AI Researchers, Developers of AI Text Detectors, Educators, Content Platforms, Cybersecurity Professionals

PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks


📄 Abstract

While AI-generated text (AIGT) detectors achieve over 90% accuracy on direct LLM outputs, they fail catastrophically against iteratively paraphrased content. We investigate why iteratively paraphrased text, itself AI-generated, evades detection systems designed for AIGT identification. Through intrinsic mechanism analysis, we reveal that iterative paraphrasing creates an intermediate laundering region characterized by semantic displacement with preserved generation patterns, which gives rise to two attack categories: paraphrasing human-authored text (authorship obfuscation) and paraphrasing LLM-generated text (plagiarism evasion). To address these vulnerabilities, we introduce PADBen, the first benchmark systematically evaluating detector robustness against both paraphrase attack scenarios. PADBen comprises a five-type text taxonomy capturing the full trajectory from original content to deeply laundered text, and five progressive detection tasks across sentence-pair and single-sentence challenges. We evaluate 11 state-of-the-art detectors, revealing a critical asymmetry: detectors successfully identify plagiarism evasion but fail on authorship obfuscation. Our findings demonstrate that current detection approaches cannot effectively handle the intermediate laundering region, necessitating fundamental advances in detection architectures beyond existing semantic and stylistic discrimination methods. For the code implementation, see https://github.com/JonathanZha47/PadBen-Paraphrase-Attack-Benchmark.
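The laundering trajectory described above can be pictured as feeding a text back through a paraphraser again and again and keeping every intermediate version. The sketch below is illustrative only: `paraphrase` stands in for an LLM call, and the toy synonym table, `toy_paraphrase`, and the Jaccard `word_overlap` measure are hypothetical stand-ins, not PADBen's paraphrasers or its semantic-displacement analysis. Word overlap tracks lexical change only, not meaning.

```python
from typing import Callable, List


def iterative_paraphrase(text: str, paraphrase: Callable[[str], str], rounds: int) -> List[str]:
    """Return the trajectory [original, round 1, ..., round `rounds`]."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    trajectory = [text]
    for _ in range(rounds):
        # Each round paraphrases the previous round's output, not the original.
        trajectory.append(paraphrase(trajectory[-1]))
    return trajectory


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets: a lexical proxy, NOT a semantic measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical toy paraphraser standing in for an LLM: swaps words via a fixed table.
SYNONYMS = {"quick": "fast", "fast": "rapid", "dog": "hound", "hound": "canine", "jumps": "leaps"}


def toy_paraphrase(text: str) -> str:
    return " ".join(SYNONYMS.get(w, w) for w in text.split())


if __name__ == "__main__":
    traj = iterative_paraphrase("the quick dog jumps", toy_paraphrase, rounds=3)
    for i, t in enumerate(traj):
        print(i, t, round(word_overlap(traj[0], t), 2))
```

With the toy table, the trajectory runs `the quick dog jumps` → `the fast hound leaps` → `the rapid canine leaps`, after which nothing is left to swap, so round 3 repeats round 2.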

Key Contributions

Introduces PADBen, the first benchmark to systematically evaluate AI text detector robustness against paraphrase attacks. It traces the catastrophic failure of detectors on iteratively paraphrased content to an underlying mechanism (semantic displacement with preserved generation patterns) and proposes a five-type text taxonomy and five progressive detection tasks spanning sentence-pair and single-sentence settings.
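The detection tasks span two settings: judging a single text on its own, or comparing a pair of texts. One common way to reuse a single-text detector in a pair setting is to score both texts and pick the higher-scoring one. This is a minimal sketch under that assumption, not PADBen's task definitions; the `Score` interface, the threshold, and the toy vocabulary-based `toy_score` detector are all hypothetical.

```python
from typing import Callable

# Hypothetical detector interface: returns an estimated probability that `text` is AI-generated.
Score = Callable[[str], float]


def single_sentence_decision(text: str, score: Score, threshold: float = 0.5) -> bool:
    """Single-sentence setting: flag one text as AI-generated if its score reaches the threshold."""
    return score(text) >= threshold


def sentence_pair_decision(a: str, b: str, score: Score) -> int:
    """Sentence-pair setting: return 0 or 1, the index of the text scored as more likely AI-generated.

    Ties are broken in favour of index 0.
    """
    return 0 if score(a) >= score(b) else 1


# Hypothetical toy detector: share of words drawn from a small "LLM-flavored" vocabulary.
LLM_WORDS = {"delve", "furthermore", "moreover", "crucial", "leverage"}


def toy_score(text: str) -> float:
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in LLM_WORDS for w in words) / len(words)
```

For example, `toy_score("Moreover, we delve into crucial issues.")` gives 0.5 (three of six words). The pair setting only asks the detector to rank two texts, so it does not depend on a calibrated threshold the way the single-sentence setting does.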

Business Value

Relevant to maintaining academic integrity, verifying content authenticity, and preventing misuse of AI-generated text in sensitive applications. By exposing where current detectors fail under paraphrasing, it helps educational institutions and businesses judge how far their detection tools can be trusted against fraud and plagiarism.

Paper Metadata

Innovation Type

Benchmark/Dataset

Deployment Feasibility

High, as it provides a benchmark for evaluating existing and future AI text detection systems.

Limitations Addressed

The vulnerability of AI-generated text detectors to iterative paraphrasing, which lets laundered text evade detection in two attack scenarios: authorship obfuscation (paraphrasing human-authored text) and plagiarism evasion (paraphrasing LLM-generated text).

Performance Gains

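No gains are claimed, since the contribution is diagnostic: across 11 state-of-the-art detectors that exceed 90% accuracy on direct LLM outputs, performance breaks down on iteratively paraphrased text, with detectors handling plagiarism evasion but failing on authorship obfuscation.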
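An asymmetry like the one reported only becomes visible when accuracy is broken out per attack scenario; a single pooled number can hide a scenario where a detector fails. A minimal sketch with made-up toy predictions (not PADBen results); the scenario keys, the 0/1 labels, and both helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each record: (scenario, gold_label, predicted_label). Names and labels are illustrative.
Record = Tuple[str, int, int]


def accuracy_by_scenario(records: Iterable[Record]) -> Dict[str, float]:
    """Accuracy computed separately for each attack scenario."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for scenario, gold, pred in records:
        total[scenario] += 1
        correct[scenario] += int(gold == pred)
    return {s: correct[s] / total[s] for s in total}


def pooled_accuracy(records: Iterable[Record]) -> float:
    """Single accuracy over all records, ignoring the scenario field."""
    pairs = [(gold, pred) for _, gold, pred in records]
    return sum(int(g == p) for g, p in pairs) / len(pairs) if pairs else 0.0


# Made-up toy predictions, NOT PADBen results.
TOY: List[Record] = [
    ("plagiarism_evasion", 1, 1), ("plagiarism_evasion", 1, 1),
    ("plagiarism_evasion", 1, 1), ("plagiarism_evasion", 1, 1),
    ("authorship_obfuscation", 1, 0), ("authorship_obfuscation", 1, 0),
    ("authorship_obfuscation", 1, 1), ("authorship_obfuscation", 0, 0),
]
```

On this toy data the pooled accuracy is 0.75, while the per-scenario view shows 1.0 for `plagiarism_evasion` and only 0.5 for `authorship_obfuscation`.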

Technical Tags

AI Text Detection, LLM-generated Text (AIGT), Paraphrase Attacks, Robustness, Authorship Obfuscation, Plagiarism Evasion, Benchmark, Text Taxonomy, Semantic Displacement, Generation Patterns

Research Topics

AI Text Detection Robustness, Adversarial Attacks on AI Models, Authorship Verification, AI Plagiarism, LLM Evasion Techniques

Methods & Architectures

Benchmark Creation (PADBen), Iterative Paraphrasing, Mechanism Analysis, Text Taxonomy Development, Detection Task Evaluation, AI Text Detectors, Large Language Models (LLMs)

Applications & Tasks

Academic Integrity, Content Authenticity, AI Security, Information Verification, AI text detectors failing against paraphrased content, Evading detection systems, Authorship obfuscation, Plagiarism evasion, AI Text Detection, Evaluating detector robustness, Identifying paraphrased AI-generated text

Datasets & Benchmarks

Datasets

PADBen

Accuracy, Robustness, Detection performance

Related Fields

Natural Language Processing, Machine Learning, Cybersecurity, Information Retrieval, AI Ethics

Keywords

AI text detection, LLM, paraphrasing, robustness, benchmark, authorship obfuscation, plagiarism, adversarial attack, semantic displacement, generation patterns

Academic Context

#AI Text Detection Robustness #Adversarial Attacks on AI Models #Authorship Verification #AI Plagiarism #LLM Evasion Techniques

Commercial Potential

Potential Products

More robust AI text detection tools, Plagiarism detection software, Content authenticity verification services

Target Industries

Education, Publishing, Technology, Media, Legal

Use Case Examples

Universities detecting AI-generated essays, Publishers verifying originality of submitted manuscripts, Platforms identifying AI-generated fake news

Competitive Edge

Establishes a new standard for evaluating AI text detectors by focusing on their resilience to sophisticated paraphrasing attacks, addressing a critical gap in current evaluation methodologies.

Resource Requirements

Compute Needs

Requires compute for running AI text detectors and generating paraphrased text.

Data Requirements

Requires the PADBen benchmark dataset.

Deployment Constraints

The effectiveness of detectors can vary based on the specific paraphrasing techniques used.

Scalability

The benchmark methodology can be extended with more paraphrasing techniques and text types.

Regulatory Considerations

Regulations around AI-generated content, Copyright and authorship laws

Production Readiness

Maturity Level

Research/Development

Patent Potential

Low, since the contribution is a benchmark and analysis rather than a new detection method.
