📄 Abstract
As texts generated by Large Language Models (LLMs) become ever more common and often indistinguishable from human-written content, research on automatic text detection has attracted growing attention. Many recent detectors report near-perfect accuracy, often boasting AUROC scores above 99%. However, these claims typically assume fixed generation settings, leaving open the question of how robust such systems are to changes in decoding strategies. In this work, we systematically examine how sampling-based decoding impacts detectability, with a focus on how subtle variations in a model's (sub)word-level distribution affect detection performance. We find that even minor adjustments to decoding parameters, such as temperature or top-p (nucleus) sampling, can severely impair detector accuracy, with AUROC dropping from near-perfect levels to 1% in some settings. Our findings expose critical blind spots in current detection methods and emphasize the need for more comprehensive evaluation protocols. To facilitate future research, we release a large-scale dataset encompassing 37 decoding configurations, along with our code and evaluation framework: https://github.com/BaggerOfWords/Sampling-and-Detection
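Since the study centers on how these decoding knobs reshape a model's next-token distribution, the following minimal sketch illustrates temperature scaling and top-p (nucleus) truncation applied to raw logits. This is not the authors' implementation; the function name and toy logits are illustrative only.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Sample a token id from raw logits after applying temperature
    scaling and nucleus (top-p) truncation.

    Higher temperature flattens the distribution; lower top_p restricts
    sampling to the smallest set of tokens whose cumulative probability
    reaches top_p.
    """
    rng = rng or np.random.default_rng()

    # Temperature scaling: divide logits before the softmax.
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Nucleus (top-p) truncation: keep the most probable tokens whose
    # cumulative mass first reaches top_p, zero out the rest, renormalize.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    truncated = np.zeros_like(probs)
    truncated[order[:cutoff]] = probs[order[:cutoff]]
    truncated /= truncated.sum()

    return rng.choice(len(probs), p=truncated)

# Toy example over a 5-token vocabulary.
logits = [2.0, 1.5, 0.3, -1.0, -2.0]
print(sample_next_token(logits, temperature=0.7, top_p=0.9))
```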
Key Contributions
Provides a comprehensive study of how sampling-based decoding strategies (temperature, top-p/nucleus sampling) affect the detectability of LLM-generated text. It shows that minor parameter adjustments can drastically reduce detector accuracy, exposing critical vulnerabilities in current text detection methods and highlighting the need for more robust evaluation; a sketch of the evaluation metric follows below.
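Detector accuracy in the paper is reported as AUROC. As a minimal sketch of how that metric can be computed for a score-based detector, the snippet below uses scikit-learn; the `detector_score` function and the example texts are hypothetical placeholders, not the detectors or data from the released framework.

```python
from sklearn.metrics import roc_auc_score

def detector_score(text):
    """Placeholder for a real detector (e.g., a classifier or a
    likelihood-based score); higher means 'more likely machine-generated'."""
    return float(len(text) % 7) / 7.0  # dummy stand-in

human_texts = ["an essay written by a person", "field notes from a hike"]
machine_texts = ["a paragraph sampled from an LLM", "another generated reply"]

# Machine-generated text is the positive class (label 1).
labels = [0] * len(human_texts) + [1] * len(machine_texts)
scores = [detector_score(t) for t in human_texts + machine_texts]

# AUROC = probability that a random machine text outscores a random human text.
print("AUROC:", roc_auc_score(labels, scores))
```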
Business Value
Crucial for platforms and organizations that need to reliably distinguish human from AI-generated content, with implications for content moderation, plagiarism detection, and combating misinformation. By exposing how detectors fail under varied decoding settings, the work informs more rigorous evaluation and helps improve the trustworthiness of AI detection systems.