📄 Abstract
Frontier Large Language Models (LLMs) pose unprecedented dual-use risks
through the potential proliferation of chemical, biological, radiological, and
nuclear (CBRN) weapons knowledge. We present the first comprehensive evaluation
of 10 leading commercial LLMs against both a novel 200-prompt CBRN dataset and
a 180-prompt subset of the FORTRESS benchmark, using a rigorous three-tier
attack methodology. Our findings expose critical safety vulnerabilities: Deep
Inception attacks achieve 86.0% success versus 33.8% for direct requests,
demonstrating superficial filtering mechanisms; model safety performance varies
dramatically, from 2% (claude-opus-4) to 96% (mistral-small-latest) attack
success rates; and eight models exceed 70% vulnerability when asked to enhance
dangerous material properties. We identify fundamental brittleness in current
safety alignment, where simple prompt engineering techniques bypass safeguards
for dangerous CBRN information. These results challenge industry safety claims
and highlight urgent needs for standardized evaluation frameworks, transparent
safety metrics, and more robust alignment techniques to mitigate catastrophic
misuse risks while preserving beneficial capabilities.
Authors (5)
Divyanshu Kumar
Nitin Aravind Birur
Tanay Baswa
Sahil Agarwal
Prashanth Harshangi
Submitted
October 24, 2025
Key Contributions
This paper presents the first comprehensive evaluation of 10 leading commercial LLMs' resistance to CBRN proliferation prompts, using a novel dataset and a rigorous three-tier attack methodology. It reveals critical safety vulnerabilities, demonstrating that simple prompt engineering can bypass safeguards, with wide variation in model safety performance and a high success rate for Deep Inception attacks.
Business Value
Crucial for understanding and mitigating the risks of powerful AI models, informing policy, regulation, and responsible AI development to prevent misuse for dangerous purposes.