
Quantifying CBRN Risk in Frontier Models

Abstract

Frontier Large Language Models (LLMs) pose unprecedented dual-use risks through the potential proliferation of chemical, biological, radiological, and nuclear (CBRN) weapons knowledge. We present the first comprehensive evaluation of 10 leading commercial LLMs against both a novel 200-prompt CBRN dataset and a 180-prompt subset of the FORTRESS benchmark, using a rigorous three-tier attack methodology. Our findings expose critical safety vulnerabilities: Deep Inception attacks achieve an 86.0% success rate versus 33.8% for direct requests, demonstrating superficial filtering mechanisms; attack success rates vary dramatically across models, from 2% (claude-opus-4) to 96% (mistral-small-latest); and eight models exceed 70% vulnerability when asked to enhance dangerous material properties. We identify fundamental brittleness in current safety alignment, where simple prompt-engineering techniques bypass safeguards for dangerous CBRN information. These results challenge industry safety claims and highlight urgent needs for standardized evaluation frameworks, transparent safety metrics, and more robust alignment techniques to mitigate catastrophic misuse risks while preserving beneficial capabilities.
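The headline metric throughout is the attack success rate (ASR): the fraction of adversarial prompts for which a model's safeguards fail. A minimal sketch of how per-model, per-tier ASR could be tallied is below; the record schema, field names, and model labels are illustrative assumptions, not the authors' actual evaluation harness.

    from collections import defaultdict

    # Hypothetical evaluation records: (model, attack_tier, blocked).
    # "blocked" means the model refused or safely deflected the prompt.
    results = [
        ("model-a", "direct", True),
        ("model-a", "deep_inception", False),
        ("model-b", "direct", False),
        ("model-b", "deep_inception", False),
    ]

    def attack_success_rate(records):
        """ASR = failures / total prompts, grouped by (model, tier)."""
        totals = defaultdict(int)
        failures = defaultdict(int)
        for model, tier, blocked in records:
            totals[(model, tier)] += 1
            if not blocked:
                failures[(model, tier)] += 1
        return {key: failures[key] / totals[key] for key in totals}

    for (model, tier), asr in sorted(attack_success_rate(results).items()):
        print(f"{model:10s} {tier:15s} ASR = {asr:.1%}")

Under this framing, the paper's 86.0% vs. 33.8% comparison is simply the ASR aggregated over the Deep Inception tier versus the direct-request tier.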
Authors (5)
Divyanshu Kumar
Nitin Aravind Birur
Tanay Baswa
Sahil Agarwal
Prashanth Harshangi
Submitted
October 24, 2025
arXiv Category
cs.CR

Key Contributions

This paper presents the first comprehensive evaluation of 10 leading commercial LLMs for their susceptibility to divulging CBRN proliferation knowledge, using a novel dataset and a rigorous three-tier attack methodology. It reveals critical safety vulnerabilities: simple prompt engineering can bypass safeguards, model safety performance varies widely across vendors, and 'Deep Inception' attacks achieve a high success rate.

Business Value

This work is crucial for understanding and mitigating the risks of powerful AI models, informing policy, regulation, and responsible AI development to prevent misuse for dangerous purposes.