📄 Abstract
Frontier Large Language Models (LLMs) pose unprecedented dual-use risks
through the potential proliferation of chemical, biological, radiological, and
nuclear (CBRN) weapons knowledge. We present the first comprehensive evaluation
of 10 leading commercial LLMs against both a novel 200-prompt CBRN dataset and
a 180-prompt subset of the FORTRESS benchmark, using a rigorous three-tier
attack methodology. Our findings expose critical safety vulnerabilities: Deep
Inception attacks achieve 86.0% success versus 33.8% for direct requests,
demonstrating superficial filtering mechanisms; model safety performance varies
dramatically, from 2% (claude-opus-4) to 96% (mistral-small-latest) attack
success rates; and eight models exceed 70% vulnerability when asked to enhance
dangerous material properties. We identify fundamental brittleness in current
safety alignment, where simple prompt engineering techniques bypass safeguards
for dangerous CBRN information. These results challenge industry safety claims
and highlight urgent needs for standardized evaluation frameworks, transparent
safety metrics, and more robust alignment techniques to mitigate catastrophic
misuse risks while preserving beneficial capabilities.
Authors (5)
Divyanshu Kumar
Nitin Aravind Birur
Tanay Baswa
Sahil Agarwal
Prashanth Harshangi
Submitted
October 24, 2025
Key Contributions
This paper presents the first comprehensive evaluation of 10 leading commercial LLMs' resistance to CBRN proliferation prompts, using a novel dataset and a rigorous three-tier attack methodology. It reveals critical safety vulnerabilities, demonstrating that simple prompt engineering can bypass safeguards, with wide variation in model safety performance and a high success rate for Deep Inception attacks.
Business Value
Crucial for understanding and mitigating the risks of powerful AI models, informing policy, regulation, and responsible AI development to prevent misuse for dangerous purposes.