
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models

Abstract

Large Language Models (LLMs) have rapidly become integral to real-world applications, powering services across diverse sectors. However, their widespread deployment has exposed critical security risks, particularly through jailbreak prompts that can bypass model alignment and induce harmful outputs. Despite intense research into both attack and defense techniques, the field remains fragmented: definitions, threat models, and evaluation criteria vary widely, impeding systematic progress and fair comparison. In this Systematization of Knowledge (SoK), we address these challenges by (1) proposing a holistic, multi-level taxonomy that organizes attacks, defenses, and vulnerabilities in LLM prompt security; (2) formalizing threat models and cost assumptions into machine-readable profiles for reproducible evaluation; (3) introducing an open-source evaluation toolkit for standardized, auditable comparison of attacks and defenses; (4) releasing JAILBREAKDB, the largest annotated dataset of jailbreak and benign prompts to date (released at https://huggingface.co/datasets/youbin2014/JailbreakDB); and (5) presenting a comprehensive evaluation platform and leaderboard of state-of-the-art methods (to be released soon). Our work unifies fragmented research, provides rigorous foundations for future studies, and supports the development of robust, trustworthy LLMs suitable for high-stakes deployment.
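
Since JAILBREAKDB is hosted on the Hugging Face Hub, it should be loadable with the standard `datasets` library. The following is a minimal sketch under that assumption; the split name ("train") and the record fields are not documented on this page and may differ in the released dataset.

```python
# Minimal sketch: load JAILBREAKDB from the Hugging Face Hub.
# Assumes the `datasets` library is installed (pip install datasets).
# The "train" split and record schema are assumptions, not confirmed here.
from datasets import load_dataset

ds = load_dataset("youbin2014/JailbreakDB", split="train")

print(ds)     # shows the column names and number of rows
print(ds[0])  # inspect one record; actual field names may differ
```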
Authors (9)
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, +3 more
Submitted: October 17, 2025
arXiv Category: cs.CR

Key Contributions

This Systematization of Knowledge (SoK) paper addresses the fragmented field of LLM prompt security by proposing a multi-level taxonomy of attacks, defenses, and vulnerabilities. It formalizes threat models and cost assumptions as machine-readable profiles for reproducible evaluation, introduces an open-source toolkit for standardized, auditable comparison of attacks and defenses, releases JAILBREAKDB, a large annotated dataset of jailbreak and benign prompts, and announces an evaluation platform and leaderboard of state-of-the-art methods. A hypothetical sketch of a machine-readable threat-model profile appears below.
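
To make the machine-readable threat-model profiles concrete, here is a hypothetical sketch of what such a profile could contain, written as a plain Python dict. The paper's actual schema is not reproduced on this page, so every field name and value below is an illustrative assumption.

```python
import json

# Hypothetical threat-model profile. All keys and values are
# illustrative assumptions; the paper's real schema is not shown here.
threat_profile = {
    "attacker": {
        "access": "black-box",        # query-only access, no weights
        "knowledge": "no-gradients",  # cannot observe logits or gradients
        "query_budget": 1000,         # assumed cap on queries per target
    },
    "target_model": "example-llm",    # placeholder model identifier
    "cost": {"per_query_usd": 0.002}, # assumed per-query cost
    "success_criterion": "judge model flags a harmful completion",
}

# Serialize for tooling, so attacks and defenses can be evaluated
# against the same declared assumptions.
print(json.dumps(threat_profile, indent=2))
```

Encoding assumptions like attacker access and query budget in a shared, serializable format is what makes cross-paper comparisons reproducible rather than anecdotal.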

Business Value

Enhances the security and trustworthiness of LLM deployments across various industries, reducing risks associated with malicious use, harmful content generation, and security breaches, thereby fostering wider adoption of LLM-powered applications.
