
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models

Abstract

Large Language Models (LLMs) have rapidly become integral to real-world applications, powering services across diverse sectors. However, their widespread deployment has exposed critical security risks, particularly through jailbreak prompts that can bypass model alignment and induce harmful outputs. Despite intense research into both attack and defense techniques, the field remains fragmented: definitions, threat models, and evaluation criteria vary widely, impeding systematic progress and fair comparison. In this Systematization of Knowledge (SoK), we address these challenges by (1) proposing a holistic, multi-level taxonomy that organizes attacks, defenses, and vulnerabilities in LLM prompt security; (2) formalizing threat models and cost assumptions into machine-readable profiles for reproducible evaluation; (3) introducing an open-source evaluation toolkit for standardized, auditable comparison of attacks and defenses; (4) releasing JAILBREAKDB, the largest annotated dataset of jailbreak and benign prompts to date (released at https://huggingface.co/datasets/youbin2014/JailbreakDB); and (5) presenting a comprehensive evaluation platform and leaderboard of state-of-the-art methods (to be released soon). Our work unifies fragmented research, provides rigorous foundations for future studies, and supports the development of robust, trustworthy LLMs suitable for high-stakes deployment.
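
Since JAILBREAKDB is hosted on the Hugging Face Hub, it should be loadable with the standard `datasets` library. The following is a minimal sketch under that assumption; the split name ("train") and the record fields are not documented on this page and may differ in the released dataset.

```python
# Minimal sketch: load JAILBREAKDB from the Hugging Face Hub.
# Assumes the `datasets` library is installed (pip install datasets).
# The "train" split and record schema are assumptions, not confirmed here.
from datasets import load_dataset

ds = load_dataset("youbin2014/JailbreakDB", split="train")

print(ds)     # shows the column names and number of rows
print(ds[0])  # inspect one record; actual field names may differ
```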
Authors (9)
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, +3 more
Submitted: October 17, 2025
arXiv Category: cs.CR

Key Contributions

This Systematization of Knowledge (SoK) paper addresses the fragmented field of LLM prompt security by proposing a multi-level taxonomy of attacks, defenses, and vulnerabilities. It formalizes threat models and cost assumptions as machine-readable profiles for reproducible evaluation, introduces an open-source toolkit for standardized, auditable comparison of attacks and defenses, releases JAILBREAKDB, a large annotated dataset of jailbreak and benign prompts, and announces an evaluation platform and leaderboard of state-of-the-art methods. A hypothetical sketch of a machine-readable threat-model profile appears below.
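
To make the machine-readable threat-model profiles concrete, here is a hypothetical sketch of what such a profile could contain, written as a plain Python dict. The paper's actual schema is not reproduced on this page, so every field name and value below is an illustrative assumption.

```python
import json

# Hypothetical threat-model profile. All keys and values are
# illustrative assumptions; the paper's real schema is not shown here.
threat_profile = {
    "attacker": {
        "access": "black-box",        # query-only access, no weights
        "knowledge": "no-gradients",  # cannot observe logits or gradients
        "query_budget": 1000,         # assumed cap on queries per target
    },
    "target_model": "example-llm",    # placeholder model identifier
    "cost": {"per_query_usd": 0.002}, # assumed per-query cost
    "success_criterion": "judge model flags a harmful completion",
}

# Serialize for tooling, so attacks and defenses can be evaluated
# against the same declared assumptions.
print(json.dumps(threat_profile, indent=2))
```

Encoding assumptions like attacker access and query budget in a shared, serializable format is what makes cross-paper comparisons reproducible rather than anecdotal.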

Business Value

Enhances the security and trustworthiness of LLM deployments across various industries, reducing risks associated with malicious use, harmful content generation, and security breaches, thereby fostering wider adoption of LLM-powered applications.
