📄 Abstract
Large language models (LLMs) have transformed natural language processing (NLP), enabling applications from content generation to decision support. Retrieval-Augmented Generation (RAG) improves LLMs by incorporating external knowledge, but it also introduces security risks, particularly data poisoning, where an attacker injects poisoned texts into the knowledge database to manipulate system outputs. While various defenses have been proposed, they
often struggle against advanced attacks. To address this, we introduce RAGuard,
a detection framework designed to identify poisoned texts. RAGuard first
expands the retrieval scope to increase the proportion of clean texts, reducing
the likelihood of retrieving poisoned content. It then applies chunk-wise
perplexity filtering to detect abnormal variations and text similarity
filtering to flag highly similar texts. This non-parametric approach enhances
RAG security, and experiments on large-scale datasets demonstrate its
effectiveness in detecting and mitigating poisoning attacks, including strong
adaptive attacks.
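The abstract only sketches the pipeline, so the snippet below is a minimal illustrative implementation of the two filtering stages it names, assuming GPT-2 as the perplexity scorer and TF-IDF cosine similarity as the text-similarity measure; the chunk size and thresholds are placeholders, not values from the paper.

```python
# Hypothetical sketch of a RAGuard-style filtering pipeline.
# GPT-2 as the perplexity scorer, TF-IDF cosine similarity, and the
# thresholds below are illustrative assumptions, not the paper's setup.
import numpy as np
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2 (stand-in language model)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))

def chunk_perplexity_flags(texts, chunk_words=30, z_thresh=2.0):
    """Flag texts whose chunk-wise perplexities vary abnormally.

    Each retrieved text is split into fixed-size word chunks; a text is
    flagged if any chunk's perplexity is a large outlier (z-score)
    relative to the other chunks of the same text.
    """
    flags = []
    for text in texts:
        words = text.split()
        chunks = [" ".join(words[i:i + chunk_words])
                  for i in range(0, len(words), chunk_words)]
        if len(chunks) < 2:
            flags.append(False)
            continue
        ppls = np.array([perplexity(c) for c in chunks])
        z = (ppls - ppls.mean()) / (ppls.std() + 1e-8)
        flags.append(bool((np.abs(z) > z_thresh).any()))
    return flags

def similarity_flags(texts, sim_thresh=0.9):
    """Flag texts that are near-duplicates of another retrieved text."""
    tfidf = TfidfVectorizer().fit_transform(texts)
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, 0.0)
    return [bool((row > sim_thresh).any()) for row in sim]

def filter_retrieved(texts):
    """Keep only texts that pass both filters."""
    ppl_flags = chunk_perplexity_flags(texts)
    sim_flags = similarity_flags(texts)
    return [t for t, p, s in zip(texts, ppl_flags, sim_flags) if not (p or s)]
```

In a full pipeline, the retriever would first be queried with an expanded top-k (larger than the number of texts ultimately passed to the LLM) so that clean texts dominate the candidate set before these filters are applied.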
Authors (7)
Zirui Cheng
Jikai Sun
Anjun Gao
Yueyang Quan
Zhuqing Liu
Xiaohua Hu
+1 more
Submitted
October 28, 2025
Key Contributions
Introduces RAGuard, a detection framework to identify poisoned texts in Retrieval-Augmented Generation (RAG) systems. It uses expanded retrieval scope, chunk-wise perplexity filtering, and text similarity filtering to enhance RAG security against advanced poisoning attacks.
Business Value
Protects businesses relying on LLMs for critical applications (e.g., customer support, content creation) from malicious manipulation of their knowledge bases, ensuring reliable and trustworthy outputs.