SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning

Abstract

The ability of LLM agents to plan and invoke tools exposes them to new safety risks, making a comprehensive red-teaming system crucial for discovering vulnerabilities and ensuring their safe deployment. We present SIRAJ: a generic red-teaming framework for arbitrary black-box LLM agents. We employ a dynamic two-step process that starts with an agent definition and generates diverse seed test cases covering various risk outcomes, tool-use trajectories, and risk sources. It then iteratively constructs and refines model-based adversarial attacks based on the execution trajectories of former attempts. To reduce red-teaming cost, we present a model distillation approach that leverages structured forms of a teacher model's reasoning to train smaller models that are equally effective. Across diverse agent evaluation settings, our seed test case generation approach yields a 2-2.5x boost in the coverage of risk outcomes and tool-calling trajectories. Our distilled 8B red-teamer model improves the attack success rate by 100%, surpassing the 671B DeepSeek-R1 model. Our ablations and analyses validate the effectiveness of the iterative framework, structured reasoning, and the generalization of our red-teamer models.
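
To make the two-step process concrete, here is a minimal Python sketch of the loop the abstract describes: seed generation from an agent definition, followed by iterative attack refinement driven by the target agent's execution trajectories. All names (generate_seeds, refine, flags_risk, run) are hypothetical placeholders for illustration, not SIRAJ's actual API.

```python
# Minimal sketch of the two-step loop described in the abstract. All names
# (generate_seeds, refine, flags_risk, run) are hypothetical placeholders,
# not SIRAJ's actual API.
from dataclasses import dataclass, field

@dataclass
class Attempt:
    prompt: str                                      # adversarial input sent to the agent
    trajectory: list = field(default_factory=list)   # observed tool calls and outputs
    success: bool = False                            # judge flagged a risky outcome

def red_team(agent_definition, red_teamer, target_agent, judge, max_iters=5):
    # Step 1: generate diverse seed test cases from the agent definition,
    # aiming to cover risk outcomes, tool-use trajectories, and risk sources.
    seeds = red_teamer.generate_seeds(agent_definition)
    findings = []
    for seed in seeds:
        attempt = Attempt(prompt=seed)
        # Step 2: iteratively construct and refine the attack, using the
        # execution trajectory of each former attempt as feedback.
        for _ in range(max_iters):
            attempt.trajectory = target_agent.run(attempt.prompt)
            attempt.success = judge.flags_risk(attempt.trajectory)
            if attempt.success:
                findings.append(attempt)
                break
            attempt = Attempt(prompt=red_teamer.refine(attempt.prompt, attempt.trajectory))
    return findings
```

Note that the target agent is treated as a black box throughout: the loop only needs its observable trajectories, which is what makes the framework applicable to arbitrary agents.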
Authors (4): Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar, Amin Saied
Submitted: October 30, 2025
arXiv Category: cs.CR

Key Contributions

Introduces SIRAJ, a generic red-teaming framework for LLM agents that plan and invoke tools. It employs a dynamic two-step process to generate diverse seed test cases covering a range of risk outcomes, then iteratively refines adversarial attacks based on execution trajectories. A key innovation is model distillation over structured forms of a teacher model's reasoning, producing smaller red-teamer models that are equally effective at a fraction of the cost (see the sketch below).
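
A rough sketch of what distillation over structured reasoning might look like in practice: the teacher's free-form reasoning is recast into a fixed set of fields that become the student's fine-tuning target. The field names (risk_outcome, target_tool_calls, attack_strategy), the attack_prompt key, and the helper functions are assumptions for illustration; the paper's actual schema may differ.

```python
# Hypothetical sketch of distillation over structured reasoning: the teacher's
# free-form reasoning is recast into fixed fields that become the student's
# fine-tuning target. Field names and helpers are illustrative assumptions.
import json

STRUCTURE_KEYS = ("risk_outcome", "target_tool_calls", "attack_strategy")

def to_structured_target(teacher_trace: dict) -> str:
    """Keep only the named reasoning fields, in a fixed order, dropping
    the teacher's unconstrained free-form text."""
    structured = {key: teacher_trace[key] for key in STRUCTURE_KEYS}
    return json.dumps(structured, indent=2)

def build_distillation_example(agent_definition: str, teacher_trace: dict) -> dict:
    # A standard instruction-tuning pair: the agent definition goes in,
    # the structured reasoning plus the final adversarial prompt comes out.
    return {
        "prompt": f"Agent definition:\n{agent_definition}\n\nPlan a red-team attack.",
        "completion": to_structured_target(teacher_trace) + "\n" + teacher_trace["attack_prompt"],
    }
```

Constraining the training target to a fixed schema, rather than imitating unconstrained chains of thought, is plausibly what allows a small 8B student to reproduce the teacher's reasoning reliably.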

Business Value

Significantly enhances the safety and reliability of AI agents by proactively identifying vulnerabilities before deployment, reducing the risk of misuse or unintended consequences.