Abstract
Despite the remarkable advances of Large Language Models (LLMs) across diverse cognitive tasks, the rapid enhancement of these capabilities also introduces emergent deceptive behaviors that may pose severe risks in high-stakes deployments. More critically, the characterization of deception across real-world scenarios remains underexplored. To bridge this
gap, we establish DeceptionBench, the first benchmark that systematically
evaluates how deceptive tendencies manifest across different societal domains,
what their intrinsic behavioral patterns are, and how extrinsic factors affect
them. Specifically, on the static dimension, the benchmark encompasses 150 meticulously designed scenarios across five domains, i.e., Economy, Healthcare, Education, Social Interaction, and Entertainment, with over 1,000 samples, providing a sufficient empirical foundation for deception analysis. On the
intrinsic dimension, we explore whether models exhibit self-interested egoistic
tendencies or sycophantic behaviors that prioritize user appeasement. On the
extrinsic dimension, we investigate how contextual factors modulate deceptive
outputs under neutral conditions, reward-based incentivization, and coercive
pressures. Moreover, we incorporate sustained multi-turn interaction loops to
construct a more realistic simulation of real-world feedback dynamics.
Extensive experiments across LLMs and Large Reasoning Models (LRMs) reveal critical vulnerabilities, particularly amplified deception under reinforcement dynamics, demonstrating that current models lack robust resistance to manipulative contextual cues and underscoring the urgent need for advanced safeguards against deceptive behaviors. Code and resources are publicly available
at https://github.com/Aries-iai/DeceptionBench.
Authors (6)
Yao Huang
Yitong Sun
Yichi Zhang
Ruochen Zhang
Yinpeng Dong
Xingxing Wei
Submitted
October 17, 2025
Key Contributions
This paper introduces DeceptionBench, the first benchmark to systematically evaluate deceptive tendencies of LLMs across realistic, real-world scenarios spanning multiple societal domains. It encompasses 150 meticulously designed scenarios in five domains (Economy, Healthcare, Education, Social Interaction, Entertainment) with over 1,000 samples, enabling analysis of how deception manifests, its intrinsic behavioral patterns (e.g., self-interested egoistic or sycophantic tendencies), and the extrinsic contextual factors that modulate it.
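To make the benchmark's structure concrete, the sketch below shows one way an evaluation sample could be represented, pairing a societal domain with an intrinsic behavioral pattern, an extrinsic condition, and a multi-turn interaction. The field names and label sets are illustrative assumptions, not the actual DeceptionBench data format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema sketch -- names are illustrative assumptions,
# not the repository's actual data format.
DOMAINS = ["Economy", "Healthcare", "Education", "Social Interaction", "Entertainment"]
INTRINSIC_PATTERNS = ["egoistic", "sycophantic"]          # intrinsic behavioral tendencies
EXTRINSIC_CONDITIONS = ["neutral", "reward", "coercion"]  # contextual pressure settings

@dataclass
class DeceptionSample:
    """One evaluation sample: a scenario under one intrinsic/extrinsic setting."""
    scenario_id: str
    domain: str                      # one of DOMAINS
    intrinsic_pattern: str           # one of INTRINSIC_PATTERNS
    extrinsic_condition: str         # one of EXTRINSIC_CONDITIONS
    turns: List[str] = field(default_factory=list)  # multi-turn interaction prompts

def is_valid(sample: DeceptionSample) -> bool:
    """Check a sample's labels against the assumed label sets."""
    return (sample.domain in DOMAINS
            and sample.intrinsic_pattern in INTRINSIC_PATTERNS
            and sample.extrinsic_condition in EXTRINSIC_CONDITIONS)
```

Under this reading, the reported scale (150 scenarios, over 1,000 samples) corresponds to each scenario being instantiated under several intrinsic/extrinsic settings.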
Business Value
Helps organizations understand and mitigate potential risks associated with deceptive AI behaviors, ensuring safer and more trustworthy AI deployments, particularly in sensitive domains like finance, healthcare, and social interaction.