
CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments


Abstract

Humans can naturally identify, reason about, and explain anomalies in their environment. In computer vision, this long-standing challenge remains limited to industrial defects or unrealistic, synthetically generated anomalies, failing to capture the richness and unpredictability of real-world anomalies. In this work, we introduce CAVE, the first benchmark of real-world visual anomalies. CAVE supports three open-ended tasks: anomaly description, explanation, and justification, with fine-grained annotations for visual grounding and for categorizing anomalies based on their visual manifestations, complexity, severity, and commonness. These annotations draw inspiration from cognitive science research on how humans identify and resolve anomalies, providing a comprehensive framework for evaluating Vision-Language Models (VLMs) in detecting and understanding anomalies. We show that state-of-the-art VLMs struggle with visual anomaly perception and commonsense reasoning, even with advanced prompting strategies. By offering a realistic and cognitively grounded benchmark, CAVE serves as a valuable resource for advancing research in anomaly detection and commonsense reasoning in VLMs.

Authors (6)

Rishika Bhagwatkar

Syrielle Montariol

Angelika Romanou

Beatriz Borges

Irina Rish

Antoine Bosselut

Submitted

October 29, 2025

arXiv Category

cs.CV

2025 Conference on Empirical Methods in Natural Language Processing


Key Contributions

CAVE introduces the first benchmark of real-world visual anomalies, supporting three open-ended tasks: anomaly description, explanation, and justification. Its fine-grained annotations, inspired by cognitive science, cover visual grounding and categorize anomalies by visual manifestation, complexity, severity, and commonness. This enables a comprehensive evaluation of Vision-Language Models (VLMs) and shows that state-of-the-art VLMs struggle with visual anomaly perception and commonsense reasoning.
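To make the annotation dimensions concrete, here is a minimal sketch of how one annotated example might be represented and how per-category results could be aggregated. This is not the CAVE data format: the class `AnomalyRecord`, every field name, the bounding-box layout, the category values, and the helper `mean_score_by` are hypothetical and chosen only for illustration. The scores are made-up placeholders, not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class AnomalyRecord:
    """Hypothetical record for one annotated image (not the official schema)."""
    image_id: str
    description: str      # what the anomaly is
    explanation: str      # a plausible account of how it came about
    justification: str    # why it counts as anomalous
    bbox: tuple[float, float, float, float]  # assumed (x, y, width, height) grounding
    manifestation: str    # assumed category labels below
    complexity: str
    severity: str
    commonness: str


def mean_score_by(records, scores, field):
    """Average per-image scores within each value of one annotation field."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[getattr(rec, field)].append(scores[rec.image_id])
    return {value: mean(vals) for value, vals in buckets.items()}


# Placeholder data, invented purely for this example.
records = [
    AnomalyRecord("img_a", "a chair on a roof", "left there after repairs",
                  "chairs are normally indoors or at ground level",
                  (10.0, 20.0, 50.0, 60.0), "placement", "simple", "low", "rare"),
    AnomalyRecord("img_b", "an open manhole on a sidewalk", "cover was removed",
                  "an uncovered hole endangers pedestrians",
                  (5.0, 5.0, 30.0, 30.0), "missing part", "simple", "high", "rare"),
]
scores = {"img_a": 1.0, "img_b": 0.0}  # e.g. 1.0 if a model's answer was judged correct

by_severity = mean_score_by(records, scores, "severity")
```

Grouping scores by a field such as `severity` or `complexity` is one way to see whether a model fails mainly on particular kinds of anomalies rather than uniformly.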

Business Value

Enables the development of more robust and intelligent AI systems capable of understanding and reacting to unexpected situations in real-world environments, crucial for safety and reliability.

Paper Metadata

Innovation Type

Dataset / Benchmark

Deployment Feasibility

The benchmark itself is a resource; deploying systems evaluated on it depends on the specific VLM and application.

Limitations Addressed

Limited scope of existing anomaly detection benchmarks (industrial defects, synthetic anomalies)
Lack of evaluation for commonsense reasoning in visual anomaly detection
Difficulty in capturing richness and unpredictability of real-world anomalies

Technical Tags

Visual anomaly detection, Commonsense reasoning, Vision-Language Models (VLMs), Benchmark dataset, Real-world anomalies, Anomaly explanation, Fine-grained annotation, Cognitive science

Research Topics

Computer Vision, Commonsense Reasoning, Anomaly Detection, Vision-Language Models, Human-AI Interaction

Methods & Architectures

Benchmark dataset creation
Fine-grained annotation
Evaluation of VLMs
Anomaly description, explanation, and justification tasks
Vision-Language Models (VLMs)

Applications & Tasks

Robotics
Autonomous Systems
Quality Control
Surveillance
Human-AI Collaboration
Detecting and explaining real-world visual anomalies
Evaluating commonsense reasoning in VLMs
Bridging the gap between synthetic and real-world anomalies
Describing visual anomalies
Explaining why something is anomalous
Justifying anomaly classifications

Datasets & Benchmarks

Datasets

CAVE benchmark

Related Fields

Artificial Intelligence, Computer Vision, Cognitive Science, Human-Computer Interaction, Robotics

Keywords

anomaly detection, visual anomalies, commonsense reasoning, vision-language models, benchmark, real-world, explanation, VLM, computer vision, cognitive science

Academic Context

#Computer Vision, #Commonsense Reasoning, #Anomaly Detection, #Vision-Language Models, #Human-AI Interaction

Commercial Potential

Potential Products

AI systems for autonomous driving that can handle unexpected road events.
Robots that can identify and report unusual situations in homes or workplaces.
Advanced quality control systems that detect subtle defects.

Target Industries

Automotive, Robotics, Manufacturing, Security, Healthcare

Use Case Examples

A robot in a warehouse identifying an unusual object blocking a pathway.
An autonomous vehicle detecting a pedestrian behaving erratically.
Explaining why a particular scene is considered 'odd' or 'out of place'.

Competitive Edge

Establishes a new standard for evaluating commonsense reasoning in visual anomaly detection, pushing the field beyond simpler anomaly detection tasks.

Market Opportunity

Growing demand for AI that can handle real-world unpredictability.

Revenue Models

N/A (benchmark paper)

Resource Requirements

Compute Needs

Varies depending on the VLM used for evaluation.

Data Requirements

Requires the CAVE benchmark dataset.

Deployment Constraints

Real-world deployment requires robust VLMs capable of nuanced reasoning.

Scalability

Scalability depends on the underlying VLM architecture.

Production Readiness

Maturity Level

Research / Benchmark

Time to Market

3-5 years for robust systems
