CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments

Abstract

Humans can naturally identify, reason about, and explain anomalies in their environment. In computer vision, however, work on this long-standing challenge remains limited to industrial defects or unrealistic, synthetically generated anomalies, which fail to capture the richness and unpredictability of real-world anomalies. In this work, we introduce CAVE, the first benchmark of real-world visual anomalies. CAVE supports three open-ended tasks (anomaly description, explanation, and justification) and provides fine-grained annotations for visual grounding and for categorizing anomalies by their visual manifestation, complexity, severity, and commonness. These annotations draw inspiration from cognitive science research on how humans identify and resolve anomalies, providing a comprehensive framework for evaluating how well Vision-Language Models (VLMs) detect and understand anomalies. We show that state-of-the-art VLMs struggle with visual anomaly perception and commonsense reasoning, even with advanced prompting strategies. By offering a realistic and cognitively grounded benchmark, CAVE serves as a valuable resource for advancing research on anomaly detection and commonsense reasoning in VLMs.
Authors (6)
Rishika Bhagwatkar
Syrielle Montariol
Angelika Romanou
Beatriz Borges
Irina Rish
Antoine Bosselut
Submitted: October 29, 2025
arXiv Category: cs.CV
Venue: 2025 Conference on Empirical Methods in Natural Language Processing

Key Contributions

CAVE introduces the first benchmark of real-world visual anomalies, supporting three open-ended tasks: anomaly description, explanation, and justification. Its fine-grained annotations, inspired by cognitive science, enable a comprehensive evaluation of how well Vision-Language Models (VLMs) detect and understand anomalies. Evaluations on the benchmark show that current VLMs struggle with visual anomaly perception and commonsense reasoning, even with advanced prompting strategies.
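To make the structure of the benchmark concrete, the sketch below models one annotated image as a record holding free-text references for the three open-ended tasks (description, explanation, justification), a list of grounding regions, and the four categorization axes named in the abstract (visual manifestation, complexity, severity, commonness). Everything else is an assumption for illustration: the class name `AnomalyRecord`, the helper `count_by_axis`, the `(x_min, y_min, x_max, y_max)` box format, and the placeholder label values are hypothetical and do not reflect CAVE's released data format.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical grounding region: (x_min, y_min, x_max, y_max).
# This box format is an assumption; CAVE's actual grounding format may differ.
Box = Tuple[float, float, float, float]

# The four categorization axes listed in the abstract.
AXES = ("visual_manifestation", "complexity", "severity", "commonness")


@dataclass
class AnomalyRecord:
    """One annotated image. Field names are illustrative, not CAVE's schema."""
    image_id: str
    # Free-text references for the three open-ended tasks named in the abstract.
    description: str
    explanation: str
    justification: str
    # Regions that ground the anomaly in the image (hypothetical format).
    grounding: List[Box] = field(default_factory=list)
    # Label vocabularies for the axes are unknown here, so each is an
    # optional free-form string.
    visual_manifestation: Optional[str] = None
    complexity: Optional[str] = None
    severity: Optional[str] = None
    commonness: Optional[str] = None


def count_by_axis(records: List[AnomalyRecord], axis: str) -> Counter:
    """Tally records by one categorization axis, skipping unlabeled records."""
    if axis not in AXES:
        raise ValueError(f"unknown axis {axis!r}; expected one of {AXES}")
    return Counter(
        getattr(r, axis) for r in records if getattr(r, axis) is not None
    )


if __name__ == "__main__":
    # Placeholder records; texts and labels are invented for the example.
    records = [
        AnomalyRecord("img_001", "desc a", "expl a", "just a",
                      grounding=[(10.0, 20.0, 50.0, 80.0)], severity="high"),
        AnomalyRecord("img_002", "desc b", "expl b", "just b", severity="low"),
        AnomalyRecord("img_003", "desc c", "expl c", "just c"),
    ]
    print(count_by_axis(records, "severity"))
```

Keeping the axes optional lets partially labeled records load, and tallying per axis mirrors how results could be broken down by category; a real loader should follow the format of the released benchmark.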

Business Value

Could support the development of more robust AI systems that recognize and respond to unexpected situations in real-world environments, a capability crucial for safety and reliability.