Abstract
Remote Sensing Visual Question Answering (RSVQA) presents unique challenges
in ensuring that model decisions are both understandable and grounded in visual
content. Current models often suffer from a lack of interpretability and
explainability, as well as from biases in dataset distributions that lead to
shortcut learning. In this work, we tackle these issues by introducing a novel
RSVQA dataset, Chessboard, designed to minimize biases through 3,123,253
questions and a balanced answer distribution. Each answer is linked to one or
more cells within the image, enabling fine-grained visual reasoning.
Building on this dataset, we develop an explainable and interpretable model
called Checkmate that identifies the image cells most relevant to its
decisions. Through extensive experiments across multiple model architectures,
we show that our approach improves transparency and supports more trustworthy
decision-making in RSVQA systems.
Key Contributions
Introduces the Chessboard dataset for RSVQA, designed to minimize bias with a balanced answer distribution and answers linked to image cells for fine-grained reasoning.
Develops the Checkmate model, which enhances interpretability and explainability by identifying the image cells most relevant to its decisions, leading to more trustworthy RSVQA systems.
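The core interpretability idea described above, grounding an answer in specific grid cells of the image, can be illustrated with a minimal sketch. The abstract does not specify how Checkmate scores cells, so the relevance map and the `top_relevant_cells` helper below are purely hypothetical: assume the model produces a per-cell relevance score (e.g. an attention weight) over a chessboard-style grid, and we simply pick the highest-scoring cells as the explanation.

```python
import numpy as np

def top_relevant_cells(relevance, k=3):
    """Return (row, col) indices of the k highest-scoring grid cells,
    most relevant first. `relevance` is a 2D array of per-cell scores."""
    flat = np.argsort(relevance, axis=None)[::-1][:k]
    return [tuple(np.unravel_index(i, relevance.shape)) for i in flat]

# Toy 3x3 relevance map over image cells (hypothetical model output).
scores = np.array([
    [0.05, 0.10, 0.05],
    [0.10, 0.90, 0.40],
    [0.05, 0.20, 0.10],
])
cells = top_relevant_cells(scores, k=2)
print(cells)  # → [(1, 1), (1, 2)]
```

In a real RSVQA pipeline the grid would be finer and the scores would come from the model itself; the point is only that tying each answer to a small set of cells gives a directly inspectable explanation of the decision.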
Business Value
Enables more reliable and understandable analysis of remote sensing imagery, crucial for applications like environmental monitoring, urban planning, and disaster response where understanding model reasoning is critical.