Abstract
Today's scene graph generation (SGG) task is still far from practical, mainly
due to the severe training bias, e.g., collapsing diverse "human walk on / sit
on / lay on beach" into "human on beach". Given such SGG, the down-stream tasks
such as VQA can hardly infer better scene structures than merely a bag of
objects. However, debiasing in SGG is not trivial because traditional debiasing
methods cannot distinguish between the good and bad bias, e.g., good context
prior (e.g., "person read book" rather than "eat") and bad long-tailed bias
(e.g., "near" dominating "behind / in front of"). In this paper, we present a
novel SGG framework based on causal inference rather than the conventional
likelihood. We first build a causal graph for SGG, and perform traditional
biased training with the graph. Then, we propose to draw the counterfactual
causality from the trained graph to infer the effect of the bad bias, which
should be removed. In particular, we use Total Direct Effect (TDE) as the
proposed final predicate score for unbiased SGG. Note that our framework is
agnostic to any SGG model and thus can be widely applied by anyone in the community who
seeks unbiased predictions. By using the proposed Scene Graph Diagnosis toolkit
on the SGG benchmark Visual Genome and several prevailing models, we observed
significant improvements over the previous state-of-the-art methods.
Authors (5)
Kaihua Tang
Yulei Niu
Jianqiang Huang
Jiaxin Shi
Hanwang Zhang
Submitted
February 27, 2020
Key Contributions
This paper introduces a framework for unbiased scene graph generation (SGG) based on causal inference. It targets the training bias in SGG, which limits downstream tasks such as VQA, by distinguishing beneficial context priors from detrimental long-tailed biases. The method builds a causal graph for SGG, trains the model on it in the usual biased way, and then uses counterfactual reasoning over the trained graph to estimate the effect of the bad bias and remove it, taking the Total Direct Effect (TDE) as the final predicate score. The framework is agnostic to the underlying SGG model.
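The TDE idea can be illustrated with a toy sketch in Python. It is not the authors' implementation: the additive `predicate_logits` function, the three predicate names, and every number below are hypothetical, chosen only to show how subtracting a counterfactual prediction (same object context, visual evidence blanked out) from the factual prediction cancels a score that the context alone would produce.

```python
from typing import List

# Hypothetical predicate vocabulary for a single (person, beach) pair.
PREDICATES = ["on", "sitting on", "walking on"]


def predicate_logits(visual: List[float], context_bias: List[float]) -> List[float]:
    """Toy stand-in for a trained SGG predicate classifier (an assumption).

    Logits are modeled as visual evidence plus a context-driven bias term.
    """
    return [v + b for v, b in zip(visual, context_bias)]


def total_direct_effect(
    visual: List[float], blank_visual: List[float], context_bias: List[float]
) -> List[float]:
    """Factual logits minus counterfactual logits.

    The counterfactual pass keeps the object context but replaces the
    visual evidence with a blank input.
    """
    factual = predicate_logits(visual, context_bias)
    counterfactual = predicate_logits(blank_visual, context_bias)
    return [f - c for f, c in zip(factual, counterfactual)]


def argmax_label(scores: List[float]) -> str:
    """Return the predicate name with the highest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return PREDICATES[best]


# Hypothetical numbers: the learned bias strongly favors the head class "on",
# while the visual evidence for this image favors "sitting on".
context_bias = [3.0, 0.5, 0.2]
visual = [0.5, 2.0, 0.1]
blank_visual = [0.0, 0.0, 0.0]

biased_scores = predicate_logits(visual, context_bias)
tde_scores = total_direct_effect(visual, blank_visual, context_bias)

print("biased prediction:", argmax_label(biased_scores))
print("TDE prediction:   ", argmax_label(tde_scores))
```

In this additive toy the subtraction simply recovers the visual term. In a real SGG network the visual and context pathways interact nonlinearly, so the two forward passes would differ in less trivial ways.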
Business Value
Improved accuracy in image understanding systems can lead to better performance in applications like autonomous driving, content moderation, and visual search, by enabling more reliable interpretation of visual scenes.