Abstract: Scene Graph Generation (SGG) is a task that encodes visual relationships
between objects in images as graph structures. SGG shows significant promise as
a foundational component for downstream tasks, such as reasoning for embodied
agents. To enable real-time applications, SGG must address the trade-off
between performance and inference speed. However, current methods tend to focus
on one of the following: (1) improving relation prediction accuracy, (2)
enhancing object detection accuracy, or (3) reducing latency, without aiming to
balance all three objectives simultaneously. To address this limitation, we
propose the Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene
Graph Generation (REACT) architecture, which achieves the highest inference
speed among existing SGG models while improving object detection accuracy without
sacrificing relation prediction performance. Compared to state-of-the-art
approaches, REACT is 2.7 times faster and improves object detection accuracy by
58%. Furthermore, our proposal significantly reduces model size, with an
average of 5.5x fewer parameters. The code is available at
https://github.com/Maelic/SGG-Benchmark
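
To make concrete what encoding visual relationships "as graph structures" can look like, here is a minimal sketch of a scene graph as a set of labeled objects plus directed (subject, predicate, object) relations. The class and method names (`SceneGraph`, `Relation`, `add_object`, `add_relation`, `triplets`) and the example labels are hypothetical, chosen for illustration only; they are not taken from REACT or from the SGG-Benchmark codebase.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    """A directed edge between two detected objects (hypothetical layout)."""
    subject: int    # index into SceneGraph.objects
    predicate: str  # relation label, e.g. "riding"
    obj: int        # index into SceneGraph.objects


@dataclass
class SceneGraph:
    """Illustrative container: nodes are object labels, edges are relations."""
    objects: List[str] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def add_object(self, label: str) -> int:
        """Add a node and return its index."""
        self.objects.append(label)
        return len(self.objects) - 1

    def add_relation(self, subject: int, predicate: str, obj: int) -> None:
        """Add a directed edge; both endpoints must already exist."""
        for idx in (subject, obj):
            if not 0 <= idx < len(self.objects):
                raise IndexError(f"unknown object index: {idx}")
        self.relations.append(Relation(subject, predicate, obj))

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return relations as readable (subject, predicate, object) labels."""
        return [
            (self.objects[r.subject], r.predicate, self.objects[r.obj])
            for r in self.relations
        ]


# Made-up example scene: a person riding a horse that stands on grass.
graph = SceneGraph()
person = graph.add_object("person")
horse = graph.add_object("horse")
grass = graph.add_object("grass")
graph.add_relation(person, "riding", horse)
graph.add_relation(horse, "on", grass)

for triplet in graph.triplets():
    print(triplet)
```

In an SGG pipeline, the nodes would typically come from an object detector and the edges from a relation predictor, which is why the abstract treats detection accuracy, relation accuracy, and latency as separate objectives.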