Abstract

Egocentric human-object interaction (Ego-HOI) detection is crucial for
intelligent agents to understand and assist human activities from a
first-person perspective. However, progress has been hindered by the lack of
benchmarks and methods tailored to egocentric challenges such as severe
hand-object occlusion. In this paper, we introduce the real-world Ego-HOI
detection task and the accompanying Ego-HOIBench, a new dataset with over 27K
egocentric images and explicit, fine-grained hand-verb-object triplet
annotations across 123 categories. Ego-HOIBench covers diverse daily scenarios,
object types, and both single- and two-hand interactions, offering a
comprehensive testbed for Ego-HOI research. Benchmarking existing third-person
HOI detectors on Ego-HOIBench reveals significant performance gaps,
highlighting the need for egocentric-specific solutions. To this end, we
propose Hand Geometry and Interactivity Refinement (HGIR), a lightweight,
plug-and-play scheme that leverages hand pose and geometric cues to enhance
interaction representations. Specifically, HGIR explicitly extracts global hand
geometric features from the estimated hand pose proposals, and further refines
interaction features through pose-interaction attention, enabling the model to
focus on subtle hand-object relationship differences even under severe
occlusion. HGIR significantly improves Ego-HOI detection performance across
multiple baselines, achieving new state-of-the-art results on Ego-HOIBench. Our
dataset and method establish a solid foundation for future research in
egocentric vision and human-object interaction understanding. Project page:
https://dengkunyuan.github.io/EgoHOIBench/
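
To make the pose-interaction attention idea concrete, below is a minimal, hypothetical PyTorch sketch. It assumes interaction queries from a DETR-style HOI detector cross-attend to global geometric features encoded from estimated hand pose proposals; all module names, feature dimensions, and the keypoint format are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of HGIR-style pose-interaction attention.
# Module names, shapes, and the 21-keypoint hand format are assumptions
# for illustration; the paper's actual design may differ.
import torch
import torch.nn as nn


class PoseInteractionAttention(nn.Module):
    """Refine interaction queries with global hand-geometry cues via
    cross-attention (interaction features attend to pose features)."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Encode estimated hand pose proposals (assumed: 21 keypoints x 2
        # coordinates per hand) into one global geometric feature per hand.
        self.geometry_encoder = nn.Sequential(
            nn.Linear(21 * 2, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, interaction_feats: torch.Tensor,
                hand_poses: torch.Tensor) -> torch.Tensor:
        # interaction_feats: (B, Q, dim) interaction queries from the detector
        # hand_poses:        (B, H, 42)  flattened keypoints for H hands
        geom = self.geometry_encoder(hand_poses)  # (B, H, dim)
        refined, _ = self.cross_attn(query=interaction_feats,
                                     key=geom, value=geom)
        # Residual connection keeps the original interaction semantics.
        return self.norm(interaction_feats + refined)


# Usage sketch: plug into an existing HOI detector's decoder output.
attn = PoseInteractionAttention(dim=256)
feats = torch.randn(2, 100, 256)  # 100 interaction queries per image
poses = torch.randn(2, 2, 42)     # up to two hands per image
out = attn(feats, poses)          # same shape as feats, pose-refined
```

The residual connection and layer norm are common design choices that would let the pose cues refine, rather than overwrite, the original interaction representations, consistent with the plug-and-play role the abstract describes.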