📄 Abstract
Understanding how humans interact with the surrounding environment, and
specifically reasoning about object interactions and affordances, is a critical
challenge in computer vision, robotics, and AI. Current approaches often depend
on labor-intensive, hand-labeled datasets capturing real-world or simulated
human-object interaction (HOI) tasks, which are costly and time-consuming to
produce. Furthermore, most existing methods for 3D affordance understanding are
limited to contact-based analysis, neglecting other essential aspects of
human-object interactions, such as orientation (e.g., humans might have a
preferential orientation with respect to certain objects, such as a TV) and
spatial occupancy (e.g., humans are more likely to occupy certain regions around
an object, like the front of a microwave rather than its back). To address
these limitations, we introduce H2OFlow, a novel framework that
comprehensively learns 3D HOI affordances (contact, orientation,
and spatial occupancy) using only synthetic data generated from 3D generative
models. H2OFlow employs a dense 3D-flow-based representation, learned through a
dense diffusion process operating on point clouds. This learned flow enables
the discovery of rich 3D affordances without the need for human annotations.
Through extensive quantitative and qualitative evaluations, we demonstrate that
H2OFlow generalizes effectively to real-world objects and surpasses prior
methods that rely on manual annotations or mesh-based representations in
modeling 3D affordance.
Submitted
October 17, 2025
Key Contributions
H2OFlow is a novel framework that comprehensively learns 3D Human-Object Interaction (HOI) affordances, including contact, orientation, and spatial occupancy. It trains only on synthetic data generated from 3D generative models and represents affordances as a dense 3D flow learned through a diffusion process on point clouds. This removes the dependence on costly, hand-labeled datasets and extends affordance modeling beyond the contact-only analysis of most existing methods.
Business Value
Enables robots and AI systems to better understand and predict human actions around objects, leading to safer human-robot collaboration, more intuitive interfaces, and realistic virtual environments.