H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows

Abstract

Understanding how humans interact with the surrounding environment, and specifically reasoning about object interactions and affordances, is a critical challenge in computer vision, robotics, and AI. Current approaches often depend on labor-intensive, hand-labeled datasets capturing real-world or simulated human-object interaction (HOI) tasks, which are costly and time-consuming to produce. Furthermore, most existing methods for 3D affordance understanding are limited to contact-based analysis, neglecting other essential aspects of human-object interactions, such as orientation (e.g., humans might have a preferential orientation with respect to certain objects, such as a TV) and spatial occupancy (e.g., humans are more likely to occupy certain regions around an object, like the front of a microwave rather than its back). To address these limitations, we introduce H2OFlow, a novel framework that comprehensively learns 3D HOI affordances (contact, orientation, and spatial occupancy) using only synthetic data generated from 3D generative models. H2OFlow employs a dense 3D-flow-based representation, learned through a dense diffusion process operating on point clouds. This learned flow enables the discovery of rich 3D affordances without the need for human annotations. Through extensive quantitative and qualitative evaluations, we demonstrate that H2OFlow generalizes effectively to real-world objects and surpasses prior methods that rely on manual annotations or mesh-based representations in modeling 3D affordance.
Authors: Harry Zhang, Luca Carlone
Submitted: October 17, 2025
arXiv category: cs.CV

Key Contributions

H2OFlow is a framework that learns 3D human-object interaction (HOI) affordances covering contact, orientation, and spatial occupancy. It trains only on synthetic data produced by 3D generative models and represents interactions as dense 3D flows learned by a diffusion process on point clouds. This addresses two limitations of prior work: most existing methods model contact only, and they depend on costly, hand-labeled datasets.
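
To make the phrase "dense diffused flows" more concrete, the sketch below applies the standard DDPM forward (noising) step to a dense per-point flow field. It is an illustration only and not the authors' implementation: the summary does not specify how H2OFlow defines its flow or its noise schedule, so the flow definition (a 3D displacement per object point), the linear beta schedule, and the names `forward_diffuse_flow`, `object_points`, and `target_points` are all assumptions made for this example.

```python
import numpy as np

def forward_diffuse_flow(flow, t, betas, rng):
    """Noise a dense flow field to timestep t with the standard DDPM forward process.

    flow  : (N, 3) array, one displacement vector per point (assumed representation)
    t     : integer timestep index into the schedule
    betas : (T,) noise schedule (a linear schedule is assumed below)
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]          # cumulative signal retention
    noise = rng.standard_normal(flow.shape)          # Gaussian noise, one vector per point
    noisy = np.sqrt(alpha_bar) * flow + np.sqrt(1.0 - alpha_bar) * noise
    return noisy, noise

rng = np.random.default_rng(0)

# Toy data (hypothetical): an object point cloud and a shifted copy standing in
# for the locations each point should "flow" toward.
object_points = rng.uniform(-1.0, 1.0, size=(128, 3))
target_points = object_points + np.array([0.0, 0.0, 0.5])
flow = target_points - object_points                 # dense flow: one vector per point

betas = np.linspace(1e-4, 0.02, 1000)                # assumed linear schedule
noisy_flow, noise = forward_diffuse_flow(flow, 500, betas, rng)
```

In a diffusion model of this kind, a network would typically be trained to predict `noise` from `noisy_flow`, the timestep, and the point cloud, and sampling would run the process in reverse to generate a flow for a new object. Whether H2OFlow follows exactly this recipe is not stated in this summary.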

Business Value

Enables robots and AI systems to better understand and predict human actions around objects, leading to safer human-robot collaboration, more intuitive interfaces, and realistic virtual environments.