📄 Abstract
Open-world egocentric activity recognition poses a fundamental challenge due to its unconstrained nature, requiring models to infer unseen activities from an expansive, partially observed search space. We introduce ProbRes, a Probabilistic Residual search framework based on jump-diffusion that efficiently navigates this space by balancing prior-guided exploration with likelihood-driven exploitation. Our approach integrates structured commonsense priors to construct a semantically coherent search space, adaptively refines predictions using Vision-Language Models (VLMs), and employs a stochastic search mechanism to locate high-likelihood activity labels while avoiding exhaustive enumeration. We systematically evaluate ProbRes across multiple openness levels (L0-L3), demonstrating its adaptability to increasing search-space complexity. In addition to achieving state-of-the-art performance on benchmark datasets (GTEA Gaze, GTEA Gaze+, EPIC-Kitchens, and Charades-Ego), we establish a clear taxonomy for open-world recognition, delineating the challenges and methodological advances necessary for egocentric activity understanding. Our results highlight the importance of structured search strategies, paving the way for scalable and efficient open-world activity recognition.
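To make the exploration/exploitation idea concrete, the sketch below illustrates a jump-diffusion-style label search in Python. It is a simplified illustration under our own assumptions, not the authors' implementation: the `neighbors` helper, the `prior` weights, the `vlm_score` callable, and the greedy acceptance rule are all hypothetical stand-ins for the structured commonsense priors, VLM scoring, and stochastic acceptance described in the abstract.

```python
import random


def neighbors(label, labels, k=5):
    """Hypothetical semantic neighborhood: labels sharing a word with `label`
    (e.g. the verb or noun of a 'verb noun' activity phrase)."""
    words = set(label.split())
    near = [l for l in labels if l != label and words & set(l.split())]
    return near[:k] if near else [l for l in labels if l != label][:k]


def probres_sketch(labels, prior, vlm_score, n_steps=200, jump_prob=0.3, seed=0):
    """Toy jump-diffusion-style search over an activity-label space.

    labels    : list of candidate activity labels (e.g. 'pour water')
    prior     : dict mapping label -> commonsense prior weight
    vlm_score : callable(label) -> likelihood of the label for the video clip,
                standing in for the VLM-based scoring described in the paper
    """
    rng = random.Random(seed)
    # Start from the label with the highest commonsense prior.
    current = max(labels, key=lambda l: prior.get(l, 0.0))
    best, best_score = current, vlm_score(current)

    for _ in range(n_steps):
        if rng.random() < jump_prob:
            # "Jump": prior-guided exploration of a new region of the label space.
            weights = [prior.get(l, 1e-6) for l in labels]
            candidate = rng.choices(labels, weights=weights, k=1)[0]
        else:
            # "Diffusion": local refinement among semantically nearby labels.
            candidate = rng.choice(neighbors(current, labels))

        score = vlm_score(candidate)
        # Greedy acceptance for simplicity; the actual framework uses a
        # stochastic acceptance rule.
        if score >= best_score:
            current, best, best_score = candidate, candidate, score

    return best, best_score


if __name__ == "__main__":
    # Fabricated toy example purely for illustration.
    labels = ["pour water", "pour milk", "cut bread", "open fridge"]
    prior = {"pour water": 0.4, "pour milk": 0.3, "cut bread": 0.2, "open fridge": 0.1}
    vlm_score = lambda l: 0.9 if l == "cut bread" else 0.1
    print(probres_sketch(labels, prior, vlm_score))  # typically ('cut bread', 0.9)
```

The fixed `jump_prob` constant here is an arbitrary stand-in for however the full framework schedules prior-guided jumps versus local diffusion steps.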
Key Contributions
ProbRes introduces a probabilistic jump-diffusion search framework for open-world egocentric activity recognition. By combining commonsense priors for exploration with VLM-based refinement for exploitation, it infers unseen activities efficiently from large, partially observed search spaces and achieves state-of-the-art performance on egocentric benchmarks.
Business Value
Enables more intelligent and adaptable AI systems for applications like assistive robotics, personalized user interfaces, and advanced surveillance by understanding human activities in real-world, unconstrained environments.