Abstract
Monocular 3D human pose estimation remains a fundamentally ill-posed inverse
problem due to the inherent depth ambiguity in 2D-to-3D lifting. While
contemporary video-based methods leverage temporal context to enhance spatial
reasoning, they operate under a critical paradigm limitation: processing each
sequence in isolation, thereby failing to exploit the strong structural
regularities and repetitive motion patterns that pervade human movement across
sequences. This work introduces the Pattern Reuse Graph Convolutional Network
(PRGCN), a novel framework that formalizes pose estimation as a problem of
pattern retrieval and adaptation. At its core, PRGCN features a graph memory
bank that learns and stores a compact set of pose prototypes, encoded as
relational graphs, which are dynamically retrieved via an attention mechanism
to provide structured priors. These priors are adaptively fused with hard-coded
anatomical constraints through a memory-driven graph convolution, ensuring
geometric plausibility. To underpin this retrieval process with robust
spatiotemporal features, we design a dual-stream hybrid architecture that
synergistically combines the linear-complexity, local temporal modeling of
Mamba-based state-space models with the global relational capacity of
self-attention. Extensive evaluations on Human3.6M and MPI-INF-3DHP benchmarks
demonstrate that PRGCN establishes a new state of the art, achieving MPJPEs of
37.1 mm and 13.4 mm, respectively, while exhibiting enhanced cross-domain
generalization capability. Our work posits that the long-overlooked mechanism
of cross-sequence pattern reuse is pivotal to advancing the field, shifting the
paradigm from per-sequence optimization towards cumulative knowledge learning.
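
To make the core mechanism concrete, the following PyTorch sketch illustrates attention-based retrieval from a graph memory bank and a memory-driven graph convolution that fuses the retrieved prior with a fixed skeletal adjacency. This is a minimal sketch under stated assumptions: the class names (GraphMemoryBank, MemoryDrivenGraphConv), the prototype count, the feature dimensions, and the scalar sigmoid fusion gate are illustrative choices, not the paper's exact formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphMemoryBank(nn.Module):
        """Learnable bank of pose prototypes stored as relational graphs
        and retrieved via attention (illustrative sketch)."""

        def __init__(self, num_prototypes=64, num_joints=17, feat_dim=128):
            super().__init__()
            # Each prototype is a learnable adjacency over the skeleton graph.
            self.proto_adj = nn.Parameter(
                torch.randn(num_prototypes, num_joints, num_joints) * 0.01)
            # Keys for attention-based retrieval of prototypes.
            self.keys = nn.Parameter(torch.randn(num_prototypes, feat_dim))
            self.query_proj = nn.Linear(feat_dim, feat_dim)

        def forward(self, pose_feat):
            # pose_feat: (B, feat_dim) pooled spatiotemporal descriptor.
            q = self.query_proj(pose_feat)                                 # (B, D)
            attn = F.softmax(q @ self.keys.t() / q.shape[-1] ** 0.5, -1)   # (B, P)
            # Retrieved prior: attention-weighted blend of prototype graphs.
            return torch.einsum('bp,pjk->bjk', attn, self.proto_adj)       # (B, J, J)

    class MemoryDrivenGraphConv(nn.Module):
        """Graph convolution whose adjacency fuses the retrieved prior with
        a hard-coded anatomical (skeleton) adjacency via a learned gate."""

        def __init__(self, in_dim, out_dim, skeleton_adj):
            super().__init__()
            self.register_buffer('skeleton_adj', skeleton_adj)  # (J, J), fixed bones
            self.gate = nn.Parameter(torch.tensor(0.5))         # fusion weight (assumption)
            self.weight = nn.Linear(in_dim, out_dim)

        def forward(self, x, prior_adj):
            # x: (B, J, in_dim); prior_adj: (B, J, J) from the memory bank.
            g = torch.sigmoid(self.gate)
            adj = g * self.skeleton_adj + (1 - g) * prior_adj   # adaptive fusion
            adj = F.softmax(adj, dim=-1)                        # row-normalize
            return self.weight(adj @ x)                         # message passing

In use, a pooled sequence descriptor queries the bank, and the returned adjacency conditions each graph-convolution layer; the gate lets the network trade off learned cross-sequence priors against hard anatomical structure.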
Authors
Zhuoyang Xie
Yibo Zhao
Hui Huang
Riwei Wang
Zan Gao
Submitted
October 22, 2025
Key Contributions
Introduces the Pattern Reuse Graph Convolutional Network (PRGCN), a novel framework that addresses the limitation of processing sequences in isolation in 3D pose estimation. A graph memory bank stores pose prototypes and retrieves them via attention; the retrieved priors are adaptively fused with anatomical constraints to improve accuracy and exploit cross-sequence motion regularities. A dual-stream hybrid backbone pairs Mamba-based state-space modeling with self-attention to supply the spatiotemporal features that drive retrieval, as sketched below.
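The dual-stream backbone can be sketched in the same spirit. Because a faithful Mamba block depends on the selective-scan kernels of the mamba-ssm package, the local stream below substitutes a depthwise temporal convolution as a linear-complexity stand-in; the parallel global stream is standard multi-head self-attention. The block name, the residual-sum fusion, and all dimensions are assumptions for illustration, not the paper's architecture.

    import torch
    import torch.nn as nn

    class DualStreamBlock(nn.Module):
        """Dual-stream temporal block: a linear-complexity local stream
        (depthwise temporal convolution standing in for the Mamba SSM used
        in the paper) in parallel with global self-attention."""

        def __init__(self, dim=128, heads=4, kernel_size=5):
            super().__init__()
            # Local stream: per-channel temporal filtering, O(T) in length.
            self.local = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
            # Global stream: all-pairs relational modeling across frames.
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)

        def forward(self, x):
            # x: (B, T, dim) per-frame pose features.
            local = self.local(x.transpose(1, 2)).transpose(1, 2)  # (B, T, dim)
            h = self.norm1(x + local)                              # residual local stream
            glob, _ = self.attn(h, h, h)                           # (B, T, dim)
            return self.norm2(h + glob)                            # residual global stream

    # Example: an 81-frame clip of 128-dim per-frame features.
    x = torch.randn(2, 81, 128)
    y = DualStreamBlock()(x)   # (2, 81, 128)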
Business Value
Enables more accurate and robust 3D human pose estimation from monocular video, which is critical for applications like motion capture for animation, virtual reality, human-robot interaction, and sports analytics. This can lead to more realistic digital avatars and better understanding of human movement.