
SHARE: Scene-Human Aligned Reconstruction

📄 Abstract

Animating realistic character interactions with the surrounding environment is important for autonomous agents in gaming, AR/VR, and robotics. However, current methods for human motion reconstruction struggle with accurately placing humans in 3D space. We introduce Scene-Human Aligned REconstruction (SHARE), a technique that leverages the scene geometry's inherent spatial cues to accurately ground human motion reconstruction. Each reconstruction relies solely on a monocular RGB video from a stationary camera. SHARE first estimates a human mesh and segmentation mask for every frame, alongside a scene point map at keyframes. It iteratively refines the human's positions at these keyframes by comparing the human mesh against the human point map extracted from the scene using the mask. Crucially, we also ensure that non-keyframe human meshes remain consistent by preserving their relative root joint positions to keyframe root joints during optimization. Our approach enables more accurate 3D human placement while reconstructing the surrounding scene, facilitating use cases on both curated datasets and in-the-wild web videos. Extensive experiments demonstrate that SHARE outperforms existing methods.
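The keyframe refinement step described above can be illustrated with a short sketch. The snippet below is not the authors' implementation; it is a minimal illustration of the idea, assuming PyTorch tensors as inputs and using a simple one-sided Chamfer distance to pull a per-keyframe root translation toward the human points that the segmentation mask selects from the scene point map. All function and variable names here are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of keyframe position refinement:
# optimize a root translation so the estimated human mesh aligns with the
# "human" points extracted from the scene point map via the mask.
import torch

def refine_keyframe_translation(mesh_verts, point_map, mask, iters=200, lr=1e-2):
    """mesh_verts: (V, 3) estimated human mesh vertices in camera space.
    point_map:  (H, W, 3) per-pixel 3D scene points for the keyframe.
    mask:       (H, W) boolean human segmentation mask.
    Returns a refined (3,) root translation to apply to the mesh."""
    human_pts = point_map[mask]              # (N, 3) scene points on the human
    t = torch.zeros(3, requires_grad=True)   # translation to optimize
    opt = torch.optim.Adam([t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        verts = mesh_verts + t
        # one-sided Chamfer distance: each masked scene point to its
        # nearest mesh vertex
        d = torch.cdist(human_pts, verts)    # (N, V) pairwise distances
        loss = d.min(dim=1).values.mean()
        loss.backward()
        opt.step()
    return t.detach()
```

The paper optimizes human positions iteratively against the scene point map; this sketch reduces that to a single translation per keyframe for clarity, whereas the full method also reconstructs the surrounding scene.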
Authors (5)
Joshua Li
Brendan Chharawala
Chang Shu
Xue Bin Peng
Pengcheng Xi
Submitted
October 17, 2025
arXiv Category
cs.CV
arXiv PDF

Key Contributions

SHARE (Scene-Human Aligned REconstruction) leverages scene geometry from monocular RGB video to accurately ground human motion reconstruction in 3D space. It iteratively refines human positions at keyframes by aligning estimated meshes with scene-derived point maps, and keeps non-keyframe meshes consistent by preserving their root-joint positions relative to keyframe roots.
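The consistency constraint can be pictured with the simplified sketch below. The paper enforces relative root-joint positions during optimization; this version instead shows a post-hoc variant that shifts each non-keyframe root so its offset from the nearest keyframe root is unchanged after refinement. Names and the nearest-keyframe anchoring are assumptions for illustration.

```python
# Simplified, post-hoc illustration of the root-joint consistency idea
# (the paper applies this constraint during optimization, not afterwards).
import numpy as np

def propagate_roots(roots, keyframe_ids, refined_keyframe_roots):
    """roots: (T, 3) initial per-frame root joint positions.
    keyframe_ids: sorted list of keyframe indices.
    refined_keyframe_roots: dict mapping keyframe index -> refined (3,) root.
    Returns (T, 3) roots with relative offsets to keyframes preserved."""
    new_roots = roots.copy()
    for t in range(len(roots)):
        # anchor each frame to its nearest keyframe (an assumption here)
        k = min(keyframe_ids, key=lambda i: abs(i - t))
        offset = roots[t] - roots[k]                 # original relative position
        new_roots[t] = refined_keyframe_roots[k] + offset
    return new_roots
```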

Business Value

Enables more realistic character interactions in virtual environments and improves robots' understanding of human actions, enhancing immersion and utility in gaming, AR/VR, and robotics.