Abstract: Human image animation aims to generate videos of a given character and
background that adhere to a desired pose sequence. However, existing methods
focus on human motion while neglecting background generation, which typically
leads to static or inharmoniously moving backgrounds. The community has
explored camera-pose-guided animation, yet preparing camera trajectories is
impractical for most entertainment applications and ordinary users. As a
remedy, we present the AnimateAnywhere framework, which brings the background
to life in human image animation without requiring camera trajectories. In
particular, based on our key insight that the movement of the human body often
reflects the motion of the background, we introduce a background motion learner
(BML) that learns background motion from human pose sequences. To encourage the
model to learn more accurate cross-frame correspondences, we further impose an
epipolar constraint on the 3D attention map. Specifically, the mask used to
suppress geometrically implausible attention is constructed by combining an
epipolar mask with the current 3D attention map. Extensive experiments
demonstrate that AnimateAnywhere effectively learns background motion from
human pose sequences, achieving state-of-the-art performance in generating
human animations with vivid and realistic backgrounds. The source code and
model will be available at https://github.com/liuxiaoyu1104/AnimateAnywhere.
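
The epipolar constraint mentioned above admits a compact formulation: for a
query pixel x in frame i, the fundamental matrix F maps it to an epipolar line
l = F x in frame j, and a key pixel x' is geometrically consistent only if its
point-to-line distance |l . x'| / sqrt(l1^2 + l2^2) is small. The sketch below
shows one plausible way to build such a mask and apply it to attention logits.
It is not the authors' implementation: the fundamental matrix F_mat, the pixel
threshold tau, and the simple hard-masking scheme are illustrative assumptions,
and the paper's full construction, which additionally combines the epipolar
mask with the current 3D attention map, is omitted here.

import torch

def epipolar_attention_mask(F_mat, h, w, tau=2.0):
    # Boolean mask M[q, k] that is True where key pixel k (frame j) lies
    # within `tau` pixels of the epipolar line of query pixel q (frame i).
    # F_mat: (3, 3) fundamental matrix mapping a frame-i point x to its
    # epipolar line l = F_mat @ x in frame j (assumed given, e.g. estimated
    # from a pretrained pose/geometry module).
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    # Homogeneous pixel coordinates, flattened to (h*w, 3).
    pts = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    lines = pts @ F_mat.T                            # (h*w, 3) epipolar lines
    norm = lines[:, :2].norm(dim=-1, keepdim=True).clamp_min(1e-8)
    dist = (lines @ pts.T).abs() / norm              # point-to-line distances
    return dist < tau                                # (h*w, h*w) boolean mask

def masked_attention(q, k, v, mask):
    # Scaled dot-product attention in which geometrically implausible
    # (query, key) pairs receive a -inf bias before the softmax.
    logits = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    logits = logits.masked_fill(~mask, float("-inf"))
    return torch.softmax(logits, dim=-1) @ v

In this sketch, the mask is computed once per frame pair and reused across
attention heads; a soft combination with the existing attention map, as the
abstract describes, would relax the hard -inf suppression so that confident
correspondences are not discarded outright.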