Abstract: While diffusion models have advanced text-to-motion generation, their
static semantic conditioning ignores the temporal-frequency demands of denoising:
early denoising steps require structural semantics to lay the motion foundation,
whereas later steps need localized details for text alignment. This mismatch
mirrors biological morphogenesis, in which distinct developmental phases demand
distinct genetic programs.
Inspired by the epigenetic regulation that governs morphological specialization,
we propose **ANT**, an **A**daptive **N**eural **T**emporal-Aware architecture.
ANT orchestrates semantic granularity across denoising stages through two
components. **(i) Semantic Temporally Adaptive (STA) Module:** automatically
partitions the denoising process into a low-frequency structural-planning stage
and a high-frequency refinement stage via spectral analysis. **(ii) Dynamic
Classifier-Free Guidance scheduling (DCFG):** adaptively adjusts the ratio of
conditional to unconditional guidance, improving efficiency while maintaining
fidelity. Extensive experiments show that ANT can be applied to various
baselines, significantly improving their performance, and achieves
state-of-the-art semantic alignment on StableMoFusion.
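Classifier-free guidance combines a conditional noise estimate `eps_c` and an unconditional one `eps_u` as `eps_u + w * (eps_c - eps_u)`. The abstract does not specify how DCFG schedules the guidance weight `w`, so the following is only a minimal sketch under a stated assumption: a hypothetical two-phase schedule (`step_schedule`, `switch_frac`, and `w_high` are illustrative names, not from the paper). It shows one plausible source of efficiency: at `w = 1` the combination reduces to `eps_c`, so the unconditional forward pass can be skipped for those steps.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Denoiser = Callable[[Vector, int], Vector]  # (noisy motion x_t, timestep t) -> noise estimate


def step_schedule(t: int, num_steps: int, switch_frac: float = 0.5,
                  w_high: float = 5.0) -> float:
    """Hypothetical two-phase guidance schedule (an assumption, not DCFG itself).

    Timesteps run from num_steps - 1 (noisiest) down to 0. The noisiest
    `switch_frac` of the steps get guidance weight w_high; the rest get w = 1.
    """
    return w_high if t >= num_steps * (1.0 - switch_frac) else 1.0


def cfg_eps(eps_cond: Denoiser, eps_uncond: Denoiser, x: Vector, t: int,
            w: float) -> Tuple[Vector, int]:
    """Standard classifier-free guidance: eps_u + w * (eps_c - eps_u).

    Returns the guided noise estimate and the number of network calls made.
    """
    e_c = eps_cond(x, t)
    if w == 1.0:
        # With w = 1 the combination reduces to e_c, so the unconditional
        # forward pass is skipped; this is where a schedule saves compute.
        return e_c, 1
    e_u = eps_uncond(x, t)
    return [u + w * (c - u) for c, u in zip(e_c, e_u)], 2


if __name__ == "__main__":
    # Toy stand-ins for the text-conditioned and unconditional networks.
    def cond(x: Vector, t: int) -> Vector:
        return [2.0 * v for v in x]

    def uncond(x: Vector, t: int) -> Vector:
        return list(x)

    x = [1.0, -0.5]
    num_steps = 10
    total_calls = 0
    # Only network calls are counted here; a real sampler would also
    # update x at every step using the guided estimate.
    for t in reversed(range(num_steps)):
        _, calls = cfg_eps(cond, uncond, x, t, step_schedule(t, num_steps))
        total_calls += calls
    print(total_calls)  # 15, versus 20 if every step used w != 1
```

The fixed switch point is the simplest possible choice; an adaptive scheme such as the one the abstract describes would instead pick `w` per step from some signal, but the call-skipping logic at `w = 1` stays the same.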
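The abstract does not say how STA's spectral analysis decides where low-frequency structural planning ends and high-frequency refinement begins. As a minimal sketch of one way such a test could work, the code below assumes a hypothetical rule: take the DFT of a one-joint trajectory from the current motion estimate and label the step "refinement" once the share of spectral energy above a cutoff bin exceeds a threshold `tau`. The function names, `cutoff_bin`, and `tau` are illustrative assumptions, not ANT's actual criterion.

```python
import cmath
import math
from typing import List, Sequence


def dft_energy(signal: Sequence[float]) -> List[float]:
    """Energy |X_k|^2 of the one-sided DFT bins k = 0..N//2 (naive O(N^2) DFT)."""
    n = len(signal)
    energies = []
    for k in range(n // 2 + 1):
        coeff = sum(signal[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        energies.append(abs(coeff) ** 2)
    return energies


def high_freq_ratio(signal: Sequence[float], cutoff_bin: int) -> float:
    """Share of non-DC spectral energy in bins at or above `cutoff_bin`."""
    energies = dft_energy(signal)
    total = sum(energies[1:])  # skip k = 0 so a constant offset is ignored
    if total == 0.0:
        return 0.0
    return sum(energies[cutoff_bin:]) / total


def assign_stage(signal: Sequence[float], cutoff_bin: int = 4, tau: float = 0.2) -> str:
    """Hypothetical rule: 'refinement' once high-frequency energy in the
    current motion estimate exceeds tau, otherwise 'structural'."""
    return "refinement" if high_freq_ratio(signal, cutoff_bin) > tau else "structural"


if __name__ == "__main__":
    n = 32  # frames in a toy one-joint trajectory
    smooth = [math.sin(2 * math.pi * m / n) for m in range(n)]  # one slow cycle
    # Add an equal-amplitude component at bin 10 to mimic fine motion detail.
    detailed = [s + math.sin(2 * math.pi * 10 * m / n) for m, s in enumerate(smooth)]
    print(assign_stage(smooth), assign_stage(detailed))  # structural refinement
```

A practical version would more likely use an FFT over all joints and aggregate the ratio; the naive DFT here only keeps the sketch dependency-free.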