Abstract
Music-driven dance generation is a challenging task: it requires strict
adherence to genre-specific choreography while ensuring physically realistic
dance sequences that are precisely synchronized with the music's beats and rhythm.
Although significant progress has been made in music-conditioned dance
generation, most existing methods struggle to convey specific stylistic
attributes in the generated dance. To bridge this gap, we propose GCDance, a
diffusion-based framework for genre-specific 3D full-body dance generation
conditioned on both music and descriptive text. To incorporate genre
information effectively, we
develop a text-based control mechanism that maps input prompts, either explicit
genre labels or free-form descriptive text, into genre-specific control
signals, enabling precise and controllable text-guided generation of
genre-consistent dance motions. Furthermore, to enhance the alignment between
music and textual conditions, we leverage the features of a music foundation
model, facilitating coherent and semantically aligned dance synthesis. Finally,
to balance the objective of extracting text-genre information against
maintaining high-quality generation, we propose a novel multi-task optimization
strategy that effectively trades off competing factors such as physical realism,
spatial accuracy, and text classification, significantly improving the overall
quality of the generated sequences. Extensive experiments on the FineDance and
AIST++ datasets demonstrate the superiority of GCDance over existing
state-of-the-art approaches.
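
The abstract does not specify how GCDance turns a text prompt into a genre-specific control signal. As a rough, non-authoritative sketch of the general idea, one might encode either a genre label or free-form text with a frozen CLIP text encoder and feed the pooled embedding to a diffusion denoiser (e.g. via cross-attention or FiLM-style modulation). The encoder choice, function name, and conditioning route below are all assumptions, not GCDance's actual API:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# A frozen CLIP text encoder as a stand-in genre encoder. This is an
# assumption for illustration; the abstract does not state which text
# encoder GCDance actually uses.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
text_encoder.requires_grad_(False)

def genre_control_signal(prompt: str) -> torch.Tensor:
    """Map an explicit genre label or a free-form description to a
    fixed-size embedding that could condition a diffusion denoiser."""
    tokens = tokenizer(prompt, padding=True, truncation=True,
                       return_tensors="pt")
    with torch.no_grad():
        out = text_encoder(**tokens)
    return out.pooler_output  # shape: (1, hidden_dim)

# Either an explicit genre label or free-form text works as a prompt.
signal = genre_control_signal("an energetic breaking street-dance routine")
```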
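
Similarly, the multi-task optimization strategy is not detailed in the abstract. A minimal sketch of one standard way to balance competing objectives, using learned homoscedastic-uncertainty weighting in the style of Kendall et al. (2018) rather than GCDance's actual scheme, with placeholder loss terms mirroring the abstract's three factors:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Uncertainty-based multi-task weighting (Kendall et al., 2018).
    The three task names mirror the abstract (physical realism, spatial
    accuracy, genre/text classification); the actual GCDance loss terms
    and weighting scheme are assumptions here."""

    def __init__(self, num_tasks: int = 3):
        super().__init__()
        # One learnable log-variance per task, trained jointly with the model.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for loss, log_var in zip(task_losses, self.log_vars):
            # exp(-log_var) down-weights high-uncertainty tasks;
            # the + log_var term keeps the weights from collapsing to zero.
            total = total + torch.exp(-log_var) * loss + log_var
        return total

criterion = UncertaintyWeightedLoss(num_tasks=3)
total_loss = criterion([
    torch.tensor(0.8),  # placeholder physical-realism loss
    torch.tensor(1.2),  # placeholder spatial-accuracy loss
    torch.tensor(0.5),  # placeholder genre-classification loss
])
```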