📄 Abstract
Discrete representation learning has shown promising results across various
domains, including generation and understanding in image, speech and language.
Inspired by these advances, we propose MuseTok, a tokenization method for
symbolic music, and investigate its effectiveness in both music generation and
understanding tasks. MuseTok employs the residual vector quantized-variational
autoencoder (RQ-VAE) on bar-wise music segments within a Transformer-based
encoder-decoder framework, producing music codes that achieve high-fidelity
music reconstruction and accurate understanding of music theory. For
comprehensive evaluation, we apply MuseTok to music generation and semantic
understanding tasks, including melody extraction, chord recognition, and
emotion recognition. Models incorporating MuseTok outperform previous
representation learning baselines in semantic understanding while maintaining
comparable performance in content generation. Furthermore, qualitative analyses
on MuseTok codes, using ground-truth categories and synthetic datasets, reveal
that MuseTok effectively captures underlying musical concepts from large music
collections.
Authors (7)
Jingyue Huang
Zachary Novack
Phillip Long
Yupeng Hou
Ke Chen
Taylor Berg-Kirkpatrick
+1 more
Submitted
October 18, 2025
Key Contributions
MuseTok is a novel tokenization method for symbolic music using RQ-VAE within a Transformer framework, producing music codes for generation and understanding. It achieves high-fidelity reconstruction and accurate music theory understanding, outperforming baselines in semantic understanding tasks while maintaining comparable generation performance.
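To make the residual quantization idea concrete, here is a minimal pure-Python sketch of how a residual vector quantizer turns one latent vector into a short sequence of codes. It is a generic illustration of the technique, not MuseTok's implementation: the function names, the 2-D latent, the two-level depth, and the codebook values are all assumptions made for this example, and the Transformer encoder and decoder that would produce and consume bar-level latents are omitted.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_index(vector: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(entry: Vector) -> float:
        return sum((v - e) ** 2 for v, e in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(
    latent: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize `latent` level by level.

    Each level picks the entry nearest to the residual left by the earlier
    levels. Returns the chosen indices (the codes) and the sum of the chosen
    entries (the reconstruction of the latent).
    """
    residual = list(latent)
    reconstruction = [0.0] * len(latent)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        chosen = codebook[idx]
        codes.append(idx)
        reconstruction = [r + c for r, c in zip(reconstruction, chosen)]
        residual = [r - c for r, c in zip(residual, chosen)]
    return codes, reconstruction


# Hypothetical toy setup: one 2-D latent standing in for an encoded bar,
# and two hand-written codebooks (coarse, then fine).
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]
codes, approx = residual_quantize([1.2, 0.9], toy_codebooks)
print(codes, approx)
```

In this toy run the coarse level captures most of the latent and the fine level corrects part of what remains, which is why adding quantization levels generally tightens reconstruction. In a trained model the codebooks are learned rather than written by hand.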
Business Value
Could support more capable AI-assisted music tools for composers, producers, and listeners, since the same music codes serve both generation and understanding tasks.