As Good as It KAN Get: High-Fidelity Audio Representation

Abstract

Implicit neural representations (INRs) have gained prominence for efficiently encoding multimedia data, yet their application to audio signals remains limited. This study introduces the Kolmogorov-Arnold Network (KAN), a novel architecture using learnable activation functions, as an effective INR model for audio representation. KAN demonstrates superior perceptual performance over previous INRs, achieving the lowest Log-Spectral Distance of 1.29 and the highest Perceptual Evaluation of Speech Quality score of 3.57 for 1.5 s audio. To extend KAN's utility, we propose FewSound, a hypernetwork-based architecture that enhances INR parameter updates. FewSound outperforms the state-of-the-art HyperSound, with a 33.3% improvement in MSE and 60.87% in SI-SNR. These results show KAN as a robust and adaptable audio representation with the potential for scalability and integration into various hypernetwork frameworks. The source code can be accessed at https://github.com/gmum/fewsound.git.
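
The abstract reports improvements in MSE and SI-SNR. As a rough illustration of what those two waveform metrics measure, here is a minimal pure-Python sketch. It uses the commonly cited definition of scale-invariant SNR (zero-mean signals, projection of the estimate onto the reference); the function names and the `eps` stabilizer are assumptions made for this example, and the paper's own evaluation code may differ in detail.

```python
import math


def mse(reference, estimate):
    """Mean squared error between two equal-length sample sequences (lower is better)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def si_snr(reference, estimate, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (higher is better).

    eps is a small stabilizer chosen for this sketch, not a value from the paper.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean so a constant offset does not affect the score.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    # Project the estimate onto the reference to get the "target" component.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]
    # Whatever is left after the projection counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves SI-SNR essentially unchanged, whereas MSE penalizes any amplitude mismatch. That is one reason the two metrics are often reported together.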
Authors (5)
Patryk Marszałek
Maciej Rut
Piotr Kawa
Przemysław Spurek
Piotr Syga
Submitted
March 4, 2025
arXiv Category
cs.SD

Key Contributions

Introduces the Kolmogorov-Arnold Network (KAN) as an implicit neural representation for audio, demonstrating superior perceptual performance over previous INRs. Proposes FewSound, a hypernetwork-based architecture that extends KAN's utility by enhancing INR parameter updates; it outperforms the prior state-of-the-art HyperSound by 33.3% in MSE and 60.87% in SI-SNR.
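
To make the first contribution more concrete, the sketch below shows one way a KAN-style network can act as an audio INR: the network maps a time coordinate to an amplitude, and every edge carries its own learnable one-dimensional function instead of a fixed activation. This is an illustrative toy and not the authors' implementation. The class name `KANLayer`, the helpers `build_inr` and `inr_forward`, the layer sizes, and the use of Gaussian bumps in place of a spline parameterization are all assumptions made for the example, and only the forward pass is shown (no training).

```python
import math
import random


def rbf_basis(x, centers, width):
    """Evaluate fixed Gaussian bumps at x; their weights are the learnable part."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


class KANLayer:
    """Toy KAN-style layer: one learnable 1-D function per input-output edge.

    Each edge function is a weighted sum of Gaussian bumps on [-1, 1]
    (an assumption of this sketch, standing in for splines).
    """

    def __init__(self, in_dim, out_dim, num_basis=8, rng=None):
        if rng is None:
            rng = random.Random(0)
        self.centers = [-1.0 + 2.0 * k / (num_basis - 1) for k in range(num_basis)]
        self.width = 2.0 / (num_basis - 1)
        # coeffs[j][i][k]: weight of bump k on the edge from input i to output j.
        self.coeffs = [
            [[rng.gauss(0.0, 0.1) for _ in range(num_basis)] for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def forward(self, x):
        # Evaluate the basis once per input, then sum the edge functions per output.
        basis = [rbf_basis(xi, self.centers, self.width) for xi in x]
        return [
            sum(
                sum(c * b for c, b in zip(edge, basis[i]))
                for i, edge in enumerate(row)
            )
            for row in self.coeffs
        ]


def build_inr(hidden=4, seed=0):
    """Two-layer toy network: time coordinate -> hidden features -> amplitude."""
    rng = random.Random(seed)
    return [KANLayer(1, hidden, rng=rng), KANLayer(hidden, 1, rng=rng)]


def inr_forward(layers, t):
    """Map one time coordinate t (scaled to [-1, 1]) to one audio sample."""
    h = [t]
    for layer in layers:
        h = layer.forward(h)
    return h[0]
```

Querying the network at a grid of time coordinates, for example `[inr_forward(layers, t) for t in times]`, yields a waveform. Fitting the coefficients so that this waveform matches a recording is what turns the network into a representation of that recording. In a hypernetwork setup such as the one the abstract describes, a second network would produce or update those coefficients rather than having them optimized from scratch for each clip.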

Business Value

Enables more efficient and higher-quality audio generation and representation, with potential applications in text-to-speech systems, audio compression, and music generation. Improved perceptual quality leads to better user experiences.
