Abstract
Implicit neural representations (INRs) have gained prominence for efficiently
encoding multimedia data, yet their applications in audio signals remain
limited. This study introduces the Kolmogorov-Arnold Network (KAN), a novel
architecture using learnable activation functions, as an effective INR model
for audio representation. KAN demonstrates superior perceptual performance over
previous INRs, achieving the lowest Log-Spectral Distance of 1.29 and the
highest Perceptual Evaluation of Speech Quality of 3.57 for 1.5 s audio. To
extend KAN's utility, we propose FewSound, a hypernetwork-based architecture
that enhances INR parameter updates. FewSound outperforms the state-of-the-art
HyperSound, with a 33.3% improvement in MSE and 60.87% in SI-SNR. These results
show KAN as a robust and adaptable audio representation with the potential for
scalability and integration into various hypernetwork frameworks. The source
code can be accessed at https://github.com/gmum/fewsound.git.
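
The abstract reports reconstruction quality in terms of MSE and SI-SNR. As a rough illustration of what these two metrics measure, the sketch below computes them for a pair of waveforms in plain Python. It assumes the common zero-mean definition of SI-SNR; the function names `mse` and `si_snr` and the `eps` stabilizer are illustrative choices and are not taken from the FewSound code, whose evaluation pipeline may differ.

```python
import math


def mse(reference, estimate):
    """Mean squared error between two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def si_snr(reference, estimate, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (assumed zero-mean variant)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean of each signal so a constant offset does not affect the score.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    # Project the estimate onto the reference to get the "target" component.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]
    # Whatever remains after removing the target component counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    # Hypothetical test signals: a 440 Hz tone and a slightly perturbed copy.
    sr = 16000
    clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(1600)]
    noisy = [c + 0.01 * math.cos(2 * math.pi * 1000 * t / sr)
             for t, c in enumerate(clean)]
    print("MSE:", mse(clean, noisy))
    print("SI-SNR (dB):", si_snr(clean, noisy))
```

Because SI-SNR projects the estimate onto the reference before comparing energies, rescaling the estimate leaves the score essentially unchanged, whereas MSE is sensitive to gain differences. Lower MSE and higher SI-SNR both indicate a closer reconstruction.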
Authors (5)
Patryk Marszałek
Maciej Rut
Piotr Kawa
Przemysław Spurek
Piotr Syga
Key Contributions
Introduces the Kolmogorov-Arnold Network (KAN) as a novel implicit neural representation for audio, demonstrating superior perceptual performance over previous INRs. Proposes FewSound, a hypernetwork-based architecture that enhances KAN's utility by improving INR parameter updates, achieving state-of-the-art results in audio representation tasks.
Business Value
Enables more efficient and higher-quality audio generation and representation, with potential applications in text-to-speech systems, audio compression, and music generation. Improved perceptual quality leads to better user experiences.