Merlin L48 Spectrogram Dataset

📄 Abstract

In the single-positive multi-label (SPML) setting, each image in a dataset is labeled with the presence of a single class, while the true presence of other classes remains unknown. The challenge is to narrow the performance gap between this partially-labeled setting and fully-supervised learning, which often requires a significant annotation budget. Prior SPML methods were developed and benchmarked on synthetic datasets created by randomly sampling single positive labels from fully-annotated datasets like Pascal VOC, COCO, NUS-WIDE, and CUB200. However, this synthetic approach does not reflect real-world scenarios and fails to capture the fine-grained complexities that can lead to difficult misclassifications. In this work, we introduce the L48 dataset, a fine-grained, real-world multi-label dataset derived from recordings of bird sounds. L48 provides a natural SPML setting with single-positive annotations on a challenging, fine-grained domain, as well as two extended settings in which domain priors give access to additional negative labels. We benchmark existing SPML methods on L48 and observe significant performance differences compared to synthetic datasets and analyze method weaknesses, underscoring the need for more realistic and difficult benchmarks.
Authors: Aaron Sun, Subhransu Maji, Grant Van Horn
Submitted: October 31, 2025
arXiv Category: cs.CV

Key Contributions

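The paper introduces L48, a fine-grained, real-world multi-label dataset derived from bird sound recordings and designed for the single-positive multi-label (SPML) setting. It offers a more challenging and realistic benchmark than the synthetic SPML datasets built by sampling single positives from Pascal VOC, COCO, NUS-WIDE, and CUB200, and it adds two extended settings in which domain priors supply additional negative labels. Benchmarking existing SPML methods on L48 shows significant performance differences compared to the synthetic datasets.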
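
For contrast with L48's naturally occurring single-positive labels, the synthetic construction used by earlier benchmarks can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `sample_single_positive`, the list-of-lists label format, and the toy label matrix are all assumptions made for the example.

```python
import random


def sample_single_positive(full_labels, seed=0):
    """Simulate SPML labels from a fully annotated 0/1 label matrix.

    For each example, one of its true positives is chosen uniformly at
    random and kept as the single observed positive. Every other entry is
    set to 0, which here means "unknown", not "confirmed absent".
    (Hypothetical helper, written only for illustration.)
    """
    rng = random.Random(seed)
    observed = []
    for row in full_labels:
        positives = [j for j, y in enumerate(row) if y == 1]
        masked = [0] * len(row)
        if positives:  # rows with no positive class stay all-unknown
            masked[rng.choice(positives)] = 1
        observed.append(masked)
    return observed


# Toy fully annotated labels: 3 examples, 4 classes (made-up values).
full = [
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
spml = sample_single_positive(full, seed=0)
for original, single in zip(full, spml):
    print(original, "->", single)
```

Because the revealed positive is drawn uniformly from a complete annotation, this procedure cannot reproduce how annotators actually choose a single label in the field, which is the gap the abstract says L48 is meant to expose.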

Business Value

Audio classification models that improve on L48 could support more accurate and cost-effective biodiversity monitoring and ecological research. The dataset also serves as a resource for research in weakly supervised learning.