arxiv_ml
Abstract: Clarifying the neural basis of speech intelligibility is critical for
computational neuroscience and digital speech processing. Recent neuroimaging
studies have shown that intelligibility modulates cortical activity beyond
simple acoustics,...
#Computational Neuroscience#Speech Processing#Brain-Computer Interfaces#Machine Learning#Neuroimaging
arxiv_ml
Abstract: Access to clinical multi-channel EEG remains limited in many regions
worldwide. We present NEUROSKY-EPI, the first open dataset of single-channel,
consumer-grade EEG for epilepsy, collected in a South Asian clinical setting
along with rich ...
#Medical AI#Biomedical Signal Processing#Epilepsy Diagnosis#Patient Stratification#Accessible Health Technology
arxiv_ml
Abstract: Early identification of abnormal physiological patterns is essential for the
timely detection of cardiac disease. This work introduces a hybrid
quantum-classical convolutional neural network (QCNN) designed to classify S3
and murmur abnorma...
#Quantum Machine Learning#Biomedical Signal Processing#Cardiology#Pattern Recognition#Hybrid Quantum-Classical Models
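The abstract above names a hybrid quantum-classical convolutional architecture for heart-sound classification. As a rough, generic illustration of that kind of model (not the paper's actual network), the sketch below wires a small 1D CNN front end into a variational quantum circuit via PennyLane's TorchLayer; the qubit count, circuit depth, and two-class S3/murmur head are assumptions.

```python
# Generic hybrid quantum-classical CNN sketch (PennyLane + PyTorch).
# Qubit count, circuit depth, and the classification head are illustrative
# assumptions, not the architecture described in the abstract.
import pennylane as qml
import torch
import torch.nn as nn

N_QUBITS, N_LAYERS = 4, 2
dev = qml.device("default.qubit", wires=N_QUBITS)

@qml.qnode(dev, interface="torch")
def quantum_circuit(inputs, weights):
    # Encode four classical features as rotation angles, then entangle.
    qml.AngleEmbedding(inputs, wires=range(N_QUBITS))
    qml.BasicEntanglerLayers(weights, wires=range(N_QUBITS))
    return [qml.expval(qml.PauliZ(w)) for w in range(N_QUBITS)]

class HybridQCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # Classical 1D CNN front end compresses the heart-sound waveform
        # down to N_QUBITS features in [-1, 1].
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(8, 16, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(16, N_QUBITS), nn.Tanh(),
        )
        self.quantum = qml.qnn.TorchLayer(
            quantum_circuit, {"weights": (N_LAYERS, N_QUBITS)})
        self.head = nn.Linear(N_QUBITS, n_classes)

    def forward(self, x):            # x: (batch, 1, samples)
        return self.head(self.quantum(self.cnn(x)))

logits = HybridQCNN()(torch.randn(2, 1, 4000))  # two dummy 4000-sample clips
```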
arxiv_ml
Abstract: Electroencephalography (EEG) and local field potentials (LFP) are two widely
used techniques to record electrical activity from the brain. These signals are
used in both the clinical and research domains for multiple applications.
However, ...
#Automated Brain Signal Preprocessing#Unsupervised Artifact Removal in EEG/LFP#Improving Reproducibility in Neuroscience#Machine Learning for Biomedical Signals
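The abstract is cut off before its method, but the tags point at unsupervised artifact removal for EEG/LFP. Purely as the conventional baseline such work is usually compared against (not the approach proposed above), here is a minimal ICA-based cleanup with MNE-Python; the file path and excluded component indices are placeholders.

```python
# Conventional ICA-based artifact removal baseline with MNE-Python.
# File path and excluded component indices are placeholders.
import mne
from mne.preprocessing import ICA

raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)  # hypothetical file
raw.filter(l_freq=1.0, h_freq=None)        # high-pass helps ICA converge

ica = ICA(n_components=20, random_state=97, max_iter="auto")
ica.fit(raw)

# In practice components are flagged automatically (e.g. via EOG/ECG
# correlation) or by visual inspection; the indices below are illustrative.
ica.exclude = [0, 3]
raw_clean = ica.apply(raw.copy())
```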
arxiv_ai
Abstract: Parallel to rapid advancements in foundation model research, the past few
years have witnessed a surge in music AI applications. As AI-generated and
AI-augmented music become increasingly mainstream, many researchers in the
music AI communi...
#Music AI Research Frontiers#Foundation Models in Music#AI-Generated Music#Model Efficiency and Controllability#Multimodal Music Systems
arxiv_ml
Abstract: Early detection of heart arrhythmia can prevent severe future complications
in cardiac patients. While manual diagnosis remains the clinical standard, it
relies heavily on visual interpretation and is inherently
subjective. In recent ...
#Biomedical Signal Processing#Deep Learning for Healthcare#Time Series Analysis#Medical Diagnosis#Signal Filtering
arxiv_ai
Abstract: While generative models for music composition are increasingly capable, their
adoption by musicians is hindered by text-prompting, an asynchronous workflow
disconnected from the embodied, responsive nature of instrumental performance.
To ad...
#Music Generation#Human-AI Interaction#Generative AI#Interactive Systems#Music Technology
arxiv_ai
Abstract: Role-play has become a key testbed for generative models, expanding from
text-only dialogue to multimodal interaction. Extending role-play to speech
captures prosody, emotion, and delivery, but also poses new evaluation
challenges. Current ...
#Speech Generation Evaluation#Human-AI Interaction#Benchmark Design#Multimodal AI#Natural Language Processing
arxiv_ai
Abstract: Emotion recognition from speech plays a vital role in the development of
empathetic human-computer interaction systems. This paper presents a
comparative analysis of lightweight transformer-based models, DistilHuBERT and
PaSST, by classifyi...
#Speech Processing#Emotion Recognition#Machine Learning#Deep Learning#Human-Computer Interaction
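Since the abstract above compares DistilHuBERT and PaSST as lightweight encoders, the sketch below shows one common way such a comparison is set up: freeze the pretrained encoder, mean-pool its hidden states, and train a small classifier on top. The checkpoint id, 16 kHz input, and four-class emotion head are assumptions, not details taken from the paper.

```python
# Frozen DistilHuBERT embeddings + small linear head for speech emotion
# recognition. The checkpoint id, 16 kHz input and 4 emotion classes are
# assumptions; this is a generic setup, not necessarily the paper's.
import torch
import torch.nn as nn
from transformers import AutoFeatureExtractor, AutoModel

CKPT = "ntu-spml/distilhubert"            # assumed Hugging Face checkpoint
extractor = AutoFeatureExtractor.from_pretrained(CKPT)
encoder = AutoModel.from_pretrained(CKPT).eval()

def embed(waveform_16k: torch.Tensor) -> torch.Tensor:
    """Mean-pool DistilHuBERT hidden states for one 16 kHz mono waveform."""
    inputs = extractor(waveform_16k.numpy(), sampling_rate=16000,
                       return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0)               # (dim,)

classifier = nn.Linear(encoder.config.hidden_size, 4)  # e.g. 4 emotion classes
logits = classifier(embed(torch.randn(16000)))         # 1 s of dummy audio
```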
arxiv_cv
Abstract: Implicit neural representations (INRs) have gained prominence for efficiently
encoding multimedia data, yet their application to audio signals remains
limited. This study introduces the Kolmogorov-Arnold Network (KAN), a novel
architecture u...
#Audio Representation Learning#Implicit Neural Representations#Generative Models#Deep Learning Architectures#Speech Synthesis
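For readers unfamiliar with implicit neural representations, the sketch below fits a tiny coordinate network that maps a time index to an amplitude, which is the basic INR formulation the abstract builds on; the paper replaces this kind of MLP with a KAN, which is not reproduced here. Layer sizes, the sine activation, and the training settings are assumptions.

```python
# Minimal implicit neural representation of an audio clip: a coordinate
# network f(t) -> amplitude, trained to overfit one waveform. The abstract's
# KAN would replace this plain sine-activated MLP; sizes and optimizer
# settings are illustrative assumptions.
import torch
import torch.nn as nn

class SineMLP(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.l1 = nn.Linear(1, hidden)
        self.l2 = nn.Linear(hidden, hidden)
        self.l3 = nn.Linear(hidden, 1)

    def forward(self, t):                       # t: (N, 1) in [-1, 1]
        h = torch.sin(30.0 * self.l1(t))        # SIREN-style first layer
        h = torch.sin(self.l2(h))
        return self.l3(h)

waveform = torch.randn(16000)                   # stand-in for 1 s of audio
t = torch.linspace(-1, 1, waveform.numel()).unsqueeze(1)
target = waveform.unsqueeze(1)

model = SineMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(1000):                        # overfit the single clip
    loss = nn.functional.mse_loss(model(t), target)
    opt.zero_grad(); loss.backward(); opt.step()

reconstruction = model(t).detach().squeeze(1)   # decoded waveform
```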
arxiv_cl
Abstract: This paper presents KIT's submissions to the IWSLT 2025 low-resource track.
We develop both cascaded systems, consisting of Automatic Speech Recognition
(ASR) and Machine Translation (MT) models, and end-to-end (E2E) Speech
Translation (ST)...
#Low-resource Speech Translation#Data Augmentation#Model Adaptation#Cross-lingual Transfer#Speech Processing
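The abstract contrasts cascaded (ASR followed by MT) with end-to-end speech translation; the snippet below sketches the cascaded side with two off-the-shelf Hugging Face pipelines. The model identifiers and the language pair are placeholders, not KIT's actual IWSLT 2025 systems.

```python
# Cascaded speech translation: ASR output is piped into an MT model.
# Model ids and the language pair are placeholders, not the systems
# submitted to IWSLT 2025.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
mt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

def cascaded_st(audio_path: str) -> str:
    transcript = asr(audio_path)["text"]             # speech -> source text
    return mt(transcript)[0]["translation_text"]     # source text -> target text

# print(cascaded_st("example_utterance.wav"))        # hypothetical audio file
```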
arxiv_cv
Abstract: In the single-positive multi-label (SPML) setting, each image in a dataset is
labeled with the presence of a single class, while the true presence of other
classes remains unknown. The challenge is to narrow the performance gap between
this...
#Machine Learning#Computer Vision (applied to audio)#Data Annotation#Dataset Creation#Multi-label Classification#Audio Analysis
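To make the single-positive setting concrete, below is the common "assume negative" baseline loss for SPML: the one observed label is treated as positive and every unobserved label as negative, which is exactly the gap-inducing assumption that stronger SPML methods try to relax. The class count and batch shapes are illustrative.

```python
# "Assume negative" baseline loss for single-positive multi-label learning:
# the single observed label is a positive, all unobserved labels are treated
# as negatives. Shapes below are illustrative.
import torch
import torch.nn.functional as F

def assume_negative_loss(logits: torch.Tensor,
                         positive_idx: torch.Tensor) -> torch.Tensor:
    """logits: (batch, n_classes); positive_idx: (batch,) observed class index."""
    targets = torch.zeros_like(logits)                      # unobserved -> 0
    targets[torch.arange(logits.size(0)), positive_idx] = 1.0
    return F.binary_cross_entropy_with_logits(logits, targets)

loss = assume_negative_loss(torch.randn(4, 20), torch.tensor([3, 7, 0, 19]))
```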
arxiv_cl
Abstract: ParlaSpeech is a collection of spoken parliamentary corpora currently
spanning four Slavic languages - Croatian, Czech, Polish and Serbian -
altogether 6,000 hours in size. The corpora were built in an automatic
fashion from the Parl...
#Corpus Linguistics#Speech Processing#Natural Language Processing#Slavic Languages#Linguistic Annotation
arxiv_ml
Abstract: Multilingual speech translation (ST) and machine translation (MT) in the
medical domain enhances patient care by enabling efficient communication across
language barriers, alleviating specialized workforce shortages, and
facilitating improv...
#speech processing#machine translation#multilingual AI#medical informatics#low-resource NLP
arxiv_ml
Abstract: Audio denoising is critical in signal processing, enhancing intelligibility
and fidelity for applications like restoring musical recordings. This paper
presents a proof-of-concept for adapting a state-of-the-art neural audio codec,
the Desc...
#High-fidelity audio denoising#Adapting neural audio codecs for denoising#Generative audio restoration#Improving intelligibility and fidelity
arxiv_ml
Abstract: Everyday speech conveys far more than words: it reflects who we are, how we
feel, and the circumstances surrounding our interactions. Yet, most existing
speech datasets are acted, limited in scale, and fail to capture the expressive
richnes...
#Speech Processing#Natural Language Processing#Artificial Intelligence#Human-Computer Interaction#Affective Computing
arxiv_ml
Abstract: Atrial fibrillation (AF) is a leading cause of stroke and mortality,
particularly in elderly patients. Wrist-worn photoplethysmography (PPG) enables
non-invasive, continuous rhythm monitoring, yet suffers from significant
vulnerability to m...
#Biomedical Signal Processing#Machine Learning#Wearable Health Monitoring#Multimodal Learning#Cardiology
arxiv_cv
Abstract: Audio-visual speech enhancement (AVSE) is a task that uses visual auxiliary
information to extract a target speaker's speech from mixed audio. Real-world
scenarios often involve complex acoustic environments, accompanied by various ...
#Speech Processing#Multimodal AI#Signal Enhancement#Acoustic Signal Processing#Computer Vision for Audio
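As a schematic of the AVSE setup described above (visual cues guiding extraction of one speaker from a mixture), the toy module below fuses a noisy spectrogram with a stream of lip-region embeddings and predicts a time-frequency mask. The feature dimensions, the simple concatenation fusion, and the assumption that visual frames are already aligned to audio frames are all illustrative, not the paper's architecture.

```python
# Toy audio-visual speech enhancement module: concatenate audio and visual
# features per frame and predict a magnitude mask for the target speaker.
# Feature sizes, concatenation fusion, and frame alignment are assumptions.
import torch
import torch.nn as nn

class ToyAVSE(nn.Module):
    def __init__(self, n_freq: int = 257, visual_dim: int = 512):
        super().__init__()
        self.audio_proj = nn.Linear(n_freq, 256)
        self.visual_proj = nn.Linear(visual_dim, 256)
        self.mask_head = nn.Sequential(
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, n_freq), nn.Sigmoid(),   # mask values in [0, 1]
        )

    def forward(self, noisy_mag, lip_emb):
        # noisy_mag: (batch, frames, n_freq); lip_emb: (batch, frames, visual_dim),
        # assumed already resampled to the audio frame rate.
        fused = torch.cat([self.audio_proj(noisy_mag),
                           self.visual_proj(lip_emb)], dim=-1)
        return noisy_mag * self.mask_head(fused)    # masked magnitude spectrogram

enhanced = ToyAVSE()(torch.rand(2, 100, 257), torch.randn(2, 100, 512))
```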
arxiv_cv
Abstract: Non-contact electrocardiogram (ECG) reconstruction from radar signals offers
a promising approach for unobtrusive cardiac monitoring. We present LifWavNet,
a lifting wavelet network based on a multi-resolution analysis and synthesis
(MRAS) ...
#Biomedical Signal Processing#Non-contact Sensing#Deep Learning for Signal Reconstruction#Cardiac Monitoring
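Because the abstract's key ingredient is a lifting wavelet decomposition, the sketch below spells out one learnable lifting step (split into even/odd samples, predict the odd half from the even half, then update the even half), which is the building block a multi-resolution analysis/synthesis network stacks. The tiny convolutional predictor and updater are assumptions, not LifWavNet's actual operators.

```python
# One learnable lifting step: split -> predict -> update. Stacking such steps
# and recursing on the approximation band gives a multi-resolution analysis;
# the small conv predictor/updater here are illustrative, not LifWavNet's.
import torch
import torch.nn as nn

class LiftingStep(nn.Module):
    def __init__(self, channels: int = 1, kernel: int = 3):
        super().__init__()
        pad = kernel // 2
        self.predict = nn.Conv1d(channels, channels, kernel, padding=pad)
        self.update = nn.Conv1d(channels, channels, kernel, padding=pad)

    def forward(self, x):                      # x: (batch, channels, length)
        even, odd = x[..., 0::2], x[..., 1::2]
        detail = odd - self.predict(even)      # high-frequency residual
        approx = even + self.update(detail)    # smoothed low-frequency band
        return approx, detail
        # Synthesis inverts the two steps: even = approx - update(detail),
        # odd = detail + predict(even), then interleave.

approx, detail = LiftingStep()(torch.randn(2, 1, 1024))  # e.g. a radar segment
```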
arxiv_ml
Abstract: Understanding the structural and cognitive underpinnings of musical
compositions remains a key challenge in music theory and computational
musicology. While traditional methods focus on harmony and rhythm, cognitive
models such as the Impli...
#Computational Musicology#Music Theory#Cognitive Musicology#Machine Learning for Music#Pattern Recognition
arxiv_ml
Abstract: Automated monitoring of marine mammals in the St. Lawrence Estuary faces
extreme challenges: calls span low-frequency moans to ultrasonic clicks, often
overlap, and are embedded in variable anthropogenic and environmental noise. We
introduc...
#Bioacoustics#Signal Processing#Machine Learning for Audio#Marine Mammal Monitoring#Noise Reduction
arxiv_ml
Abstract: Toward practical applications of electroencephalography (EEG), lightweight
acquisition devices have garnered significant attention. However, EEG channel
selection methods are commonly data-sensitive and cannot establish a unified
sound paradigm f...
#Signal Processing#Biomedical Signal Analysis#Deep Learning#Time Series Analysis#Neuroscience Applications
arxiv_ai
Abstract: Brain-to-speech (BTS) systems represent a groundbreaking approach to human
communication by enabling the direct transformation of neural activity into
linguistic expressions. While recent non-invasive BTS studies have largely
focused on dec...
#Brain-Computer Interfaces (BCI)#Speech Synthesis#Neural Signal Decoding#Assistive Technology#Rehabilitation Engineering
arxiv_ai
Abstract: Speech Emotion Recognition (SER) is a key affective computing technology that
enables emotionally intelligent artificial intelligence. While SER is
challenging in general, it is particularly difficult for low-resource languages
such as Urdu...
#Affective Computing#Speech Processing#Machine Learning for Low-Resource Languages#Model Generalization#Feature Engineering
arxiv_ai
Abstract: Text-to-audio models are a type of generative model that produces audio
output in response to a given textual prompt. Although level generators and the
properties of the functional content that they create (e.g., playability)
dominate most ...
#Generative Audio#Text-to-Audio Synthesis#Content Generation#AI Evaluation#Multimodal AI
arxiv_cv
Abstract: This work examines how different types of music affect human emotions.
While participants listened to music, a subjective survey and brain activity
measurements were carried out using an EEG helmet. The aim is to
demonstrate the impac...
#Affective Computing#Music Psychology#Neuroscience#Signal Processing#Human-Computer Interaction
arxiv_cl
Abstract: The evaluation of intelligibility for TTS has reached a bottleneck, as
existing assessments heavily rely on word-by-word accuracy metrics such as WER,
which fail to capture the complexity of real-world speech or reflect human
comprehension ...
#Speech Synthesis#Text-to-Speech#Evaluation Metrics#Human-Computer Interaction#Natural Language Processing
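Since the abstract's critique centers on WER, a compact reference implementation is included below: word-level Levenshtein distance (substitutions plus insertions plus deletions) divided by the reference length. This is the standard metric definition, shown only to make the "word-by-word" limitation concrete.

```python
# Word error rate: word-level edit distance divided by reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion -> ~0.167
```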
arxiv_ai
Abstract: Dialogue State Tracking (DST) is a key part of task-oriented dialogue
systems, identifying important information in conversations. However, its
accuracy drops significantly in spoken dialogue environments due to named
entity errors from Aut...
#Improving Robustness of Dialogue Systems#Handling ASR Errors in Dialogue#Data Augmentation for NLP#LLM Applications in Speech Processing#Dialogue State Tracking
arxiv_cl
Abstract: Discrete audio representations are gaining traction in speech modeling due to
their interpretability and compatibility with large language models, but are
not always optimized for noisy or real-world environments. Building on existing
works...
#Speech Recognition#Representation Learning#Noise Robustness#Disentangled Representations#Audio Signal Processing
arxiv_cl
Abstract: Recent advances in spoken language processing have led to substantial
progress in phonetic tasks such as automatic speech recognition (ASR), phone
recognition (PR), grapheme-to-phoneme conversion (G2P), and phoneme-to-grapheme
conversion (P...
#Spoken Language Processing#Speech Foundation Models#Multitask Learning#Low-Resource NLP#Phonetics