Research Paper · Relevant audiences: Speech scientists, Bioengineers, ML researchers, Assistive technology developers, Neuroscientists

emg2speech: synthesizing speech from electromyography using self-supervised speech models

Abstract

We present a neuromuscular speech interface that translates electromyographic (EMG) signals collected from orofacial muscles during speech articulation directly into audio. We show that self-supervised speech (SS) representations exhibit a strong linear relationship with the electrical power of muscle action potentials: SS features can be linearly mapped to EMG power with a correlation of $r = 0.85$. Moreover, EMG power vectors corresponding to different articulatory gestures form structured and separable clusters in feature space. This relationship: $\text{SS features} \xrightarrow{\texttt{linear mapping}} \text{EMG power} \xrightarrow{\texttt{gesture-specific clustering}} \text{articulatory movements}$, highlights that SS models implicitly encode articulatory mechanisms. Leveraging this property, we directly map EMG signals to SS feature space and synthesize speech, enabling end-to-end EMG-to-speech generation without explicit articulatory models and vocoder training.
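To make the reported linear relationship concrete, the following is a minimal sketch (not the authors' code) of how one might fit a linear map from frame-level SS features to per-channel EMG power and measure their alignment, in the spirit of the reported $r = 0.85$. Array shapes, feature dimensions, and the synthetic data are illustrative assumptions.

```python
# Illustrative sketch: linear map from SS speech features to EMG power,
# scored by Pearson correlation. Shapes and data are hypothetical.
import numpy as np

def linear_fit_correlation(ss_features: np.ndarray, emg_power: np.ndarray) -> float:
    """ss_features: (T, D) frame-level SS representations (assumed shape).
    emg_power:   (T, C) per-channel EMG power aligned to the same frames.
    Returns Pearson r between predicted and measured EMG power."""
    # Append a bias column and solve W = argmin ||X W - Y||^2 by least squares.
    X = np.hstack([ss_features, np.ones((ss_features.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, emg_power, rcond=None)
    pred = X @ W
    # Correlation over all frames and channels.
    return float(np.corrcoef(pred.ravel(), emg_power.ravel())[0, 1])

# Synthetic example: a noisy linear relation yields a high correlation.
rng = np.random.default_rng(0)
ss = rng.standard_normal((2000, 64))        # hypothetical SS feature dimension
true_map = rng.standard_normal((64, 8))     # hypothetical 8 EMG channels
emg = ss @ true_map + 0.3 * rng.standard_normal((2000, 8))
print(f"r = {linear_fit_correlation(ss, emg):.2f}")
```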
Authors: Harshavardhana T. Gowda, Lee M. Miller
Submitted: October 28, 2025
arXiv Category: cs.SD

Key Contributions

This paper presents a neuromuscular speech interface that synthesizes speech directly from electromyographic (EMG) signals of orofacial muscles. It demonstrates a strong linear relationship ($r = 0.85$) between self-supervised speech (SS) representations and EMG power, enabling EMG signals to be mapped directly into SS feature space for end-to-end speech synthesis. This approach bypasses explicit articulatory models and separate vocoder training, offering a novel pathway for communication aids.
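A minimal sketch of the mapping stage described above, under stated assumptions: a small network regresses frame-level EMG features onto the SS feature space, and a pretrained SS-feature synthesizer (assumed available, not shown) would then render audio. The layer sizes, feature dimensions, and the `pretrained_vocoder` step are illustrative assumptions, not the authors' architecture.

```python
# Sketch of an EMG -> SS-feature regressor trained against SS features
# extracted from parallel audio. Dimensions are hypothetical.
import torch
import torch.nn as nn

class EMGToSS(nn.Module):
    def __init__(self, emg_dim: int = 8, ss_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emg_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, ss_dim),   # predict SS features frame by frame
        )

    def forward(self, emg: torch.Tensor) -> torch.Tensor:
        # emg: (batch, frames, emg_dim) -> (batch, frames, ss_dim)
        return self.net(emg)

# One training step: regress onto SS targets from a pretrained speech model.
model = EMGToSS()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
emg_batch = torch.randn(4, 100, 8)       # hypothetical EMG features
ss_targets = torch.randn(4, 100, 768)    # stand-in for SS features of parallel audio
loss = nn.functional.mse_loss(model(emg_batch), ss_targets)
loss.backward()
optimizer.step()
# At inference: audio = pretrained_vocoder(model(emg_batch))  # assumed component
```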

Business Value

For individuals who have lost the ability to speak due to motor impairments, speech synthesized from orofacial EMG could dramatically improve quality of life and social integration. It also opens new markets for assistive communication technologies.