Research Paper (source: arxiv_ml). Audience: Neuroscientists, Biomedical Engineers, Machine Learning Researchers, AI Developers in Healthcare

REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects

Abstract

Foundation models have transformed AI by reducing reliance on task-specific data through large-scale pretraining. While successful in language and vision, their adoption in EEG has lagged due to the heterogeneity of public datasets, which are collected under varying protocols, devices, and electrode configurations. Existing EEG foundation models struggle to generalize across these variations, often restricting pretraining to a single setup, resulting in suboptimal performance, in particular under linear probing. We present REVE (Representation for EEG with Versatile Embeddings), a pretrained model explicitly designed to generalize across diverse EEG signals. REVE introduces a novel 4D positional encoding scheme that enables it to process signals of arbitrary length and electrode arrangement. Using a masked autoencoding objective, we pretrain REVE on over 60,000 hours of EEG data from 92 datasets spanning 25,000 subjects, representing the largest EEG pretraining effort to date. REVE achieves state-of-the-art results on 10 downstream EEG tasks, including motor imagery classification, seizure detection, sleep staging, cognitive load estimation, and emotion recognition. With little to no fine-tuning, it demonstrates strong generalization and nuanced spatio-temporal modeling. We release code, pretrained weights, and tutorials to support standardized EEG research and accelerate progress in clinical neuroscience.

Authors (8)

Yassine El Ouahidi

Jonathan Lys

Philipp Thölke

Nicolas Farrugia

Bastien Pasdeloup

Vincent Gripon

+2 more

Submitted

October 24, 2025

arXiv Category

cs.LG

Key Contributions

The paper presents REVE, a foundation model for EEG pretrained at large scale (over 60,000 hours of recordings from 92 datasets and 25,000 subjects) to address the heterogeneity of EEG data. REVE introduces a novel 4D positional encoding that handles arbitrary signal lengths and electrode configurations, which supports generalization across recording setups.
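To make the idea of a 4D positional encoding concrete, the sketch below gives every token a code derived from its electrode position (x, y, z) and its time index t. It is a minimal illustration under assumptions, not the paper's implementation: the sinusoidal form, the frequency schedule, and the function names `sinusoidal_features`, `encode_4d`, and `encode_recording` are all hypothetical.

```python
import math


def sinusoidal_features(value, num_freqs):
    """Sine/cosine features of one scalar at geometrically spaced frequencies."""
    feats = []
    for k in range(num_freqs):
        freq = 2.0 ** k  # assumed frequency schedule, chosen for illustration only
        feats.append(math.sin(freq * value))
        feats.append(math.cos(freq * value))
    return feats


def encode_4d(x, y, z, t, num_freqs=4):
    """Concatenate sinusoidal features of the four coordinates (x, y, z, t)."""
    code = []
    for coord in (x, y, z, t):
        code.extend(sinusoidal_features(coord, num_freqs))
    return code


def encode_recording(electrode_xyz, num_patches, num_freqs=4):
    """Return one positional code per (electrode, time patch) token."""
    return [
        encode_4d(x, y, z, float(t), num_freqs)
        for (x, y, z) in electrode_xyz
        for t in range(num_patches)
    ]
```

Because each code depends only on coordinates, the same functions cover a 3-electrode or a 64-electrode montage and any number of time patches, which is the property the summary attributes to the 4D scheme.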

Business Value

Accelerates research and development of AI applications for EEG analysis, enabling more robust diagnostic tools and brain-computer interfaces across different clinical and research settings.

Paper Metadata

Innovation Type

Foundation Model Architecture and Pretraining Strategy

Deployment Feasibility

Moderate. Requires significant computational resources for pretraining; inference deployment depends on specific downstream tasks.

Limitations Addressed

Poor generalization of existing EEG foundation models across diverse collection protocols, devices, and electrode configurations.

Performance Gains

State-of-the-art results on 10 downstream EEG tasks, with improved generalization and linear-probing performance compared to existing EEG foundation models.
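Linear probing means keeping the pretrained encoder frozen and training only a linear classifier on its output features. The sketch below illustrates that protocol on toy data. Everything in it is an assumption for illustration: `frozen_encoder` is a hypothetical stand-in that computes two hand-written statistics (it is not REVE), and the probe is a plain logistic regression trained with stochastic gradient descent.

```python
import math


def frozen_encoder(signal):
    """Hypothetical stand-in for a pretrained encoder; its 'weights' never change."""
    n = len(signal)
    return [sum(signal) / n, sum(abs(v) for v in signal) / n]


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe on fixed features with plain SGD."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Return class 1 when the linear score is positive, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy data: low-amplitude signals are class 0, high-amplitude signals are class 1.
signals = [
    [0.1, -0.1, 0.1, -0.1],
    [0.2, -0.2, 0.1, -0.1],
    [2.0, -2.0, 2.0, -2.0],
    [1.5, -1.5, 2.0, -2.0],
]
labels = [0, 0, 1, 1]
features = [frozen_encoder(s) for s in signals]  # encoder is only ever called, never updated
w, b = train_linear_probe(features, labels)
```

Only `w` and `b` are learned here, so the probe's accuracy reflects how linearly separable the frozen features already are, which is why linear probing is a common test of representation quality.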

Technical Tags

Foundation Models, Electroencephalography (EEG), Large-Scale Pretraining, Heterogeneity, Generalization, Positional Encoding, Masked Autoencoding, EEG Datasets, Subject Variability, Linear Probing

Research Topics

Foundation Models, Biomedical Signal Processing, Machine Learning for Healthcare, Representation Learning, Transfer Learning

Methods & Architectures

Large-scale pretraining, Masked autoencoding objective, Novel 4D positional encoding scheme, Transformer-based models (implied by foundation model and positional encoding)
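A masked autoencoding objective hides a subset of input tokens and scores the model only on how well it reconstructs the hidden ones. The sketch below shows those two pieces, mask selection and a loss restricted to masked tokens. It is a generic illustration under assumptions: the mask ratio, the mean-squared-error loss, and the helper names `random_mask` and `masked_mse` are hypothetical and are not taken from the paper.

```python
import random


def random_mask(num_tokens, mask_ratio, seed=0):
    """Return a list of booleans, True where a token is hidden from the encoder."""
    rng = random.Random(seed)
    num_masked = int(round(num_tokens * mask_ratio))
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [i in masked for i in range(num_tokens)]


def masked_mse(targets, predictions, mask):
    """Mean squared reconstruction error computed over masked tokens only."""
    total, count = 0.0, 0
    for tgt, pred, is_masked in zip(targets, predictions, mask):
        if not is_masked:
            continue  # visible tokens do not contribute to the loss
        for a, b in zip(tgt, pred):
            total += (a - b) ** 2
            count += 1
    return total / count if count else 0.0
```

Restricting the loss to masked tokens keeps the model from being rewarded for copying inputs it can already see.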

Applications & Tasks

Neuroscience, Healthcare, Medical Diagnostics

Representation Learning, Generalization, Signal Processing

Creating a versatile EEG representation model, Enabling generalization across diverse EEG setups, Adapting to arbitrary signal lengths and electrode arrangements

Datasets & Benchmarks

Datasets

92 datasets spanning 25,000 subjects

Benchmarks

Performance under linear probing, Generalization across different EEG setups

Related Fields

Neuroscience, Biomedical Engineering, Machine Learning, Signal Processing, Artificial Intelligence

Keywords

Foundation Models, EEG, Electroencephalography, Pretraining, Generalization, Heterogeneity, Positional Encoding, Masked Autoencoding, Neuroscience, Medical AI, Signal Processing, Representation Learning, Transfer Learning, Brain-Computer Interfaces

Academic Context

#Foundation Models, #Biomedical Signal Processing, #Machine Learning for Healthcare, #Representation Learning, #Transfer Learning

Commercial Potential

Potential Products

General-purpose EEG analysis models, Foundation models for brain-computer interfaces, Tools for analyzing diverse EEG datasets

Target Industries

Healthcare, Medical Research, Biotechnology, Neurotechnology

Use Case Examples

Developing robust seizure detection systems, Creating personalized brain-computer interfaces, Analyzing sleep patterns from EEG data, Facilitating large-scale neuroimaging studies

Competitive Edge

Addresses the critical issue of EEG data heterogeneity, offering a more robust and generalizable foundation model than previous attempts.

Market Opportunity

Growing market for AI in healthcare and neuroscience research.

Revenue Models

Licensing of the foundation model, API access, development of specialized downstream applications.

Resource Requirements

Compute Needs

Very High (for pretraining)

Data Requirements

Extremely large and diverse EEG datasets covering various conditions, devices, and protocols.

Deployment Constraints

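Downstream tasks may still need fine-tuning or a linear probe, although the abstract reports strong generalization with little to no fine-tuning; model size might be a constraint for some edge devices.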

Scalability

The 4D positional encoding lets the model accept signals of arbitrary length and electrode arrangement, so it can be applied across recording setups. Downstream task performance depends on the available fine-tuning data.

Regulatory Considerations

HIPAA compliance for patient data, potential FDA approval for diagnostic applications.

Production Readiness

Maturity Level

Research

Time to Market

Medium to Long

Patent Potential

Moderate (for the 4D positional encoding and pretraining methodology)
