Research Paper. Audience: Computational Biologists, Bioinformaticians, AI Researchers, Genomic Researchers, Drug Discovery Scientists

Discovering Interpretable Biological Concepts in Single-cell RNA-seq Foundation Models

📄 Abstract

Single-cell RNA-seq foundation models achieve strong performance on downstream tasks but remain black boxes, limiting their utility for biological discovery. Recent work has shown that sparse dictionary learning can extract concepts from deep learning models, with promising applications in biomedical imaging and protein models. However, interpreting biological concepts remains challenging, as biological sequences are not inherently human-interpretable. We introduce a novel concept-based interpretability framework for single-cell RNA-seq models with a focus on concept interpretation and evaluation. We propose an attribution method with counterfactual perturbations that identifies genes that influence concept activation, moving beyond correlational approaches like differential expression analysis. We then provide two complementary interpretation approaches: an expert-driven analysis facilitated by an interactive interface and an ontology-driven method with attribution-based biological pathway enrichment. Applying our framework to two well-known single-cell RNA-seq models from the literature, we interpret concepts extracted by Top-K Sparse Auto-Encoders trained on two immune cell datasets. With a domain expert in immunology, we show that concepts improve interpretability compared to individual neurons while preserving the richness and informativeness of the latent representations. This work provides a principled framework for interpreting what biological knowledge foundation models have encoded, paving the way for their use for hypothesis generation and discovery.
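The abstract mentions Top-K Sparse Auto-Encoders as the concept extractor. As a rough illustration of that general technique (not the authors' implementation), the NumPy sketch below keeps only the k largest pre-activations per cell embedding and reconstructs the embedding from them. The function name `topk_sae_forward`, the weight shapes, the ReLU on the kept units, and the random toy data are all assumptions made for this example.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Forward pass of a Top-K sparse autoencoder (illustrative sketch).

    x: (n_cells, d) latent embeddings taken from a foundation model (assumed).
    Returns the sparse concept activations z and the reconstruction x_hat.
    """
    pre = (x - b_dec) @ W_enc + b_enc               # (n_cells, n_concepts)
    # Column indices of the k largest pre-activations in each row.
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    z = np.zeros_like(pre)
    # Keep the top-k units (clipped at zero); every other unit stays at zero.
    z[rows, idx] = np.maximum(pre[rows, idx], 0.0)
    x_hat = z @ W_dec + b_dec                       # linear decoder
    return z, x_hat

# Toy data: 8 "cells", 16-dim embeddings, 64 candidate concepts, k = 4.
rng = np.random.default_rng(0)
d, m, k = 16, 64, 4
x = rng.normal(size=(8, d))
W_enc = rng.normal(size=(d, m)) / np.sqrt(d)
W_dec = rng.normal(size=(m, d)) / np.sqrt(m)
z, x_hat = topk_sae_forward(x, W_enc, np.zeros(m), W_dec, np.zeros(d), k)
```

In this sketch each row of `z` has at most `k` non-zero entries, and each column of `z` plays the role of one candidate concept. A trained model would learn the weights by minimizing reconstruction error rather than using random values.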

Authors (5)

Charlotte Claye
MICS

Pierre Marschall
MICS

Wassila Ouerdane
MICS

Céline Hudelot
MICS

Julien Duquesne

Institutions

🏛️ MICS

Submitted

October 29, 2025

arXiv Category

q-bio.GN

Key Contributions

The paper introduces a concept-based interpretability framework for single-cell RNA-seq foundation models, focused on extracting, interpreting, and evaluating biological concepts. It proposes an attribution method that uses counterfactual perturbations to identify the genes that influence concept activation, moving beyond correlational methods such as differential expression analysis, and it offers both expert-driven and ontology-driven interpretation approaches.
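To make the idea of perturbation-based attribution concrete, here is a minimal sketch under stated assumptions: each gene in a single cell's expression vector is replaced by a baseline value, and the resulting drop in a concept's activation is recorded as that gene's score. The paper's actual perturbation scheme is not described on this page, so the zero baseline, the name `perturbation_attribution`, and the toy `concept_fn` (a stand-in for the model plus sparse autoencoder) are hypothetical.

```python
import numpy as np

def perturbation_attribution(concept_fn, x, baseline=0.0):
    """Score each gene by how much the concept activation drops when that
    gene alone is set to `baseline` (one counterfactual per gene)."""
    base = concept_fn(x)
    scores = np.zeros(x.shape[0])
    for g in range(x.shape[0]):
        x_cf = x.copy()
        x_cf[g] = baseline                  # counterfactual: gene g perturbed
        scores[g] = base - concept_fn(x_cf)
    return scores

# Hypothetical concept: a ReLU over a fixed linear readout of 4 genes.
w = np.array([2.0, 0.0, -1.0, 0.5])

def concept_fn(x):
    return max(float(w @ x), 0.0)

x = np.array([1.0, 3.0, 0.5, 2.0])          # toy expression vector
scores = perturbation_attribution(concept_fn, x)
print(scores)                               # [ 2.   0.  -0.5  1. ]
```

A positive score means the gene supports the concept's activation in this cell, a negative score means it suppresses it, and zero means the concept ignores it. Gene 1 shows the contrast with correlational analysis: it is highly expressed, yet perturbing it leaves the activation unchanged.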

Business Value

Accelerates biological discovery by making complex genomic models interpretable, potentially leading to faster identification of disease mechanisms, biomarkers, and therapeutic targets.

Paper Metadata

Innovation Type

Interpretability Framework and Methods

Deployment Feasibility

High, as it's a framework for analyzing existing models, but requires specialized biological and computational expertise.

Limitations Addressed

Black-box nature of single-cell RNA-seq foundation models, difficulty in interpreting biological concepts, limitations of correlational approaches like differential expression analysis.

Technical Tags

Interpretability, Single-cell RNA-seq, Foundation Models, Biological Concepts, Sparse Dictionary Learning, Attribution Methods, Counterfactual Perturbations, Ontology-driven Analysis, Gene Expression

Research Topics

AI Interpretability, Computational Biology, Single-cell Genomics, Foundation Models, Biomedical Discovery

Methods & Architectures

Concept-based Interpretability Framework, Attribution Method with Counterfactual Perturbations, Expert-driven Analysis, Ontology-driven Method, Sparse Dictionary Learning, Foundation Models, Deep Learning Models

Applications & Tasks

Biomedical Research, Genomics, Drug Discovery, Personalized Medicine, Interpreting Biological Concepts in Models, Extracting Meaningful Insights from Gene Expression Data, Biological Concept Discovery, Gene Function Identification, Model Interpretation

Related Fields

Computational Biology, Bioinformatics, Machine Learning, Artificial Intelligence, Genomics, Interpretability

Keywords

Interpretability, Single-cell RNA-seq, Foundation Models, Biological Concepts, Sparse Dictionary Learning, Attribution, Counterfactuals, Gene Expression, Genomics, Biomedical Discovery, Ontology, Machine Learning, Deep Learning, Explainable AI

Academic Context

#AI Interpretability, #Computational Biology, #Single-cell Genomics, #Foundation Models, #Biomedical Discovery

Commercial Potential

Potential Products

AI-powered biological discovery platforms, Interpretability tools for genomic models, Biomarker identification software

Target Industries

Biotechnology, Pharmaceuticals, Healthcare, Research Institutions

Use Case Examples

Identifying genes associated with specific cell types or states; understanding the biological basis of model predictions in disease research; discovering novel therapeutic targets from genomic data.

Competitive Edge

Provides a more biologically grounded and interpretable approach to analyzing foundation models in genomics compared to generic interpretability methods.

Market Opportunity

Growing market for AI in drug discovery and genomics research.

Revenue Models

Licensing of software/framework, R&D partnerships, consulting services.

Resource Requirements

Compute Needs

Moderate to High, depending on the size of the foundation models and datasets.

Data Requirements

Single-cell RNA-seq data, associated biological ontologies.

Deployment Constraints

Requires expertise in both machine learning and biology.

Scalability

Scalable to larger foundation models and datasets with sufficient computational resources.

Regulatory Considerations

Ethical considerations in biological research and data usage.

Production Readiness

Maturity Level

Research

Time to Market

2-3 years for integration into research tools.

Patent Potential

Moderate
