Research Paper | Plant breeders, Agricultural researchers, Bioinformaticians, Computational biologists

Biology-informed neural networks learn nonlinear representations from omics data to improve genomic prediction and interpretability


📄 Abstract

We extend biologically-informed neural networks (BINNs) for genomic prediction (GP) and selection (GS) in crops by integrating thousands of single-nucleotide polymorphisms (SNPs) with multi-omics measurements and prior biological knowledge. Traditional genotype-to-phenotype (G2P) models depend heavily on direct mappings that achieve only modest accuracy, forcing breeders to conduct large, costly field trials to maintain or marginally improve genetic gain. Models that incorporate intermediate molecular phenotypes such as gene expression can achieve higher predictive fit, but they remain impractical for GS since such data are unavailable at deployment or design time. BINNs overcome this limitation by encoding pathway-level inductive biases and leveraging multi-omics data only during training, while using genotype data alone during inference. Applied to maize gene-expression and multi-environment field-trial data, BINN improves rank-correlation accuracy by up to 56% within and across subpopulations under sparse-data conditions and nonlinearly identifies genes that GWAS/TWAS fail to uncover. With complete domain knowledge for a synthetic metabolomics benchmark, BINN reduces prediction error by 75% relative to conventional neural nets and correctly identifies the most important nonlinear pathway. Importantly, both cases show highly sensitive BINN latent variables correlate with the experimental quantities they represent, despite not being trained on them. This suggests BINNs learn biologically-relevant representations, nonlinear or linear, from genotype to phenotype. Together, BINNs establish a framework that leverages intermediate domain information to improve genomic prediction accuracy and reveal nonlinear biological relationships that can guide genomic selection, candidate gene selection, pathway enrichment, and gene-editing prioritization.

Authors (4)

Katiana Kontolati

Rini Jasmine Gladstone

Ian Davis

Ethan Pickering

Submitted

October 16, 2025

arXiv Category

cs.LG


Key Contributions

Extends Biology-Informed Neural Networks (BINNs) for genomic prediction by integrating SNPs, multi-omics data, and biological knowledge. BINNs encode pathway-level inductive biases, allowing them to use multi-omics data during training while relying solely on genotype data for inference, thus improving accuracy and interpretability.
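As a rough illustration of what a pathway-level inductive bias with genotype-only inference could look like, the sketch below wires SNPs to gene-level latent nodes through a fixed connectivity mask, then maps those latents to a phenotype. It is not the authors' architecture: the two-layer layout, the `tanh` activation, the toy SNP-to-gene mask, and every name (`masked_linear`, `binn_forward`, `snp_to_gene_mask`) are assumptions made for this example, and training is omitted entirely.

```python
import math
import random


def masked_linear(x, weights, mask, bias):
    """Linear layer whose weight[i][j] only contributes where mask[i][j] == 1."""
    out = []
    for j in range(len(bias)):
        total = bias[j]
        for i, x_i in enumerate(x):
            total += x_i * weights[i][j] * mask[i][j]
        out.append(total)
    return out


def binn_forward(snps, params, snp_to_gene_mask):
    """Genotype-only forward pass: SNPs -> gene-level latents -> phenotype."""
    pre_activation = masked_linear(
        snps, params["w_gene"], snp_to_gene_mask, params["b_gene"]
    )
    gene_latents = [math.tanh(v) for v in pre_activation]
    phenotype = params["b_out"] + sum(
        g * w for g, w in zip(gene_latents, params["w_out"])
    )
    return phenotype, gene_latents


# Hypothetical prior knowledge: SNPs 0-1 map to gene 0, SNPs 2-3 map to gene 1.
snp_to_gene_mask = [[1, 0], [1, 0], [0, 1], [0, 1]]

rng = random.Random(0)
params = {
    "w_gene": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)],
    "b_gene": [0.0, 0.0],
    "w_out": [0.5, -0.3],
    "b_out": 0.0,
}

# Genotypes coded as allele counts (0, 1, or 2); no omics input is needed here.
snps = [0, 1, 2, 1]
phenotype, gene_latents = binn_forward(snps, params, snp_to_gene_mask)
```

Because the mask zeroes every connection the prior knowledge does not support, each latent node depends only on its own SNPs, which is what lets it be read as a gene-level quantity. How multi-omics data enter training is not specified in this summary, so the sketch leaves that step out.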

Business Value

Accelerates crop breeding programs by enabling more accurate prediction of desirable traits, leading to faster development of improved crop varieties with higher yields and resilience.

Paper Metadata

Innovation Type

Algorithmic Innovation / Domain Adaptation

Deployment Feasibility

High for breeding programs. The model uses only genotype data at inference, which is readily available.

Limitations Addressed

Modest accuracy of traditional G2P models; impracticality of using intermediate molecular phenotypes (such as gene expression) for genomic selection, since those data are unavailable at deployment time.

Performance Gains

Improves rank-correlation accuracy by up to 56% within and across subpopulations under sparse-data conditions.

Technical Tags

Biology-Informed Neural Networks (BINNs), Genomic Prediction (GP), Genomic Selection (GS), Multi-omics Data, SNP Data, Gene Expression, Inductive Biases, Genotype-to-Phenotype (G2P), Interpretability, Maize Breeding

Research Topics

Computational Biology, Genomics, Machine Learning in Biology, Plant Breeding, Bioinformatics

Methods & Architectures

Biology-Informed Neural Networks (BINNs), Integration of SNPs and multi-omics data, Encoding pathway-level inductive biases, Using genotype data alone during inference

Applications & Tasks

Agriculture, Plant Breeding, Genomics, Biotechnology

Improving accuracy of genomic prediction, Enhancing interpretability of G2P models, Reducing reliance on costly field trials, Integrating diverse biological data

Genomic prediction and selection, Learning nonlinear representations from omics data, Improving genetic gain in crops

Datasets & Benchmarks

Datasets

Maize gene-expression data, Multi-environment field-trial data

Rank-correlation accuracy
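The summary does not say which rank-correlation statistic is used. As a minimal sketch, assuming Spearman's coefficient (the Pearson correlation of the ranks), accuracy and a relative gain over a baseline could be computed as follows. The function names and all numbers are illustrative and are not taken from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(predicted, observed):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(predicted), average_ranks(observed))


def relative_gain(new, baseline):
    """Fractional improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


# Toy values for four hypothetical lines (not data from the paper).
predicted = [0.2, 0.9, 0.5, 0.7]
observed = [1.0, 3.0, 2.0, 4.0]
rho = spearman(predicted, observed)

# Illustrative only: a model at 0.50 versus a baseline at 0.40.
gain = relative_gain(0.50, 0.40)
```

Rank correlation suits selection settings because breeders act on the ordering of candidates rather than on the predicted values themselves. Whether the reported "up to 56%" is a relative or an absolute change is not stated in this summary.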

Related Fields

Genetics, Plant Science, Machine Learning, Bioinformatics, Computational Biology

Keywords

Genomic Prediction, Genomic Selection, Biology-Informed Neural Networks, Multi-omics, SNPs, Gene Expression, Plant Breeding, Maize, Interpretability, G2P, Inductive Bias

Academic Context

#Computational Biology, #Genomics, #Machine Learning in Biology, #Plant Breeding, #Bioinformatics

Commercial Potential

Potential Products

Genomic prediction software for crop breeding, AI-powered breeding platforms

Target Industries

Agriculture, Biotechnology, Seed Companies

Use Case Examples

Predicting yield potential in new crop varieties, Selecting for disease resistance using genomic data, Accelerating the development of climate-resilient crops

Competitive Edge

Offers higher accuracy and better interpretability than traditional G2P models by incorporating biological knowledge and multi-omics data during training.

Market Opportunity

Large market for agricultural technology and improved crop varieties.

Revenue Models

Licensing of technology to seed companies; providing prediction services.

Resource Requirements

Compute Needs

Requires significant computational resources for training complex BINNs with multi-omics data.

Data Requirements

Requires SNP data, multi-omics data (e.g., gene expression), phenotype data, and biological pathway information.

Deployment Constraints

Availability and quality of multi-omics data for training; need for curated biological pathway information.

Scalability

Scalability depends on the size of the genomic data and the complexity of the biological networks modeled.

Production Readiness

Maturity Level

Research/Development

Time to Market

3-5 years

Patent Potential

Moderate, for the specific BINNs architecture and application to genomic prediction.
