

Biology-informed neural networks learn nonlinear representations from omics data to improve genomic prediction and interpretability

Abstract

We extend biologically informed neural networks (BINNs) for genomic prediction (GP) and genomic selection (GS) in crops by integrating thousands of single-nucleotide polymorphisms (SNPs) with multi-omics measurements and prior biological knowledge. Traditional genotype-to-phenotype (G2P) models rely on direct mappings that achieve only modest accuracy, forcing breeders to run large, costly field trials to maintain or marginally improve genetic gain. Models that incorporate intermediate molecular phenotypes such as gene expression can achieve a better predictive fit, but they remain impractical for GS because such data are unavailable at deployment or design time. BINNs overcome this limitation by encoding pathway-level inductive biases and using multi-omics data only during training, while requiring genotype data alone at inference. Applied to maize gene-expression and multi-environment field-trial data, BINN improves rank-correlation accuracy by up to 56% within and across subpopulations under sparse-data conditions and, through nonlinear modeling, identifies genes that GWAS/TWAS fail to uncover. On a synthetic metabolomics benchmark with complete domain knowledge, BINN reduces prediction error by 75% relative to conventional neural networks and correctly identifies the most important nonlinear pathway. Importantly, in both cases the highly sensitive BINN latent variables correlate with the experimental quantities they represent, even though the model was never trained on those quantities. This suggests that BINNs learn biologically relevant representations, whether nonlinear or linear, along the path from genotype to phenotype. Together, these results establish BINNs as a framework that leverages intermediate domain information to improve genomic prediction accuracy and to reveal nonlinear biological relationships that can guide genomic selection, candidate gene selection, pathway enrichment, and gene-editing prioritization.
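The abstract reports gains in rank-correlation accuracy. In genomic selection this is commonly Spearman's rank correlation between predicted and observed phenotypes, because breeders select on the ordering of candidates rather than on absolute trait values. The summary does not define the paper's exact metric, so the following is a minimal sketch assuming Spearman's rho with average ranks for ties; the function names are illustrative, not taken from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(observed))


# Toy predicted vs. observed phenotypes for four lines (made-up numbers).
print(spearman([2.1, 3.5, 1.0, 4.2], [10, 30, 5, 20]))  # 0.8
```

Only the relative order of the values matters. A model that ranks the top candidates correctly scores well even if its predicted trait values are biased in absolute terms.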
Authors (4)
Katiana Kontolati
Rini Jasmine Gladstone
Ian Davis
Ethan Pickering
Submitted
October 16, 2025
arXiv Category
cs.LG

Key Contributions

Extends Biology-Informed Neural Networks (BINNs) for genomic prediction by integrating SNPs, multi-omics data, and biological knowledge. BINNs encode pathway-level inductive biases, allowing them to use multi-omics data during training while relying solely on genotype data for inference, thus improving accuracy and interpretability.
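The summary describes the architecture only at a high level. As a minimal sketch, assume the pathway-level inductive bias is implemented as sparse connectivity: each gene node sees only its own SNPs, and each pathway node sees only its member genes. A forward pass that needs genotype alone might then look like the following. The SNP-to-gene and gene-to-pathway maps, the class name, the tanh activation, and the random weights are all hypothetical placeholders, not the authors' implementation. The summary does not say how multi-omics data enter training, so training is omitted.

```python
import math
import random

# Hypothetical prior knowledge (toy placeholders): SNP indices per gene and
# member genes per pathway. Real maps would come from biological annotation.
SNP_TO_GENE = {"geneA": [0, 1], "geneB": [2], "geneC": [3, 4]}
GENE_TO_PATHWAY = {"pathway1": ["geneA", "geneB"], "pathway2": ["geneB", "geneC"]}


class MaskedBINNSketch:
    """Hypothetical genotype -> gene -> pathway -> phenotype network.

    Weights exist only for edges allowed by the prior maps, so every latent
    node depends only on its annotated inputs (the inductive bias).
    """

    def __init__(self, snp_to_gene, gene_to_pathway, seed=0):
        rng = random.Random(seed)
        # One weight per allowed edge; absent edges are structurally zero.
        self.w_gene = {g: {s: rng.gauss(0, 0.5) for s in snps}
                       for g, snps in snp_to_gene.items()}
        self.w_path = {p: {g: rng.gauss(0, 0.5) for g in genes}
                       for p, genes in gene_to_pathway.items()}
        self.w_out = {p: rng.gauss(0, 0.5) for p in gene_to_pathway}
        self.bias = 0.0

    def forward(self, genotype):
        """genotype: allele dosages (0, 1, 2), one per SNP. No omics input."""
        # tanh gives each latent node a nonlinear response to its inputs.
        genes = {g: math.tanh(sum(w * genotype[s] for s, w in ws.items()))
                 for g, ws in self.w_gene.items()}
        pathways = {p: math.tanh(sum(w * genes[g] for g, w in ws.items()))
                    for p, ws in self.w_path.items()}
        phenotype = self.bias + sum(self.w_out[p] * v for p, v in pathways.items())
        # Returning the latents lets them be inspected or compared to measurements.
        return phenotype, genes, pathways


model = MaskedBINNSketch(SNP_TO_GENE, GENE_TO_PATHWAY, seed=0)
y, gene_nodes, pathway_nodes = model.forward([0, 1, 2, 1, 0])
print(y, gene_nodes, pathway_nodes)
```

Returning the gene and pathway activations alongside the prediction mirrors the abstract's check that latent variables track measured quantities such as expression or metabolite levels. In this sketch that comparison would be a correlation computed after training, not a training target.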

Business Value

Accelerates crop breeding programs by enabling more accurate prediction of desirable traits, leading to faster development of improved crop varieties with higher yields and resilience.