Abstract: Vision-driven field monitoring is central to digital agriculture, yet models
built on general-domain pretrained backbones often fail to generalize across
tasks, owing to the interaction of fine, variable canopy structures with
fluctuating field conditions. We present FoMo4Wheat, one of the first
crop-domain vision foundation models, pretrained with self-supervision on
ImAg4Wheat, the largest and most diverse wheat image dataset to date (2.5
million high-resolution images collected over a decade at 30 global sites,
spanning >2,000 genotypes and >500 environmental conditions). This
wheat-specific pretraining yields representations that are robust for wheat and
transferable to other crops and weeds. Across ten in-field vision tasks at
canopy and organ levels, FoMo4Wheat models consistently outperform
state-of-the-art models pretrained on general-domain datasets. These results
demonstrate the value of crop-specific foundation models for reliable in-field
perception and chart a path toward a universal crop foundation model with
cross-species and cross-task capabilities. FoMo4Wheat models and the ImAg4Wheat
dataset are publicly available online: https://github.com/PheniX-Lab/FoMo4Wheat
and https://huggingface.co/PheniX-Lab/FoMo4Wheat. A demonstration website is
available at https://fomo4wheat.phenix-lab.com/.