NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding

Abstract

Underwater exploration offers critical insights into our planet and attracts increasing attention for its broader applications, such as resource exploration and national security. We study underwater scene understanding methods, which aim to achieve automated underwater exploration. The underwater scene understanding task demands multi-task perception at multiple granularities. However, the absence of large-scale underwater multi-task instruction-tuning datasets hinders progress in this research. To bridge this gap, we construct NautData, a dataset containing 1.45M image-text pairs supporting eight underwater scene understanding tasks. It enables the development and thorough evaluation of underwater scene understanding models. Underwater image degradation is a widely recognized challenge that interferes with underwater tasks. To improve the robustness of underwater scene understanding, we introduce physical priors derived from underwater imaging models and propose a plug-and-play vision feature enhancement (VFE) module, which explicitly restores clear underwater information. We integrate this module into two established baselines, LLaVA-1.5 and Qwen2.5-VL, and build our underwater LMM, NAUTILUS. Experiments conducted on NautData and public underwater datasets demonstrate the effectiveness of the VFE module, which consistently improves the performance of both baselines on the majority of supported tasks, thus ensuring the superiority of NAUTILUS in the underwater scene understanding area. Data and models are available at https://github.com/H-EmbodVis/NAUTILUS.
Authors (7)
Wei Xu
Cheng Wang
Dingkang Liang
Zongchuang Zhao
Xingyu Jiang
Peng Zhang
+1 more
Submitted
October 31, 2025
arXiv Category
cs.CV

Key Contributions

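NAUTILUS addresses underwater scene understanding with two contributions. The first is NautData, a large-scale instruction-tuning dataset of 1.45M image-text pairs covering eight underwater scene understanding tasks. The second is a plug-and-play vision feature enhancement (VFE) module that incorporates physical priors from underwater imaging models to improve robustness against image degradation. Integrated into LLaVA-1.5 and Qwen2.5-VL, the module improves both baselines on the majority of supported tasks, enabling better multi-task perception for automated underwater exploration.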
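
The physical priors mentioned above come from underwater imaging models. This page does not describe how the VFE module uses them, so the following is only a minimal sketch of one commonly used simplified image formation model, I = J * t + B * (1 - t) with t = exp(-beta * d), together with its algebraic inversion. The function names, attenuation coefficients, and backlight values are illustrative assumptions, not values or code from NAUTILUS.

```python
import math

def transmission(beta, distance_m):
    """Per-channel transmission t_c = exp(-beta_c * d)."""
    return [math.exp(-b * distance_m) for b in beta]

def degrade(clear_rgb, backlight_rgb, beta, distance_m):
    """Simulate an observed pixel: I_c = J_c * t_c + B_c * (1 - t_c)."""
    t = transmission(beta, distance_m)
    return [j * tc + b * (1.0 - tc)
            for j, b, tc in zip(clear_rgb, backlight_rgb, t)]

def restore(observed_rgb, backlight_rgb, beta, distance_m, t_min=0.05):
    """Invert the model: J_c = (I_c - B_c * (1 - t_c)) / max(t_c, t_min).

    The divisor is clamped at t_min to avoid amplifying noise when
    almost no direct signal survives.
    """
    t = transmission(beta, distance_m)
    return [(i - b * (1.0 - tc)) / max(tc, t_min)
            for i, b, tc in zip(observed_rgb, backlight_rgb, t)]

# Hypothetical parameters (assumptions for illustration only):
# red light attenuates fastest in water, so its coefficient is largest.
BETA = (0.60, 0.15, 0.10)        # per-channel attenuation, 1/m
BACKLIGHT = (0.05, 0.35, 0.45)   # bluish-green background light

clear = [0.8, 0.5, 0.3]                       # scene radiance J
observed = degrade(clear, BACKLIGHT, BETA, 3.0)
recovered = restore(observed, BACKLIGHT, BETA, 3.0)

print("observed :", [round(v, 3) for v in observed])
print("recovered:", [round(v, 3) for v in recovered])
```

In this toy setting the attenuation, distance, and background light are all known, so the inversion recovers the clear pixel. In real footage none of these are given per pixel and must be estimated, which is the gap a learned enhancement module would have to fill.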

Business Value

NAUTILUS could enable more effective, automated exploration and monitoring of underwater environments, supporting scientific research, resource management, and security operations.