
DeepDetect: Learning All-in-One Dense Keypoints

📄 Abstract

Keypoint detection is the foundation of many computer vision tasks, including image registration, structure-from-motion, 3D reconstruction, visual odometry, and SLAM. Traditional detectors (SIFT, SURF, ORB, BRISK, etc.) and learning-based methods (SuperPoint, R2D2, LF-Net, D2-Net, etc.) have shown strong performance yet suffer from key limitations: sensitivity to photometric changes, low keypoint density and repeatability, limited adaptability to challenging scenes, and a lack of semantic understanding, often failing to prioritize visually important regions. We present DeepDetect, an intelligent, all-in-one, dense keypoint detector that unifies the strengths of classical detectors using deep learning. First, we create ground-truth masks by fusing the outputs of 7 keypoint detectors and 2 edge detectors, capturing diverse visual cues from corners and blobs to prominent edges and textures. A lightweight and efficient model, ESPNet, is then trained using these masks as labels, enabling DeepDetect to focus semantically on images while producing highly dense keypoints that adapt to diverse and visually degraded conditions. Evaluations on the Oxford Affine Covariant Regions dataset show that DeepDetect surpasses other detectors in keypoint density, repeatability, and number of correct matches, achieving maximum values of 0.5143 (average keypoint density), 0.9582 (average repeatability), and 59,003 (correct matches).
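As a rough illustration of the mask-generation step, the sketch below fuses responses from a few classical OpenCV detectors and a Canny edge map into a dense binary label. The specific detector set, keypoint radius, and fusion rule here are assumptions for illustration, not the paper's exact recipe (which fuses 7 keypoint and 2 edge detectors).

```python
# Illustrative sketch (not the paper's exact pipeline): fuse keypoint and edge
# responses into a dense binary mask usable as a per-pixel training label.
import cv2
import numpy as np

def build_keypoint_mask(gray: np.ndarray, radius: int = 2) -> np.ndarray:
    """Return a {0,1} mask marking pixels near any detected keypoint or edge."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=np.uint8)

    # A subset of classical keypoint detectors (the paper fuses seven).
    detectors = [
        cv2.SIFT_create(),
        cv2.ORB_create(nfeatures=2000),
        cv2.BRISK_create(),
        cv2.FastFeatureDetector_create(),
    ]
    for det in detectors:
        for kp in det.detect(gray, None):
            # Mark a small disk around each keypoint location.
            cv2.circle(mask, (int(kp.pt[0]), int(kp.pt[1])), radius, 1, -1)

    # Edge cues (the paper also fuses two edge detectors; Canny stands in here).
    edges = cv2.Canny(gray, 100, 200)
    mask[edges > 0] = 1
    return mask

# Usage:
# gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
# label = build_keypoint_mask(gray)
```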
Authors (2)
Shaharyar Ahmed Khan Tareen
Filza Khan Tareen
Submitted
October 20, 2025
arXiv Category
cs.CV
arXiv PDF

Key Contributions

Introduces DeepDetect, an 'all-in-one' dense keypoint detector that unifies the strengths of classical and learning-based methods using deep learning. It generates diverse visual cues by fusing the outputs of multiple detectors into ground-truth masks and trains an efficient ESPNet model on them, achieving high keypoint density and repeatability and addressing limitations of prior methods, as sketched below.
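A minimal training sketch under stated assumptions: any lightweight encoder-decoder (a hypothetical TinyNet stands in for ESPNet) is supervised with per-pixel binary cross-entropy against the fused masks. The actual DeepDetect architecture, loss, and training schedule are not specified on this page.

```python
# Minimal training sketch, assuming fused 0/1 masks as dense labels and a
# placeholder network in place of ESPNet; details here are illustrative only.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for a lightweight model such as ESPNet."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # per-pixel keypoint logit
        )

    def forward(self, x):
        return self.body(x)

model = TinyNet()
criterion = nn.BCEWithLogitsLoss()  # dense per-pixel supervision
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(image: torch.Tensor, mask: torch.Tensor) -> float:
    """image, mask: (B, 1, H, W); mask holds the fused binary labels."""
    optimizer.zero_grad()
    loss = criterion(model(image), mask.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```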

Business Value

Improves the foundational capabilities for many 3D vision tasks, enabling more robust and accurate 3D reconstruction, localization, and mapping for applications like autonomous navigation, AR, and robotics.