Abstract
Keypoint detection is the foundation of many computer vision tasks, including
image registration, structure-from-motion, 3D reconstruction, visual odometry,
and SLAM. Traditional detectors (SIFT, SURF, ORB, BRISK, etc.) and
learning-based methods (SuperPoint, R2D2, LF-Net, D2-Net, etc.) perform well
but share key limitations: sensitivity to photometric changes, low keypoint
density and repeatability, limited adaptability to challenging scenes, and a
lack of semantic understanding, so they often fail to prioritize visually
important regions. We present DeepDetect, an intelligent, all-in-one, dense
keypoint detector that uses deep learning to unify the strengths of classical
detectors. First, we create ground-truth masks by fusing the outputs of 7
keypoint detectors and 2 edge detectors, capturing diverse visual cues that
range from corners and blobs to prominent edges and textures. Next, a
lightweight, efficient model, ESPNet, is trained with these masks as labels.
This lets DeepDetect attend to semantically meaningful image regions while
producing highly dense keypoints that adapt to diverse and visually degraded
conditions. Evaluations on the Oxford Affine Covariant Regions dataset show
that DeepDetect surpasses other detectors in keypoint density, repeatability,
and number of correct matches, achieving maximum values of 0.5143 average
keypoint density, 0.9582 average repeatability, and 59,003 correct matches.
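The abstract reports keypoint density and repeatability without defining them here. The sketch below shows one common way such metrics are computed on homography-related image pairs like those in the Oxford dataset. It is an illustration under stated assumptions, not the paper's evaluation code. Density is assumed to be keypoints per pixel. Repeatability uses a simplified, widely used variant: project image A's keypoints into image B with the ground-truth homography, keep those that land inside B, and count how many have a B keypoint within `eps` pixels. The function names and the `eps=3.0` tolerance are hypothetical choices.

```python
import numpy as np

def keypoint_density(keypoints: np.ndarray, height: int, width: int) -> float:
    """Assumed definition: number of keypoints divided by image area in pixels."""
    return len(keypoints) / float(height * width)

def project(points: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map (N, 2) pixel coordinates (x, y) through a 3x3 homography H."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def repeatability(kp_a: np.ndarray, kp_b: np.ndarray, H_ab: np.ndarray,
                  shape_b: tuple, eps: float = 3.0) -> float:
    """Simplified repeatability: fraction of A keypoints re-detected in B within eps pixels."""
    h, w = shape_b
    proj = project(kp_a, H_ab)
    # Keep only projected points that fall inside image B.
    inside = (proj[:, 0] >= 0) & (proj[:, 0] < w) & (proj[:, 1] >= 0) & (proj[:, 1] < h)
    proj = proj[inside]
    if len(proj) == 0 or len(kp_b) == 0:
        return 0.0
    # Pairwise distances between projected A points and detected B points.
    dists = np.linalg.norm(proj[:, None, :] - kp_b[None, :, :], axis=2)
    repeated = int((dists.min(axis=1) <= eps).sum())
    return repeated / min(len(proj), len(kp_b))

# Toy example: image B is image A shifted 5 pixels to the right.
kp_a = np.array([[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]])
kp_b = np.array([[15.0, 10.0], [25.0, 20.0], [90.0, 90.0]])
H = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

print(keypoint_density(kp_a, 10, 10))        # 3 keypoints over 100 pixels
print(repeatability(kp_a, kp_b, H, (100, 100)))  # 2 of 3 keypoints repeat
```

A full evaluation would usually also back-project B's keypoints into A and restrict both sets to the shared visible region; the one-directional filter here keeps the sketch short.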
Authors (2)
Shaharyar Ahmed Khan Tareen
Filza Khan Tareen
Submitted
October 20, 2025
Key Contributions
Introduces DeepDetect, an 'all-in-one' dense keypoint detector that uses deep learning to unify the strengths of classical detectors. It builds ground-truth masks rich in diverse visual cues by fusing the outputs of 7 keypoint and 2 edge detectors, then trains an efficient ESPNet model on these masks to achieve high keypoint density and repeatability, addressing limitations of prior traditional and learning-based methods.
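The mask-generation step can be pictured with a small sketch. The source states only that outputs of 7 keypoint detectors and 2 edge detectors are fused. It does not name the fusion rule, so this example assumes a simple union: each keypoint marks a small square neighbourhood, and each edge map is OR-ed into the result. The function name `fuse_to_mask` and the `radius` parameter are hypothetical. In practice the keypoint lists could come from, for example, the `.pt` coordinates of OpenCV `KeyPoint` objects.

```python
import numpy as np

def fuse_to_mask(height, width, keypoint_sets, edge_maps, radius=1):
    """Fuse several detectors' outputs into one binary label mask (assumed union rule)."""
    mask = np.zeros((height, width), dtype=bool)
    for kps in keypoint_sets:          # one list of (x, y) points per keypoint detector
        for x, y in kps:
            xi, yi = int(round(x)), int(round(y))
            if not (0 <= xi < width and 0 <= yi < height):
                continue               # skip points outside the image
            # Mark a (2*radius+1)^2 neighbourhood, clipped to the image borders.
            y0, y1 = max(yi - radius, 0), min(yi + radius + 1, height)
            x0, x1 = max(xi - radius, 0), min(xi + radius + 1, width)
            mask[y0:y1, x0:x1] = True
    for edges in edge_maps:            # one boolean (H, W) map per edge detector
        mask |= edges.astype(bool)
    return mask

# Toy example: two "keypoint detectors" and one "edge detector" on an 8x8 image.
edge = np.zeros((8, 8), dtype=bool)
edge[7, :] = True                      # a horizontal edge along the last row
mask = fuse_to_mask(8, 8, [[(2, 2)], [(5, 5), (2, 2)]], [edge])
print(int(mask.sum()))                 # two 3x3 squares plus one 8-pixel edge row
```

A union favours density, which matches the stated goal of dense keypoints; a voting threshold across detectors would be an alternative if label noise became a concern.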
Business Value
Strengthens keypoint detection, a foundational step in many 3D vision pipelines, which could enable more robust and accurate 3D reconstruction, localization, and mapping for applications such as autonomous navigation, AR, and robotics.