DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection

Abstract

Open-Vocabulary object detectors can generalize to an unrestricted set of categories through simple textual prompting. However, adapting these models to rare classes or reinforcing their abilities on multiple specialized domains remains essential. While recent methods rely on monolithic adaptation strategies with a single set of weights, we embrace modular deep learning. We introduce DitHub, a framework designed to build and maintain a library of efficient adaptation modules. Inspired by Version Control Systems, DitHub manages expert modules as branches that can be fetched and merged as needed. This modular approach allows us to conduct an in-depth exploration of the compositional properties of adaptation modules, marking the first such study in Object Detection. Our method achieves state-of-the-art performance on the ODinW-13 benchmark and ODinW-O, a newly introduced benchmark designed to assess class reappearance. For more details, visit our project page: https://aimagelab.github.io/DitHub/
Authors (6)
Chiara Cappellino
Gianluca Mancusi
Matteo Mosconi
Angelo Porrello
Simone Calderara
Rita Cucchiara
Submitted: March 12, 2025
arXiv Category: cs.CV

Key Contributions

DitHub is a modular framework, inspired by version control systems, for building and maintaining a library of adaptation modules for open-vocabulary object detection. The modular design enables a study of the compositional properties of adaptation modules, and the method achieves state-of-the-art performance on the existing ODinW-13 benchmark and on ODinW-O, a new benchmark designed to assess class reappearance.
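
The fetch-and-merge idea can be illustrated with a toy sketch. This is not the authors' implementation: the `ModuleLibrary` class, its `commit`, `fetch`, and `merge` methods, the flat weight vectors, and the averaging merge rule are all hypothetical choices made only to show what a library of per-class modules might look like.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleLibrary:
    """Hypothetical store mapping a class name to a flat vector of adapter weights."""

    modules: dict[str, list[float]] = field(default_factory=dict)

    def commit(self, class_name: str, weights: list[float]) -> None:
        # Store (or overwrite) the expert module for one class.
        self.modules[class_name] = list(weights)

    def fetch(self, class_name: str) -> list[float]:
        # Return a copy of one class's module; raises KeyError if it is absent.
        return list(self.modules[class_name])

    def merge(self, class_names: list[str]) -> list[float]:
        # Combine modules by element-wise averaging (an assumed merge rule,
        # not one taken from the paper).
        fetched = [self.fetch(name) for name in class_names]
        if not fetched:
            raise ValueError("need at least one module to merge")
        length = len(fetched[0])
        if any(len(weights) != length for weights in fetched):
            raise ValueError("modules must have the same shape")
        return [sum(values) / len(fetched) for values in zip(*fetched)]


# Example class names and weights are made up for illustration.
library = ModuleLibrary()
library.commit("forklift", [1.0, 2.0, 3.0])
library.commit("pallet", [3.0, 4.0, 5.0])
merged = library.merge(["forklift", "pallet"])
```

Keeping one module per class, rather than one set of weights for everything, is what lets a module be updated when its class reappears without touching the others. The actual module structure and merge procedure are described in the paper and on the project page.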

Business Value

Allows object detection models to be adapted to specific industry needs or new product categories without retraining the entire model, which can shorten deployment time and reduce cost.