📄 Abstract
Open-vocabulary object detectors can generalize to an unrestricted set of
categories through simple textual prompting. However, adapting these models to
rare classes or reinforcing their abilities on multiple specialized domains
remains essential. While recent methods rely on monolithic adaptation
strategies with a single set of weights, we embrace modular deep learning. We
introduce DitHub, a framework designed to build and maintain a library of
efficient adaptation modules. Inspired by Version Control Systems, DitHub
manages expert modules as branches that can be fetched and merged as needed.
This modular approach allows us to conduct an in-depth exploration of the
compositional properties of adaptation modules, marking the first such study in
Object Detection. Our method achieves state-of-the-art performance on the
ODinW-13 benchmark and ODinW-O, a newly introduced benchmark designed to assess
class reappearance. For more details, visit our project page:
https://aimagelab.github.io/DitHub/
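To make the branch metaphor concrete, below is a minimal, hypothetical sketch of a module library with fetch and merge operations. The names (ModuleLibrary, commit, fetch, merge) and the parameter-averaging merge strategy are illustrative assumptions, not DitHub's actual API or method.

```python
# Hypothetical sketch of a VCS-inspired library of adaptation modules.
# All names and the averaging-based merge are assumptions for illustration;
# they are not taken from the DitHub paper or codebase.
import torch

class ModuleLibrary:
    """Stores per-class adaptation modules (e.g., LoRA-style weight deltas),
    loosely mirroring branches in a version control system."""

    def __init__(self):
        self.branches: dict[str, dict[str, torch.Tensor]] = {}

    def commit(self, class_name: str, delta: dict[str, torch.Tensor]) -> None:
        # Store (or update) the expert module trained for one class/domain.
        self.branches[class_name] = delta

    def fetch(self, class_name: str) -> dict[str, torch.Tensor]:
        # Retrieve the expert module for a prompted class.
        return self.branches[class_name]

    def merge(self, class_names: list[str]) -> dict[str, torch.Tensor]:
        # Combine several experts into a single set of deltas by parameter
        # averaging -- one simple merging strategy among many possible ones.
        deltas = [self.branches[name] for name in class_names]
        keys = deltas[0].keys()
        return {k: torch.stack([d[k] for d in deltas]).mean(dim=0) for k in keys}

# Usage: fetch and merge experts for the classes requested at inference time;
# the merged deltas would then be applied on top of the frozen base detector.
lib = ModuleLibrary()
lib.commit("aquarium", {"lora_A": torch.randn(8, 256), "lora_B": torch.randn(256, 8)})
lib.commit("thermal", {"lora_A": torch.randn(8, 256), "lora_B": torch.randn(256, 8)})
merged = lib.merge(["aquarium", "thermal"])
```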
Authors (6)
Chiara Cappellino
Gianluca Mancusi
Matteo Mosconi
Angelo Porrello
Simone Calderara
Rita Cucchiara
Key Contributions
Introduces DitHub, a modular framework for building and maintaining a library of adaptation modules for open-vocabulary object detection, inspired by version control systems. This modularity enables the first study of the compositional properties of adaptation modules in object detection, and yields state-of-the-art performance on the ODinW-13 benchmark and on the newly introduced ODinW-O benchmark for class reappearance.
Business Value
Enables more flexible and efficient adaptation of object detection models to specific industry needs or new product categories without retraining entire models, leading to faster deployment and reduced costs.