📄 Abstract
Open-vocabulary object detectors can generalize to an unrestricted set of
categories through simple textual prompting. However, adapting these models to
rare classes or reinforcing their abilities on multiple specialized domains
remains essential. While recent methods rely on monolithic adaptation
strategies with a single set of weights, we embrace modular deep learning. We
introduce DitHub, a framework designed to build and maintain a library of
efficient adaptation modules. Inspired by Version Control Systems, DitHub
manages expert modules as branches that can be fetched and merged as needed.
This modular approach allows us to conduct an in-depth exploration of the
compositional properties of adaptation modules, marking the first such study in
Object Detection. Our method achieves state-of-the-art performance on the
ODinW-13 benchmark and ODinW-O, a newly introduced benchmark designed to assess
class reappearance. For more details, visit our project page:
https://aimagelab.github.io/DitHub/
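To make the branch metaphor concrete, below is a minimal, hypothetical sketch of a module library with fetch and merge operations. The names (ModuleLibrary, commit, fetch, merge) and the parameter-averaging merge strategy are illustrative assumptions, not DitHub's actual API or method.

```python
# Hypothetical sketch of a VCS-inspired library of adaptation modules.
# All names and the averaging-based merge are assumptions for illustration;
# they are not taken from the DitHub paper or codebase.
import torch

class ModuleLibrary:
    """Stores per-class adaptation modules (e.g., LoRA-style weight deltas),
    loosely mirroring branches in a version control system."""

    def __init__(self):
        self.branches: dict[str, dict[str, torch.Tensor]] = {}

    def commit(self, class_name: str, delta: dict[str, torch.Tensor]) -> None:
        # Store (or update) the expert module trained for one class/domain.
        self.branches[class_name] = delta

    def fetch(self, class_name: str) -> dict[str, torch.Tensor]:
        # Retrieve the expert module for a prompted class.
        return self.branches[class_name]

    def merge(self, class_names: list[str]) -> dict[str, torch.Tensor]:
        # Combine several experts into a single set of deltas by parameter
        # averaging -- one simple merging strategy among many possible ones.
        deltas = [self.branches[name] for name in class_names]
        keys = deltas[0].keys()
        return {k: torch.stack([d[k] for d in deltas]).mean(dim=0) for k in keys}

# Usage: fetch and merge experts for the classes requested at inference time;
# the merged deltas would then be applied on top of the frozen base detector.
lib = ModuleLibrary()
lib.commit("aquarium", {"lora_A": torch.randn(8, 256), "lora_B": torch.randn(256, 8)})
lib.commit("thermal", {"lora_A": torch.randn(8, 256), "lora_B": torch.randn(256, 8)})
merged = lib.merge(["aquarium", "thermal"])
```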
Authors (6)
Chiara Cappellino
Gianluca Mancusi
Matteo Mosconi
Angelo Porrello
Simone Calderara
Rita Cucchiara
Key Contributions
Introduces DitHub, a modular framework for building and maintaining a library of adaptation modules for open-vocabulary object detection, inspired by version control systems. This modularity enables the first study of the compositional properties of adaptation modules in object detection, and yields state-of-the-art performance on the ODinW-13 benchmark and on the newly introduced ODinW-O benchmark for class reappearance.
Business Value
Enables more flexible and efficient adaptation of object detection models to specific industry needs or new product categories without retraining entire models, leading to faster deployment and reduced costs.