
LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models

Abstract

Multimodal Large Language Models (MLLMs) have achieved strong performance on general visual benchmarks but struggle with out-of-distribution (OOD) tasks in specialized domains such as medical imaging, where labeled data is limited and expensive. We introduce LEAML, a label-efficient adaptation framework that leverages both scarce labeled VQA samples and abundant unlabeled images. Our approach generates domain-relevant pseudo question-answer pairs for unlabeled data using a QA generator regularized by caption distillation. Importantly, we selectively update only those neurons most relevant to question-answering, enabling the QA generator to efficiently acquire domain-specific knowledge during distillation. Experiments on gastrointestinal endoscopy and sports VQA demonstrate that LEAML consistently outperforms standard fine-tuning under minimal supervision, highlighting the effectiveness of the proposed framework.
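The abstract's training objective — supervision from pseudo QA labels combined with a caption-distillation regularizer — can be sketched as a toy loss function. This is a minimal illustration, not the paper's actual formulation: the weighting factor `lam`, the use of KL divergence, and all variable names here are assumptions.

```python
import math

def leaml_style_loss(qa_probs, caption_teacher_probs, pseudo_label_idx, lam=0.5):
    """Toy combined objective (illustrative, not the paper's exact loss):
    cross-entropy on a pseudo QA label, plus a KL-divergence term that
    distills a caption model's output distribution into the QA generator.
    `lam` (the regularization weight) is an assumed hyperparameter."""
    # Cross-entropy on the pseudo question-answer label
    ce = -math.log(qa_probs[pseudo_label_idx])
    # KL(teacher || student): caption-distillation regularizer
    kl = sum(t * math.log(t / s)
             for t, s in zip(caption_teacher_probs, qa_probs) if t > 0)
    return ce + lam * kl

# Example: student slightly disagrees with the caption teacher
loss = leaml_style_loss([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], pseudo_label_idx=0)
```

When the student and teacher distributions coincide, the KL term vanishes and the loss reduces to plain cross-entropy on the pseudo label.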

Key Contributions

Introduces LEAML, a label-efficient adaptation framework for MLLMs facing out-of-distribution tasks with limited labeled data. It leverages scarce labeled VQA samples and abundant unlabeled images by generating pseudo-QA pairs via a QA generator regularized by caption distillation, selectively updating relevant neurons for efficient domain knowledge acquisition.
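The "selectively updating relevant neurons" idea can be sketched as masked parameter updates: score each parameter's relevance, then apply gradient steps only to the top-scoring subset while freezing the rest. This is a hedged stand-in, not LEAML's actual mechanism — using gradient magnitude as the relevance score, and the names `selective_update` and `top_k`, are assumptions for illustration.

```python
def selective_update(params, grads, top_k, lr=0.1):
    """Apply an SGD step only to the top_k parameters with the largest
    gradient magnitude; all other parameters stay frozen. A minimal
    sketch of selective neuron updating — the relevance proxy (|grad|)
    is an assumption, not the paper's criterion."""
    # Rank parameter indices by relevance (here: absolute gradient)
    ranked = sorted(range(len(params)), key=lambda i: abs(grads[i]), reverse=True)
    selected = set(ranked[:top_k])
    # Update selected parameters; leave the rest untouched
    return [p - lr * g if i in selected else p
            for i, (p, g) in enumerate(zip(params, grads))]

# Example: only the two most relevant of three parameters are updated
updated = selective_update([1.0, 1.0, 1.0], [0.5, 0.01, 0.2], top_k=2)
```

Restricting updates this way keeps most of the pretrained model intact, which is the usual motivation for sparse or neuron-level adaptation under scarce supervision.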

Business Value

Enables the application of powerful MLLMs to specialized domains with scarce data, such as medical diagnostics or niche industrial applications, reducing the need for extensive data labeling efforts.