
Visual Program Distillation with Template-Based Augmentation

Abstract

Adapting visual programming, in which large language models (LLMs) are prompted to generate executable code for visual tasks such as visual question answering (VQA), to specialized tasks or domains remains challenging due to high annotation and inference costs. We propose a low-cost visual program distillation method that works with models of at most 1 billion parameters and requires no human-generated program annotations. We achieve this through synthetic data augmentation that decouples programs into higher-level skills, called templates, and their corresponding arguments. Experimental results show that, with a relatively small amount of question/answer data, small language models can generate high-quality specialized visual programs, with the added benefit of much faster inference.
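To make the template/argument decoupling concrete, here is a minimal sketch of what one augmentation step could look like, assuming a toy visual-program API (the `find`/`count` calls, the `<OBJ>` slot format, and the helper names are illustrative assumptions, not the paper's actual implementation): a seed program is abstracted into a template by replacing its argument with a slot, and new question/program pairs are synthesized by filling the slot with domain-specific arguments.

```python
# Hypothetical sketch of template-based augmentation; find/count and
# the slot format are assumed for illustration only.

# A seed (question, program) pair written against a toy visual-program API.
seed_question = "How many dogs are in the image?"
seed_program = 'boxes = find(image, "dog")\nanswer = count(boxes)'

def extract_template(question, program, argument):
    """Decouple a concrete program into a higher-level skill (template)
    by replacing its argument with a named slot."""
    slot = "<OBJ>"
    return question.replace(argument, slot), program.replace(argument, slot), slot

def instantiate(q_tpl, p_tpl, slot, new_arg):
    """Fill the slot with a new argument to synthesize a training pair."""
    return q_tpl.replace(slot, new_arg), p_tpl.replace(slot, new_arg)

# Build the "count objects" template from the seed example.
q_tpl, p_tpl, slot = extract_template(seed_question, seed_program, "dog")

# Augment: pair the template with arguments drawn from the target domain
# (e.g., object names that appear in the specialized dataset).
domain_arguments = ["cat", "car", "traffic light"]
synthetic_pairs = [instantiate(q_tpl, p_tpl, slot, arg) for arg in domain_arguments]

for q, p in synthetic_pairs:
    print(q)
    print(p, end="\n---\n")
```

In this reading of the method, such synthetic question/program pairs would then serve as distillation supervision for a small (at most 1B-parameter) model, sidestepping the need for human-written program annotations.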

Key Contributions

Proposes a low-cost visual program distillation method that uses template-based augmentation to generate specialized visual programs for tasks like VQA. The method requires no human-generated program annotations and enables small language models (at most 1B parameters) to generate high-quality programs with significantly faster inference.

Business Value

Reduces the cost and time required to develop specialized AI models for visual tasks, making advanced capabilities like VQA more accessible and efficient for various applications.