Proposes a low-cost visual program distillation method that uses template-based augmentation to generate specialized visual programs for tasks such as visual question answering (VQA). The method requires no human-generated program annotations and enables small language models (≤1B parameters) to generate high-quality programs with significantly faster inference.
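To make the idea of template-based augmentation concrete, here is a minimal sketch under stated assumptions. The summary does not describe the paper's actual templates or pipeline, so everything below is hypothetical: the program syntax, the module names `LOC` and `VQA`, the slot names `obj` and `attr`, and the helper `augment` are illustrative only. The sketch assumes a visual program can be split into a fixed skeleton (the template) and a few arguments, and that new question/program training pairs can be produced by filling the slots with different values.

```python
from itertools import product

# Hypothetical question and program templates sharing the same named slots.
# The module names (LOC, VQA) are illustrative, not taken from the paper.
QUESTION = "What {attr} is the {obj}?"
TEMPLATE = (
    "BOX0 = LOC(image=IMAGE, object='{obj}')\n"
    "ANSWER0 = VQA(image=BOX0, question='What {attr} is the {obj}?')\n"
    "RESULT = ANSWER0"
)


def augment(question, template, slots):
    """Return (question, program) pairs for every combination of slot values."""
    names = sorted(slots)  # fixed slot order keeps the output deterministic
    pairs = []
    for values in product(*(slots[name] for name in names)):
        binding = dict(zip(names, values))
        pairs.append((question.format(**binding), template.format(**binding)))
    return pairs


# Two objects x two attributes give four synthetic training pairs.
pairs = augment(
    QUESTION,
    TEMPLATE,
    {"obj": ["car", "dog"], "attr": ["color", "size"]},
)

for q, program in pairs:
    print(q)
    print(program)
    print()
```

Pairs produced this way could serve as training data for fine-tuning a small model to map questions to programs, without anyone writing programs by hand for each question. Whether the paper fills slots in this exact manner is an assumption of this sketch.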
Reduces the cost and time required to develop specialized AI models for visual tasks, making advanced capabilities like VQA more accessible and efficient for various applications.