PROFIT: A Specialized Optimizer for Deep Fine Tuning

Abstract

The fine-tuning of pre-trained models has become ubiquitous in generative AI, computer vision, and robotics. Although much attention has been paid to improving the efficiency of fine-tuning models, there has been less scholarship around fine-tuning specifically for improved model performance. To remedy this gap, we present PROFIT, one of the first optimizers designed to incrementally fine-tune converged models on new tasks and/or datasets. Unlike traditional optimizers such as SGD or Adam, which make minimal assumptions due to random initializations, PROFIT takes the properties of a converged model into account explicitly to regularize the optimization process. Employing a temporal gradient-orthogonalization process, PROFIT outperforms fine-tuning methods in various tasks, from image classification to multimodal language model training to large-scale motion prediction. Moreover, PROFIT is encapsulated as a modular optimizer, which makes it easy to integrate directly into any training pipeline with minimal engineering effort.
Authors (7)
Anirudh S Chakravarthy
Shuai Kyle Zheng
Xin Huang
Sachithra Hemachandra
Xiao Zhang
Yuning Chai
+1 more
Submitted
December 2, 2024
arXiv Category
cs.CV

Key Contributions

PROFIT is an optimizer designed specifically for incrementally fine-tuning converged models. Where most fine-tuning research targets efficiency, PROFIT targets model performance. Unlike traditional optimizers such as SGD or Adam, it explicitly accounts for the properties of a converged model and regularizes optimization through a temporal gradient-orthogonalization process.
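
The summary does not spell out PROFIT's update rule, so the following is only a minimal sketch of the general idea behind gradient orthogonalization across time steps, not the authors' algorithm. It assumes a toy optimizer that removes from each new gradient the component parallel to the previous step's gradient. The names `orthogonalize` and `TemporalOrthogonalSGD`, the plain-list parameter format, and the choice of the previous gradient as the reference direction are all hypothetical and chosen for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(grad, ref):
    """Return grad with its component along ref removed (vector rejection)."""
    ref_sq = dot(ref, ref)
    if ref_sq == 0.0:
        # No reference direction to project out; return an unchanged copy.
        return list(grad)
    scale = dot(grad, ref) / ref_sq
    return [g - scale * r for g, r in zip(grad, ref)]


class TemporalOrthogonalSGD:
    """Toy optimizer (hypothetical, not PROFIT's actual update rule).

    Each step applies only the part of the current gradient that is
    orthogonal to the gradient seen at the previous step.
    """

    def __init__(self, lr=0.1):
        self.lr = lr
        self.prev_grad = None  # assumption: reference is the last gradient

    def step(self, params, grad):
        if self.prev_grad is None:
            update = list(grad)  # first step: plain gradient descent
        else:
            update = orthogonalize(grad, self.prev_grad)
        self.prev_grad = list(grad)
        return [p - self.lr * u for p, u in zip(params, update)]


# Usage: the second step ignores the direction already taken by the first.
opt = TemporalOrthogonalSGD(lr=0.5)
params = opt.step([1.0, 1.0], [1.0, 0.0])
params = opt.step(params, [3.0, 4.0])
```

Keeping the state and update inside a single object with a `step` method mirrors the drop-in optimizer interface the abstract describes, although the real optimizer's interface may differ.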

Business Value

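PROFIT can improve the performance of pre-trained models adapted to specific downstream tasks, such as image classification, multimodal language model training, and large-scale motion prediction. Because it is packaged as a modular optimizer, it can be integrated into existing training pipelines with minimal engineering effort.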