Abstract: Fine-tuning is a crucial paradigm for adapting pre-trained large language
models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA)
have been shown to effectively fine-tune LLMs with an extreme reduction in
trainable parameters. But are their learned solutions really equivalent?
We study how LoRA and full fine-tuning change pre-trained models by
analyzing the model's weight matrices through the lens of their spectral
properties. We find that LoRA and full fine-tuning yield weight matrices whose
singular value decompositions exhibit very different structure: weight matrices
trained with LoRA have new, high-ranking singular vectors, which we call
'intruder dimensions', while those trained with full fine-tuning do not.
Further, we extend the finding that LoRA forgets less than full fine-tuning and
find that its forgetting is largely localized to the intruder dimensions: by
causally intervening on the intruder dimensions, changing their associated
singular values post-fine-tuning, we show that they cause forgetting. Moreover,
scaling them down significantly improves modeling of the pre-training
distribution with a minimal drop in downstream task performance. Given this, we
should expect accumulated intruder dimensions to be harmful and to lead to more
forgetting. This effect is amplified in continual learning because of
sequential fine-tuning, and we show that LoRA models that accumulate intruder
dimensions in this setting tend to perform worse, emphasizing the practicality
of our findings.
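The analysis the abstract describes can be sketched in a few lines: compare the singular vectors of a fine-tuned weight matrix against those of the pre-trained matrix, flag top singular vectors with low cosine similarity to every pre-trained one as intruder dimensions, and rebuild the matrix with those singular values scaled down. This is a minimal illustration, not the paper's exact procedure; the `top_k` and `sim_threshold` values are hypothetical choices for demonstration.

```python
import numpy as np

def find_intruder_dimensions(W_pre, W_ft, top_k=10, sim_threshold=0.5):
    """Flag top singular vectors of the fine-tuned matrix that have low
    cosine similarity to every pre-trained left singular vector
    (illustrative 'intruder dimension' detection; thresholds are assumed)."""
    U_pre, _, _ = np.linalg.svd(W_pre, full_matrices=False)
    U_ft, S_ft, Vt_ft = np.linalg.svd(W_ft, full_matrices=False)
    intruders = []
    for j in range(min(top_k, U_ft.shape[1])):
        # Max |cosine similarity| to any pre-trained left singular vector.
        sim = np.max(np.abs(U_pre.T @ U_ft[:, j]))
        if sim < sim_threshold:
            intruders.append(j)
    return intruders, (U_ft, S_ft, Vt_ft)

def scale_intruders(svd_ft, intruders, factor=0.0):
    """Rebuild the weight matrix with intruder singular values scaled by
    `factor` (the causal intervention described in the abstract)."""
    U, S, Vt = svd_ft
    S = S.copy()
    S[intruders] *= factor
    return U @ np.diag(S) @ Vt
```

Comparing models this way, a matrix trained with full fine-tuning should yield an empty intruder list, while a LoRA-trained matrix should not; scaling the intruders toward zero is then the intervention that trades a small amount of downstream performance for recovered pre-training behavior.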
Authors (4)
Reece Shuttleworth
Jacob Andreas
Antonio Torralba
Pratyusha Sharma
Submitted
October 28, 2024
Key Contributions
This paper reveals that LoRA and full fine-tuning yield fundamentally different learned solutions, challenging the illusion of equivalence. Through spectral analysis of weight matrices, it identifies 'intruder dimensions' unique to LoRA, showing that LoRA's forgetting is localized to these dimensions, offering insights into the distinct mechanisms of parameter-efficient vs. full fine-tuning.
Business Value
Helps organizations choose the most effective and efficient fine-tuning strategy for their LLMs, optimizing resource usage and performance for specific downstream tasks.