Abstract
Post-training has demonstrated its importance in enhancing the reasoning
capabilities of large language models (LLMs). The primary post-training methods
can be categorized into supervised fine-tuning (SFT) and reinforcement
fine-tuning (RFT). SFT is efficient and well-suited for small language models,
but it may lead to overfitting and limit the reasoning abilities of larger
models. In contrast, RFT generally yields better generalization but depends
heavily on the strength of the base model. To address the limitations of SFT
and RFT, we propose Unified Fine-Tuning (UFT), a novel post-training paradigm
that unifies SFT and RFT into a single, integrated process. UFT enables the
model to effectively explore solutions while incorporating informative
supervision signals, bridging the gap between the memorization and reasoning
behaviors that underlie existing methods. Notably, UFT generally outperforms
both SFT and RFT, regardless of model size. Furthermore, we theoretically prove that
UFT breaks RFT's inherent exponential sample complexity bottleneck, showing for
the first time that unified training can exponentially accelerate convergence
on long-horizon reasoning tasks.
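The unified objective is easiest to picture as a combination of the two losses it subsumes. The sketch below is a minimal, hypothetical PyTorch illustration, assuming UFT can be approximated by a weighted sum of a supervised cross-entropy term on a demonstration and a REINFORCE-style policy-gradient term on a sampled continuation, with the weight annealed from supervision toward exploration. The toy model, reward function, and annealing schedule are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: combines an SFT loss and an RFT (REINFORCE) loss
# in one training step. The tiny GRU policy, the toy reward, and the linear
# annealing schedule are hypothetical stand-ins, not the UFT paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, SEQ_LEN = 64, 32, 16

class TinyPolicy(nn.Module):
    """A toy autoregressive policy standing in for an LLM."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):                 # tokens: (B, T)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                    # logits: (B, T, VOCAB)

def sft_loss(policy, demo):
    """Supervised term: next-token cross-entropy on a demonstration."""
    logits = policy(demo[:, :-1])
    return F.cross_entropy(logits.reshape(-1, VOCAB), demo[:, 1:].reshape(-1))

def rft_loss(policy, prompt, reward_fn):
    """RL term: REINFORCE on a continuation sampled from the policy."""
    tokens, log_probs = prompt, []
    for _ in range(SEQ_LEN):
        logits = policy(tokens)[:, -1]
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        tokens = torch.cat([tokens, action.unsqueeze(1)], dim=1)
    reward = reward_fn(tokens)                               # (B,)
    return -(reward * torch.stack(log_probs, dim=1).sum(dim=1)).mean()

policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
demo = torch.randint(0, VOCAB, (4, SEQ_LEN))        # fake demonstration data
prompt = torch.randint(0, VOCAB, (4, 4))            # fake prompts
reward_fn = lambda seq: (seq[:, -1] == 0).float()   # toy verifiable reward

for step in range(100):
    lam = step / 100   # anneal from mostly supervision toward mostly exploration
    loss = (1 - lam) * sft_loss(policy, demo) + lam * rft_loss(policy, prompt, reward_fn)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The annealing weight is one simple way to move from supervision-heavy updates (helpful for small or weak base models) toward exploration-heavy updates (where RFT's generalization benefits appear); other interpolation schemes would fit the same template.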
Authors (3)
Mingyang Liu
Gabriele Farina
Asuman Ozdaglar
Key Contributions
Introduces Unified Fine-Tuning (UFT), a novel post-training paradigm that integrates Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT) into a single process. UFT effectively balances exploration and informative supervision, outperforms both SFT and RFT across model sizes, and is theoretically proven to break RFT's exponential sample-complexity bottleneck.
Business Value
Enables the development of more capable and versatile LLMs, leading to improved performance in downstream applications like content generation, summarization, and complex question answering.