Abstract
Organizations are increasingly adopting and adapting Large Language Models
(LLMs) hosted on public repositories such as HuggingFace. Although these
adaptations often improve performance on specialized downstream tasks, recent
evidence indicates that they can also degrade a model's safety or fairness.
Since different fine-tuning techniques may exert distinct effects on these
critical dimensions, this study undertakes a systematic assessment of their
trade-offs. Four widely used Parameter-Efficient Fine-Tuning (PEFT) methods
(LoRA, IA3, Prompt-Tuning, and P-Tuning) are applied to four instruction-tuned
model families (Meta-Llama-3-8B, Qwen2.5-7B, Mistral-7B, and Gemma-7B). In
total, 235
fine-tuned variants are evaluated across eleven safety hazard categories and
nine demographic fairness dimensions. The results show that adapter-based
approaches (LoRA, IA3) tend to improve safety scores and are the least
disruptive to fairness, retaining more of the base model's accuracy and
exhibiting lower bias scores. In
contrast, prompt-based methods (Prompt-Tuning and P-Tuning) generally reduce
safety and cause larger fairness regressions, with decreased accuracy and
increased bias. Alignment shifts are strongly moderated by base model type:
Llama remains stable, Qwen records modest gains, Gemma experiences the steepest
safety decline, and Mistral, which is released without an internal moderation
layer, displays the greatest variance. Improvements in safety do not
necessarily translate into improvements in fairness, and no single
configuration optimizes all fairness metrics simultaneously, indicating an
inherent trade-off between these objectives. These findings suggest a practical
guideline for safety-critical deployments: begin with a well-aligned base
model, favour adapter-based PEFT, and conduct category-specific audits of both
safety and fairness.
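For concreteness, the sketch below shows how the four PEFT methods named above could be attached to one of the evaluated base models using HuggingFace's peft library. This is a minimal illustration, not the paper's experimental code: the hyperparameters (LoRA rank, number of virtual tokens, and so on) are placeholders, not the study's reported settings.

```python
# Minimal sketch: attaching the four PEFT methods studied in the paper to a
# base model via HuggingFace's `peft` library. All hyperparameters below are
# illustrative placeholders, not the paper's settings.
from transformers import AutoModelForCausalLM
from peft import (
    LoraConfig,
    IA3Config,
    PromptTuningConfig,
    PromptEncoderConfig,  # P-Tuning
    TaskType,
    get_peft_model,
)

peft_configs = {
    "lora": LoraConfig(
        task_type=TaskType.CAUSAL_LM, r=16, lora_alpha=32, lora_dropout=0.05
    ),
    "ia3": IA3Config(task_type=TaskType.CAUSAL_LM),
    "prompt_tuning": PromptTuningConfig(
        task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20
    ),
    "p_tuning": PromptEncoderConfig(
        task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20
    ),
}

# One of the four instruction-tuned families evaluated in the study.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Swap the dictionary key to switch between the four PEFT methods.
model = get_peft_model(base, peft_configs["lora"])
model.print_trainable_parameters()  # adapter weights are a small fraction of the model
```

In a setup like this, only the adapter or soft-prompt parameters are trainable while the base weights stay frozen, which is what makes these methods parameter-efficient and lets one base checkpoint be paired with each of the four methods in turn.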
Authors
Mina Taraghi
Yann Pequignot
Amin Nikanjam
Mohamed Amine Merzouk
Foutse Khomh
Submitted
November 1, 2025
Key Contributions
This study systematically evaluates the trade-offs between efficiency and alignment (safety and fairness) for four popular Parameter-Efficient Fine-Tuning (PEFT) methods applied to four LLM families. It finds that adapter-based methods (LoRA, IA3) best preserve safety and fairness while retaining accuracy, offering practical guidance for responsible LLM adaptation.
Business Value
Provides crucial insights for organizations using or developing LLMs, enabling them to choose fine-tuning methods that balance performance gains with essential safety and fairness requirements, thus reducing risks and building more trustworthy AI applications.