Research Paper · Intended audience: LLM Developers, AI Safety Researchers, AI Ethicists, Machine Learning Engineers, Organizations deploying LLMs

Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs

📄 Abstract

Organizations are increasingly adopting and adapting Large Language Models (LLMs) hosted on public repositories such as HuggingFace. Although these adaptations often improve performance on specialized downstream tasks, recent evidence indicates that they can also degrade a model's safety or fairness. Since different fine-tuning techniques may exert distinct effects on these critical dimensions, this study undertakes a systematic assessment of their trade-offs. Four widely used Parameter-Efficient Fine-Tuning (PEFT) methods (LoRA, IA3, Prompt-Tuning, and P-Tuning) are applied to four instruction-tuned model families (Meta-Llama-3-8B, Qwen2.5-7B, Mistral-7B, and Gemma-7B). In total, 235 fine-tuned variants are evaluated across eleven safety hazard categories and nine demographic fairness dimensions. The results show that adapter-based approaches (LoRA, IA3) tend to improve safety scores and are the least disruptive to fairness, retaining higher accuracy and lower bias scores. In contrast, prompt-based methods (Prompt-Tuning and P-Tuning) generally reduce safety and cause larger fairness regressions, with decreased accuracy and increased bias. Alignment shifts are strongly moderated by base model type: LLaMA remains stable, Qwen records modest gains, Gemma experiences the steepest safety decline, and Mistral, which is released without an internal moderation layer, displays the greatest variance. Improvements in safety do not necessarily translate into improvements in fairness, and no single configuration optimizes all fairness metrics simultaneously, indicating an inherent trade-off between these objectives. These findings suggest a practical guideline for safety-critical deployments: begin with a well-aligned base model, favour adapter-based PEFT, and conduct category-specific audits of both safety and fairness.
Authors (5)
Mina Taraghi
Yann Pequignot
Amin Nikanjam
Mohamed Amine Merzouk
Foutse Khomh
Submitted
November 1, 2025
arXiv Category
cs.AI

Key Contributions

This study systematically evaluates the trade-off between efficiency and alignment (safety and fairness) across four popular Parameter-Efficient Fine-Tuning (PEFT) methods applied to four LLM families. It finds that adapter-based methods (LoRA and IA3) best preserve safety and fairness while retaining accuracy, offering practical guidance for responsible LLM adaptation.
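
To make the adapter-based vs. prompt-based distinction concrete, here is a minimal sketch of how the four PEFT methods could be instantiated with HuggingFace's `peft` library. The hyperparameters (rank, target modules, number of virtual tokens) and the model checkpoint are illustrative assumptions, not the paper's actual experimental configuration.

```python
# Minimal sketch (assumed setup): the four PEFT methods compared in the paper,
# instantiated via HuggingFace's `peft` library. All hyperparameters here are
# illustrative defaults, not the paper's configuration.
from peft import (
    IA3Config,
    LoraConfig,
    PromptEncoderConfig,
    PromptTuningConfig,
    TaskType,
    get_peft_model,
)
from transformers import AutoModelForCausalLM

# Adapter-based methods: train small modules injected into existing layers.
lora = LoraConfig(
    task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
)
ia3 = IA3Config(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["k_proj", "v_proj", "down_proj"],
    feedforward_modules=["down_proj"],
)

# Prompt-based methods: train soft prompt embeddings prepended to the input.
# P-Tuning additionally learns the prompt through a small encoder network.
prompt_tuning = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)
p_tuning = PromptEncoderConfig(
    task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20, encoder_hidden_size=128,
)

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = get_peft_model(base, lora)  # swap in ia3 / prompt_tuning / p_tuning
model.print_trainable_parameters()  # reports how few parameters each method trains
```

The structural difference is visible in the configs: the adapter methods modify internal projection layers, while the prompt methods leave all base weights untouched and only learn virtual tokens, which is the mechanism behind the divergent safety and fairness behaviour the paper reports.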

Business Value

Provides practical guidance for organizations deploying or adapting LLMs, helping them choose fine-tuning methods that balance performance gains with safety and fairness requirements, thereby reducing risk and producing more trustworthy AI applications.
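
The paper's closing guideline, category-specific audits of both safety and fairness, could be operationalized along the lines of the sketch below. Here `prompts_for` and `judge_unsafe` are hypothetical stand-ins for a per-hazard benchmark prompt set and a moderation judge (a classifier or human review), and the category labels and threshold are illustrative, not taken from the paper.

```python
# Hedged sketch of a category-specific safety audit. `prompts_for` and
# `judge_unsafe` are hypothetical placeholders; the threshold and the caller's
# category labels are illustrative, not the paper's eleven hazard categories.
def audit_by_category(model, tokenizer, categories, prompts_for, judge_unsafe,
                      threshold=0.05):
    """Return the hazard categories whose unsafe-response rate exceeds `threshold`."""
    flagged = {}
    for category in categories:
        prompts = prompts_for(category)          # benchmark prompts for this hazard
        unsafe = 0
        for prompt in prompts:
            inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
            output = model.generate(**inputs, max_new_tokens=256)
            reply = tokenizer.decode(output[0], skip_special_tokens=True)
            unsafe += judge_unsafe(category, prompt, reply)  # 1 if unsafe, else 0
        rate = unsafe / len(prompts)
        if rate > threshold:
            flagged[category] = rate
    return flagged
```

The same loop covers the fairness side of the audit by swapping hazard categories for demographic dimensions and the safety judge for a bias metric, so one harness can track both objectives per fine-tuned variant.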