
TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

📄 Abstract

In this paper we introduce TALE (Task-Aware Layer Elimination), an inference-time algorithm that prunes entire transformer layers in an LLM by directly optimizing task-specific validation performance. We evaluate TALE on 9 tasks and 5 models, including LLaMA 3.1 8B, Qwen 2.5 7B, Qwen 2.5 0.5B, Mistral 7B, and Lucie 7B, under both zero-shot and few-shot settings. Unlike prior approaches, TALE requires no retraining and consistently improves accuracy while reducing computational cost across all benchmarks. Furthermore, applying TALE during finetuning leads to additional performance gains. Finally, TALE provides flexible user control over trade-offs between accuracy and efficiency. Mutual information analysis shows that certain layers act as bottlenecks, degrading task-relevant representations. TALE's selective layer removal remedies this problem, producing smaller, faster, and more accurate models that are also faster to fine-tune, while offering new insights into transformer interpretability.
Authors (3)
Omar Naim
Krish Sharma
Nicholas Asher
Submitted
October 26, 2025
arXiv Category
cs.LG

Key Contributions

TALE (Task-Aware Layer Elimination) is an inference-time algorithm that prunes entire transformer layers in LLMs by optimizing task-specific validation performance. It requires no retraining, consistently improves accuracy while reducing computational cost, and even speeds up fine-tuning, offering flexible trade-offs between accuracy and efficiency.
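To make the idea concrete, here is a minimal sketch of task-aware layer elimination: greedily drop whole layers whenever removal does not hurt a task-specific validation score. The toy residual model, the random validation data, and the greedy "remove if accuracy does not drop" criterion are assumptions for illustration only and are not the paper's exact algorithm or models.

```python
# Sketch of greedy, validation-driven layer elimination (toy stand-in, not the paper's code).
import torch
import torch.nn as nn


class ToyClassifier(nn.Module):
    """A small stack of residual blocks standing in for transformer layers."""

    def __init__(self, dim=32, n_layers=8, n_classes=2):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(n_layers)]
        )
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x, active=None):
        # `active` is the set of layer indices to keep; all others are skipped entirely.
        for i, layer in enumerate(self.layers):
            if active is None or i in active:
                x = x + layer(x)  # residual connection, so skipping a layer is well defined
        return self.head(x)


@torch.no_grad()
def val_accuracy(model, x, y, active):
    preds = model(x, active=active).argmax(dim=-1)
    return (preds == y).float().mean().item()


def greedy_layer_elimination(model, x_val, y_val):
    """Repeatedly drop the layer whose removal keeps validation accuracy highest."""
    active = set(range(len(model.layers)))
    best = val_accuracy(model, x_val, y_val, active)
    improved = True
    while improved and len(active) > 1:
        improved = False
        for i in sorted(active):
            candidate = active - {i}
            score = val_accuracy(model, x_val, y_val, candidate)
            if score >= best:  # accept the prune if it does not hurt the task metric
                best, active, improved = score, candidate, True
                break
    return active, best


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyClassifier()
    x_val = torch.randn(256, 32)
    y_val = torch.randint(0, 2, (256,))
    kept, acc = greedy_layer_elimination(model, x_val, y_val)
    print(f"kept layers: {sorted(kept)}, validation accuracy: {acc:.3f}")
```

The same loop structure would apply to a real LLM by masking decoder layers instead of the toy blocks and scoring on a held-out task validation set; the paper additionally reports user control over how aggressively layers are removed, which corresponds here to relaxing or tightening the acceptance criterion.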

Business Value

Significantly reduces the operational costs and latency of deploying LLMs, making them more accessible and practical for a wider range of real-time applications.