Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in language comprehension and generation; however, their widespread adoption is constrained by substantial bandwidth and computational demands. While pruning and low-rank approximation have each demonstrated promising performance individually, their synergy for LLMs remains underexplored. We introduce \underline{S}ynergistic \underline{S}parse and \underline{L}ow-Rank \underline{C}ompression (SSLC) for LLMs, which leverages the strengths of both techniques: low-rank approximation compresses the model by retaining its essential structure with minimal information loss, while sparse optimization eliminates non-essential weights, preserving those crucial for generalization. Based on a theoretical analysis, we first formulate low-rank approximation and sparse optimization as a unified problem and solve it with an iterative optimization algorithm. Experiments on LLaMA and Qwen2.5 models (7B-70B) show that SSLC, without any additional training steps, consistently surpasses standalone methods, achieving state-of-the-art results. Notably, SSLC compresses Qwen2.5 by 50\% with no performance drop and achieves at least a 1.63$\times$ speedup, offering a practical solution for efficient LLM deployment.
Authors (7)
Zeliang Zong
Kai Zhang
Zheyang Li
Wenming Tan
Ye Ren
Yiyan Zhai
+1 more
Submitted
October 30, 2025
Key Contributions
This paper introduces Synergistic Sparse and Low-Rank Compression (SSLC) for LLMs, which formulates low-rank approximation and sparse optimization as a single unified problem solved by an iterative algorithm. The approach aims to substantially reduce model size and computational demands while retaining performance; a generic sketch of such an alternating sparse-plus-low-rank decomposition is given below.
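To make the idea of a unified sparse-plus-low-rank decomposition concrete, the minimal sketch below alternates between a truncated-SVD low-rank step and a magnitude-thresholding sparse step on a single weight matrix. This is only an illustration under assumed update rules: the function name, the rank and sparsity parameters, and the alternating scheme are hypothetical and not taken from the paper, whose actual SSLC objective and algorithm may differ.

```python
import numpy as np

def sparse_plus_lowrank(W, rank=32, sparsity=0.5, iters=10):
    """Alternately fit W ~= L + S, with L low-rank and S sparse.

    Generic alternating decomposition sketch (hypothetical); the paper's
    actual SSLC formulation and updates may differ.
    """
    S = np.zeros_like(W)
    for _ in range(iters):
        # Low-rank step: truncated SVD of the residual W - S.
        U, sigma, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * sigma[:rank]) @ Vt[:rank, :]

        # Sparse step: keep only the largest-magnitude entries of W - L.
        R = W - L
        k = int((1.0 - sparsity) * R.size)  # number of weights to keep
        if k > 0:
            thresh = np.partition(np.abs(R).ravel(), -k)[-k]
            S = np.where(np.abs(R) >= thresh, R, 0.0)
        else:
            S = np.zeros_like(W)
    return L, S

# Example: decompose a random stand-in for a weight matrix.
W = np.random.randn(512, 512)
L, S = sparse_plus_lowrank(W, rank=32, sparsity=0.5)
err = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```

In this kind of scheme, the low-rank factor captures the dominant structure of the weights while the sparse term preserves a small set of large-magnitude outliers, which matches the division of labor the abstract describes for the two compression techniques.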
Business Value
Enables the deployment of LLMs on resource-constrained devices and reduces operational costs for large-scale AI deployments.