Abstract
Structured sparsity accelerates training and inference on modern GPUs, yet it
still trails unstructured dynamic sparse training (DST) in accuracy. The
shortfall stems from a loss of expressivity: whereas a dense layer can realize
every possible mask obtained by choosing any $w$ active weights out of $n$, a
fixed block or N:M layout explores only a subset of those possibilities. We
propose to close this gap by learning, for each layer, a single permutation
matrix jointly with the structured weight matrix. Applied to three canonical
structures -- block, N:M, and diagonal -- we show that permutation-augmented
DST (PA-DST) matches unstructured baselines (RigL, SET) at 90--95\% sparsity on
ImageNet-1K (ViT-B/16) and WikiText-103 (GPT-2), yet trains up to $1.21\times$
and infers up to $2.9\times$ faster. The results position structure + learned
permutation as a sweet spot between accuracy and efficiency.
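To make the expressivity argument concrete, the toy enumeration below (an illustrative sketch, not taken from the paper) considers one row with $n = 8$ inputs and $w = 4$ kept weights: a fixed 2:4 layout realizes only 36 of the 70 possible supports, while allowing column permutations recovers all 70. The row width, group size, and exhaustive enumeration are assumptions made purely for illustration.

```python
from itertools import combinations, permutations

n, w = 8, 4  # toy row: 8 inputs, keep 4 weights (2:4 over two groups of 4)

# Unstructured DST: any 4-of-8 support is realizable.
unstructured = set(combinations(range(n), w))


def nm_supports(order):
    """Supports realizable by a 2:4 pattern when columns are grouped in `order`."""
    g0, g1 = order[:4], order[4:]
    return {tuple(sorted(a + b))
            for a in combinations(g0, 2)
            for b in combinations(g1, 2)}


fixed = nm_supports(tuple(range(n)))  # identity ordering, i.e. no permutation
print(len(unstructured), len(fixed))  # 70 36

# With column permutations, every 4-subset can be split 2+2 across the two
# groups under some ordering, so the union over permutations covers all supports.
covered = set()
for p in permutations(range(n)):
    covered |= nm_supports(p)
print(covered == unstructured)  # True
```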
Authors (6)
Abhishek Tyagi
Arjun Iyer
Liam Young
William H Renninger
Christopher Kanan
Yuhao Zhu
Submitted
October 16, 2025
Key Contributions
This paper proposes Permutation-Augmented Dynamic Sparse Training (PA-DST), which learns permutation matrices alongside structured weight matrices for each layer. This approach closes the expressivity gap between structured and unstructured sparsity, enabling structured sparse models to match the accuracy of unstructured ones while offering significant speedups in training and inference.
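A minimal PyTorch sketch of the idea, assuming a 2:4 structured pattern and a fixed (not learned) column permutation: the mask is imposed in permuted coordinates, and the columns are restored to their original order before the matrix multiply. The module name `PermutedNMLinear`, the magnitude-based 2:4 mask helper, and the random permutation buffer are all illustrative assumptions; PA-DST learns the permutation jointly with the structured weights, and the abstract does not specify the relaxation used to make that learnable.

```python
import torch
import torch.nn as nn


def two_to_four_mask(w: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude weights in every group of 4 input columns."""
    out_f, in_f = w.shape
    groups = w.abs().reshape(out_f, in_f // 4, 4)
    topk = groups.topk(2, dim=-1).indices          # top-2 entries per group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, topk, 1.0)
    return mask.reshape(out_f, in_f)


class PermutedNMLinear(nn.Module):
    """Linear layer with a 2:4 mask applied in a permuted column order.

    The permutation is stored as a fixed index buffer for simplicity; learning
    it jointly with the weights (as PA-DST does) would require a differentiable
    relaxation, which is outside the scope of this sketch.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        assert in_features % 4 == 0
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # Hypothetical: a random permutation standing in for a learned one.
        self.register_buffer("perm", torch.randperm(in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Mask the weight columns in permuted order, then unpermute so the
        # layer consumes inputs in their original feature order.
        w_perm = self.weight[:, self.perm]
        w_sparse = w_perm * two_to_four_mask(w_perm)
        inv = torch.argsort(self.perm)
        return x @ w_sparse[:, inv].t()


x = torch.randn(8, 16)
layer = PermutedNMLinear(16, 32)
print(layer(x).shape)  # torch.Size([8, 32])
```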
Business Value
Enables deployment of larger, more accurate models on resource-constrained devices by significantly improving training and inference efficiency without sacrificing accuracy.