Abstract
Current transformer accelerators primarily focus on optimizing self-attention
due to its quadratic complexity. However, this focus is less relevant for
vision transformers with short token lengths, where the Feed-Forward Network
(FFN) tends to be the dominant computational bottleneck. This paper presents a
low-power Vision Transformer accelerator, optimized through algorithm-hardware
co-design. The model complexity is reduced using hardware-friendly dynamic
token pruning without introducing complex mechanisms. Sparsity is further
improved by replacing GELU with ReLU activations and employing dynamic FFN2
pruning, achieving a 61.5% reduction in operations and a 59.3% reduction in
FFN2 weights, with an accuracy loss of less than 2%. The hardware adopts a
row-wise dataflow with output-oriented data access to eliminate data
transposition, and supports dynamic operations with minimal area overhead.
Implemented in TSMC's 28nm CMOS technology, our design occupies 496.4K gates
and includes a 232KB SRAM buffer, achieving a peak throughput of 1024 GOPS at
1 GHz, with an energy efficiency of 2.31 TOPS/W and an area efficiency of
858.61 GOPS/mm².
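To clarify why the GELU-to-ReLU swap enables dynamic FFN2 pruning, the sketch below illustrates the general principle: ReLU produces exact zeros in the FFN hidden activation, so the corresponding rows of the second FFN weight matrix contribute nothing and can be skipped per token. This is a minimal NumPy illustration with assumed toy shapes and names (d_model, d_ff, W1, W2 are placeholders), not the accelerator's actual dataflow or pruning logic.

```python
# Minimal sketch (assumed shapes, not the paper's implementation) of how
# ReLU-induced sparsity lets the FFN2 matmul skip zeroed weight rows.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_tokens = 64, 256, 16          # toy dimensions, illustrative only

x  = rng.standard_normal((n_tokens, d_model))
W1 = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))

# FFN1 + ReLU: unlike GELU, ReLU yields exact zeros in the hidden activation.
h = np.maximum(x @ W1, 0.0)

# Dense FFN2 as a reference result.
y_dense = h @ W2

# Dynamic FFN2 pruning idea: per token, only the channels surviving ReLU
# contribute to the FFN2 matmul, so the matching rows of W2 can be skipped.
y_sparse = np.zeros_like(y_dense)
for t in range(n_tokens):
    active = np.nonzero(h[t])[0]               # surviving hidden channels
    y_sparse[t] = h[t, active] @ W2[active]    # skip zeroed rows of W2

assert np.allclose(y_dense, y_sparse)
print("avg W2 rows skipped per token:",
      d_ff - np.count_nonzero(h, axis=1).mean())
```

In hardware, the same observation means the FFN2 weight fetches and multiply-accumulates for zero channels can be gated off, which is where the reported operation and weight reductions come from.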
Authors (2)
Ching-Lin Hsiung
Tian-Sheuan Chang
Submitted
October 16, 2025
Key Contributions
This paper presents a novel low-power Vision Transformer accelerator optimized through algorithm-hardware co-design. It introduces hardware-friendly dynamic token pruning and sparsity improvements (replacing GELU with ReLU, dynamic FFN2 pruning) to significantly reduce operations and weights with minimal accuracy loss. The hardware design features a row-wise dataflow for efficient data access.
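As a rough illustration of dynamic token pruning in general (this page does not describe the paper's specific scoring mechanism), the sketch below keeps a fixed fraction of tokens ranked by a simple magnitude score; the function name, the L2-norm score, and the keep ratio are all assumptions chosen to keep the control logic simple, in the spirit of a hardware-friendly scheme.

```python
# Illustrative sketch only: a generic top-k token pruning step using an
# assumed per-token L2-norm importance score and a fixed keep ratio.
import numpy as np

def prune_tokens(tokens: np.ndarray, keep_ratio: float = 0.7) -> np.ndarray:
    """Keep the top-k tokens by a simple magnitude score (assumed criterion)."""
    n_tokens = tokens.shape[0]
    k = max(1, int(n_tokens * keep_ratio))
    scores = np.linalg.norm(tokens, axis=1)    # one importance score per token
    keep = np.sort(np.argsort(scores)[-k:])    # top-k, kept in original order
    return tokens[keep]

x = np.random.default_rng(0).standard_normal((16, 64))   # toy token matrix
print(prune_tokens(x).shape)                              # (11, 64) at keep_ratio=0.7
```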
Business Value
Enables the deployment of powerful Vision Transformer models on power-constrained devices like mobile phones and edge sensors, opening up new possibilities for on-device AI and real-time vision processing.