Abstract
We present the first sub-microsecond transformer implementation on an FPGA
achieving competitive performance for state-of-the-art high-energy physics
benchmarks. Transformers have shown exceptional performance on multiple tasks
in modern machine learning applications, including jet tagging at the CERN
Large Hadron Collider (LHC). However, their computational complexity has so far
prohibited their use in real-time applications such as the hardware trigger
systems of the collider experiments. In this work, we demonstrate the first
application of transformers for jet tagging on FPGAs, achieving
$\mathcal{O}(100)$ nanosecond latency with superior performance compared to
alternative baseline models. We leverage high-granularity quantization and
distributed arithmetic optimization to fit the entire transformer model on a
single FPGA, achieving the required throughput and latency. Furthermore, we add
multi-head attention and linear attention support to hls4ml, making our work
accessible to the broader fast machine learning community. This work advances
the next-generation trigger systems for the High Luminosity LHC, enabling the
use of transformers for real-time applications in high-energy physics and
beyond.
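For readers who want to experiment with the hls4ml flow this work extends, the sketch below illustrates the generic Keras-to-HLS conversion path. It is a minimal illustration under stated assumptions: it presumes an hls4ml installation that includes the multi-head attention support contributed by this paper, and the toy model, layer sizes, FPGA part number, and project settings are placeholders rather than the authors' actual configuration.

```python
# Hypothetical sketch: converting a small Keras attention model into an FPGA
# firmware project with hls4ml. Multi-head attention support is assumed to be
# present in the installed hls4ml version; all model and project parameters
# below are illustrative, not taken from the paper.
import hls4ml
from tensorflow import keras

# Toy jet-tagging-style model: one multi-head attention block over
# per-particle features, followed by pooling and a classifier head.
inputs = keras.Input(shape=(16, 8))               # 16 particles, 8 features each
x = keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)(inputs, inputs)
x = keras.layers.GlobalAveragePooling1D()(x)
outputs = keras.layers.Dense(5, activation='softmax')(x)  # 5 jet classes
model = keras.Model(inputs, outputs)

# Per-layer ("name" granularity) configuration, the hook through which
# fine-grained precision choices like those in the paper are expressed.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Convert and compile the firmware project (part number is a placeholder).
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='transformer_hls_prj',
    part='xcvu13p-flga2577-2-e',
)
hls_model.compile()   # C simulation binary for quick functional checks
# hls_model.build()   # full HLS synthesis (long-running; requires Vivado/Vitis)
```

Running `hls_model.compile()` only exercises the C-simulation path; obtaining latency and resource numbers comparable to those reported in the paper requires a full synthesis run on the target device.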
Authors (10)
Lauri Laatu
Chang Sun
Arianna Cox
Abhijith Gandrakota
Benedikt Maier
Jennifer Ngadiuba
+4 more
Submitted
October 26, 2025
arXiv Category
physics.ins-det
Key Contributions
This paper presents the first sub-microsecond transformer implementation on an FPGA for jet tagging in high-energy physics. By leveraging high-granularity quantization and distributed arithmetic optimization, it achieves competitive tagging performance at lower latency than baseline models, making real-time use of transformers in trigger systems feasible.
Business Value
Enables real-time data analysis and decision-making in high-stakes scientific experiments, potentially leading to faster discoveries and more efficient data acquisition.