The End of Transformers? On Challenging Attention and the Rise of Sub-Quadratic Architectures

Abstract

Transformers have dominated sequence processing tasks, most notably language modeling, for the past seven years. However, the inherent quadratic complexity of their attention mechanism remains a significant bottleneck as context length increases. This paper surveys recent efforts to overcome this bottleneck, including advances in (sub-quadratic) attention variants, recurrent neural networks, state space models, and hybrid architectures. We critically analyze these approaches in terms of compute and memory complexity, benchmark results, and fundamental limitations to assess whether the dominance of pure-attention transformers may soon be challenged.
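The quadratic bottleneck the survey targets can be made concrete with a small sketch. The snippet below is illustrative only and not taken from the paper; the function names and the feature map are hypothetical. It contrasts standard softmax attention, which materializes an N x N score matrix (O(N^2 d) work), with a kernelized linear-attention rearrangement that exploits associativity to compute a d x d summary first (O(N d^2) work), the basic idea behind many sub-quadratic attention variants.

```python
# Illustrative sketch (not from the paper). Shapes: N = sequence length, d = head dim.
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an N x N score matrix -> O(N^2 * d)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                 # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                      # (N, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized variant (hypothetical feature map phi): associativity lets us
    form phi(K)^T V, a d x d matrix, first, so the cost is O(N * d^2)."""
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                                           # (d, d)
    normalizer = Qp @ Kp.sum(axis=0, keepdims=True).T       # (N, 1)
    return (Qp @ kv) / normalizer                           # (N, d)

N, d = 4096, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)  # (4096, 64), quadratic in N
print(linear_attention(Q, K, V).shape)   # (4096, 64), linear in N
```

Doubling N quadruples the work in the first function but only doubles it in the second, which is why long-context settings motivate the sub-quadratic architectures the paper surveys.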

Key Contributions

This paper surveys and critically analyzes recent efforts to overcome the quadratic complexity bottleneck of Transformer attention. It examines alternatives such as sub-quadratic attention variants, recurrent neural networks, state space models, and hybrid architectures, assessing their compute and memory complexity and benchmark performance to judge whether the dominance of pure-attention Transformers may soon be challenged.

Business Value

Identifies more efficient and scalable model architectures, enabling LLMs that handle longer contexts with fewer computational resources, thereby reducing operational costs.