Attention (as Discrete-Time Markov) Chains

Abstract

Abstract: We introduce a new interpretation of the attention matrix as a discrete-time Markov chain. Our interpretation sheds light on common operations involving attention scores such as selection, summation, and averaging in a unified framework. It further extends them by considering indirect attention, propagated through the Markov chain, as opposed to previous studies that only model immediate effects. Our key observation is that tokens linked to semantically similar regions form metastable states, i.e., regions where attention tends to concentrate, while noisy attention scores dissipate. Metastable states and their prevalence can be easily computed through simple matrix multiplication and eigenanalysis, respectively. Using these lightweight tools, we demonstrate state-of-the-art zero-shot segmentation. Lastly, we define TokenRank -- the steady state vector of the Markov chain, which measures global token importance. We show that TokenRank enhances unconditional image generation, improving both quality (IS) and diversity (FID), and can also be incorporated into existing segmentation techniques to improve their performance over existing benchmarks. We believe our framework offers a fresh view of how tokens are being attended in modern visual transformers.
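The idea of indirect attention can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a small hand-written row-stochastic attention matrix (each row sums to 1, as after a softmax) and treats it as a Markov transition matrix, so that attention after n hops is the n-th matrix power. The matrix values and the helper names `matmul` and `n_hop` are illustrative assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def n_hop(attn, hops):
    """Attention propagated through `hops` steps of the chain (attn ** hops)."""
    result = attn
    for _ in range(hops - 1):
        result = matmul(result, attn)
    return result


# Hypothetical attention matrix with two weakly coupled token groups,
# {0, 1} and {2, 3}. Every row is a probability distribution.
attn = [
    [0.60, 0.38, 0.01, 0.01],
    [0.38, 0.60, 0.01, 0.01],
    [0.01, 0.01, 0.60, 0.38],
    [0.01, 0.01, 0.38, 0.60],
]

two_hop = n_hop(attn, 2)     # indirect attention via one intermediate token
many_hop = n_hop(attn, 200)  # long-run behaviour of the chain

# After two hops, token 0 still places about 96% of its attention on its
# own group; only after many hops do all rows approach the same distribution.
print([round(x, 4) for x in two_hop[0]])
print([round(x, 4) for x in many_hop[0]])
```

In this toy chain, attention stays inside each group for many steps before it mixes, which is the kind of behaviour the abstract calls a metastable state. How the paper actually extracts and ranks such states is not shown here.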
Authors (6)
Yotam Erel
Olaf Dünkel
Rishabh Dabral
Vladislav Golyanik
Christian Theobalt
Amit H. Bermano
Submitted
July 23, 2025
arXiv Category
cs.CV

Key Contributions

The paper reinterprets the attention matrix as a discrete-time Markov chain, which unifies common attention operations (selection, summation, and averaging) and makes it possible to model indirect attention. It identifies 'metastable states', regions where attention tends to concentrate, and introduces 'TokenRank', the steady-state vector of the chain, as a measure of global token importance. With these tools the authors report state-of-the-art zero-shot segmentation and improved unconditional image generation.
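The abstract defines TokenRank as the steady-state vector of the Markov chain. As a rough illustration only, the sketch below computes a steady state by power iteration on a small attention matrix built from made-up scores. The score values, the helper names `softmax_rows` and `steady_state`, and the choice of power iteration are all assumptions for this example. The paper may compute the vector differently.

```python
import math


def softmax_rows(scores):
    """Convert raw scores into a row-stochastic attention matrix."""
    attn = []
    for row in scores:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return attn


def steady_state(attn, iters=1000, tol=1e-12):
    """Power iteration for the vector pi satisfying pi = pi @ attn."""
    n = len(attn)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        nxt = [sum(pi[i] * attn[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical raw attention scores for three tokens (queries in rows,
# keys in columns). Token 0 receives the highest score in every row.
scores = [
    [2.0, 0.5, 0.1],
    [1.5, 1.0, 0.2],
    [1.8, 0.3, 0.4],
]

attn = softmax_rows(scores)
token_rank = steady_state(attn)
print([round(x, 4) for x in token_rank])
```

Because softmax produces strictly positive entries, the chain is irreducible and aperiodic, so the iteration converges to a unique steady state. In this example token 0, which every other token attends to most, receives the largest steady-state weight.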

Business Value

The framework provides a deeper theoretical understanding of attention, which could lead to more interpretable and efficient AI models for tasks such as image analysis and generation.