Abstract
We introduce a new interpretation of the attention matrix as a discrete-time
Markov chain. Our interpretation sheds light on common operations involving
attention scores such as selection, summation, and averaging in a unified
framework. It further extends them by considering indirect attention,
propagated through the Markov chain, as opposed to previous studies that only
model immediate effects. Our key observation is that tokens linked to
semantically similar regions form metastable states, i.e., regions where
attention tends to concentrate, while noisy attention scores dissipate.
Metastable states and their prevalence can be easily computed through simple
matrix multiplication and eigenanalysis, respectively. Using these lightweight
tools, we demonstrate state-of-the-art zero-shot segmentation. Lastly, we
define TokenRank -- the steady state vector of the Markov chain, which measures
global token importance. We show that TokenRank enhances unconditional image
generation, improving both quality (IS) and diversity (FID), and can also be
incorporated into existing segmentation techniques to improve their performance
over existing benchmarks. We believe our framework offers a fresh view of how
tokens are being attended in modern visual transformers.
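The Markov-chain reading described above can be illustrated with a small numerical sketch. It assumes a single softmax attention matrix whose rows sum to 1, treats it as a transition matrix, propagates attention over several steps by matrix multiplication, and approximates the steady-state vector by power iteration. The helper names (`softmax_rows`, `multi_step_attention`, `steady_state`) and the random 6x6 matrix are hypothetical and chosen for illustration only; how the paper aggregates heads and layers, or otherwise computes TokenRank in practice, is not specified here.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax, so every row is a probability distribution."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def multi_step_attention(attn, steps):
    """Indirect attention: the transition matrix raised to the given power."""
    return np.linalg.matrix_power(attn, steps)


def steady_state(attn, tol=1e-10, max_iter=1000):
    """Approximate the stationary distribution pi with pi = pi @ attn.

    Power iteration on the left; assumes attn is row-stochastic with
    strictly positive entries (true for a softmax output), so the chain
    has a unique stationary distribution.
    """
    n = attn.shape[0]
    pi = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = pi @ attn
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi


# Toy example: random scores stand in for real query-key logits.
rng = np.random.default_rng(0)
attn = softmax_rows(rng.normal(size=(6, 6)))

two_step = multi_step_attention(attn, 2)  # attention propagated over two hops
rank = steady_state(attn)                 # one importance score per token
```

Under this reading, `rank[j]` is large when token `j` receives attention from tokens that are themselves heavily attended, which is the sense in which a steady-state vector measures global rather than immediate importance.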
Authors (6)
Yotam Erel
Olaf Dünkel
Rishabh Dabral
Vladislav Golyanik
Christian Theobalt
Amit H. Bermano
Key Contributions
Reinterprets the attention matrix as a discrete-time Markov chain, which unifies common operations on attention scores (selection, summation, averaging) and makes it possible to model indirect attention propagated through the chain. Identifies 'metastable states', regions where attention tends to concentrate, and introduces 'TokenRank', the steady-state vector of the chain, as a measure of global token importance. Reports state-of-the-art zero-shot segmentation and improved unconditional image generation.
Business Value
Provides a deeper theoretical understanding of attention, potentially leading to more interpretable and efficient AI models for tasks like image analysis and generation.