Abstract
The softmax function is a basic operator in machine learning and
optimization, used in classification, attention mechanisms, reinforcement
learning, game theory, and problems involving log-sum-exp terms. Existing
robustness guarantees of learning models and convergence analysis of
optimization algorithms typically consider the softmax operator to have a
Lipschitz constant of $1$ with respect to the $\ell_2$ norm. In this work, we
prove that the softmax function is contractive with the Lipschitz constant
$1/2$, uniformly across all $\ell_p$ norms with $p \ge 1$. We also show that
the local Lipschitz constant of softmax attains $1/2$ for $p = 1$ and $p =
\infty$, and for $p \in (1,\infty)$, the constant remains strictly below $1/2$
and the supremum $1/2$ is achieved only in the limit. To our knowledge, this is
the first comprehensive norm-uniform analysis of softmax Lipschitz continuity.
We demonstrate how the sharper constant directly improves a range of existing
theoretical results on robustness and convergence. We further validate the
sharpness of the $1/2$ Lipschitz constant of the softmax operator through
empirical studies on attention-based architectures (ViT, GPT-2, Qwen3-8B) and
on stochastic policies in reinforcement learning.
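The claim lends itself to a quick numerical sanity check. The sketch below (an illustrative script written for this summary, not the authors' code; the dimension, noise scale, and trial count are arbitrary choices) samples nearby pairs $x, y$, computes the ratio $\|\mathrm{softmax}(x) - \mathrm{softmax}(y)\|_p / \|x - y\|_p$ for $p \in \{1, 2, \infty\}$, and checks that the largest observed value stays at or below $1/2$.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ratio(x, y, p):
    """||softmax(x) - softmax(y)||_p / ||x - y||_p."""
    return (np.linalg.norm(softmax(x) - softmax(y), ord=p)
            / np.linalg.norm(x - y, ord=p))

rng = np.random.default_rng(0)
d, trials = 16, 20_000
for p in (1, 2, np.inf):
    worst = 0.0
    for _ in range(trials):
        x = rng.normal(scale=3.0, size=d)
        y = x + rng.normal(scale=1e-2, size=d)  # nearby pairs probe the local constant
        worst = max(worst, ratio(x, y, p))
    # Targeted pair: two equal dominant logits, so softmax puts mass ~1/2 on each of
    # two coordinates; here the ratio sits very close to the 1/2 bound from the paper.
    x = np.full(d, -20.0); x[:2] = 0.0
    y = x.copy(); y[0] += 1e-3; y[1] -= 1e-3
    worst = max(worst, ratio(x, y, p))
    print(f"p={p}: largest observed ratio = {worst:.4f}  (paper's bound: 0.5)")
```

Near inputs whose softmax places mass of roughly $1/2$ on two coordinates, the observed ratio approaches $0.5$, which is where the abstract says the constant is attained (for $p = 1, \infty$) or approached only in the limit (for $p \in (1,\infty)$).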
Submitted
October 27, 2025
Key Contributions
Proves that the softmax function is contractive with a uniform Lipschitz constant of $1/2$ across all $\ell_p$ norms with $p \ge 1$. This is tighter than the constant of $1$ commonly used in existing robustness guarantees and convergence analyses, and it directly sharpens those results.
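For intuition on where the $1/2$ comes from, here is a short sketch of the $\ell_2$ case only (the paper's argument is norm-uniform and is not reproduced here); the symbols $s$, $J$, and $v$ are notation introduced for this sketch, not taken from the abstract.

```latex
% Sketch of the $\ell_2$ case. With $s = \operatorname{softmax}(x)$, the Jacobian is
\[
  J(x) \;=\; \operatorname{diag}(s) - s\,s^{\top},
\]
% which is symmetric positive semidefinite, so its spectral norm is the largest
% value of the quadratic form $v^{\top} J(x)\, v$ over unit vectors $v$:
\[
  v^{\top} J(x)\, v
  \;=\; \sum_i s_i v_i^2 - \Big(\sum_i s_i v_i\Big)^{2}
  \;=\; \operatorname{Var}_{s}(v)
  \;\le\; \Big(\frac{\max_i v_i - \min_i v_i}{2}\Big)^{2}
  \;\le\; \frac{\|v\|_2^{2}}{2}.
\]
% Hence $\sup_x \|J(x)\|_{2 \to 2} \le 1/2$, the $\ell_2$ instance of the paper's bound;
% equality needs the softmax output to put mass $1/2$ on exactly two coordinates.
```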
Business Value
Underpins the development of more stable and predictable machine learning models and optimization algorithms, leading to more reliable AI systems.