Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: We study a two-player zero-sum game in which the row player aims to maximize
their payoff against an adversarial column player, under an unknown payoff
matrix estimated through bandit feedback. We propose three algorithms based on
the Explore-Then-Commit framework. The first adapts it to zero-sum games, the
second incorporates adaptive elimination that leverages the $\varepsilon$-Nash
Equilibrium property to efficiently select the optimal action pair, and the
third extends the elimination algorithm by employing non-uniform exploration.
Our objective is to demonstrate the applicability of ETC in a zero-sum game
setting by focusing on learning pure strategy Nash Equilibria. A key
contribution of our work is a derivation of instance-dependent upper bounds on
the expected regret of our proposed algorithms, which has received limited
attention in the literature on zero-sum games. Particularly, after $T$ rounds,
we achieve an instance-dependent regret upper bounds of $O(\Delta + \sqrt{T})$
for ETC in zero-sum game setting and $O(\log (T \Delta^2) / \Delta)$ for the
adaptive elimination algorithm and its variant with non-uniform exploration,
where $\Delta$ denotes the suboptimality gap. Therefore, our results indicate
that ETC-based algorithms perform effectively in adversarial game settings,
achieving regret bounds comparable to existing methods while providing insight
through instance-dependent analysis.
Key Contributions
This paper studies two-player zero-sum games with bandit feedback and proposes three algorithms based on the Explore-Then-Commit (ETC) framework. A key contribution is the derivation of instance-dependent upper bounds on the expected regret for these algorithms, which is novel for zero-sum games. The work demonstrates the applicability of ETC in this setting, focusing on learning pure strategy Nash Equilibria.
Business Value
Improved algorithms for strategic decision-making under uncertainty can be applied in competitive markets, auction design, and resource allocation where agents have conflicting interests.