Two-Player Zero-Sum Games with Bandit Feedback

Abstract

We study a two-player zero-sum game in which the row player aims to maximize their payoff against an adversarial column player, under an unknown payoff matrix estimated through bandit feedback. We propose three algorithms based on the Explore-Then-Commit (ETC) framework. The first adapts ETC to zero-sum games, the second incorporates adaptive elimination that leverages the $\varepsilon$-Nash Equilibrium property to efficiently select the optimal action pair, and the third extends the elimination algorithm by employing non-uniform exploration. Our objective is to demonstrate the applicability of ETC in a zero-sum game setting by focusing on learning pure-strategy Nash Equilibria. A key contribution of our work is the derivation of instance-dependent upper bounds on the expected regret of the proposed algorithms, a type of analysis that has received limited attention in the literature on zero-sum games. In particular, after $T$ rounds, we achieve instance-dependent regret upper bounds of $O(\Delta + \sqrt{T})$ for ETC in the zero-sum game setting and $O(\log (T \Delta^2) / \Delta)$ for the adaptive elimination algorithm and its variant with non-uniform exploration, where $\Delta$ denotes the suboptimality gap. Our results therefore indicate that ETC-based algorithms perform effectively in adversarial game settings, achieving regret bounds comparable to existing methods while providing insight through instance-dependent analysis.
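For concreteness, here is a minimal Python sketch of the first algorithm's idea, ETC adapted to a zero-sum matrix game. It is an illustration under assumptions not stated in the abstract, not the paper's implementation: the Gaussian noise model, the choice of m samples per pair, and the names `sample_payoff` and `etc_zero_sum` are all hypothetical, and the commit rule simply plays the empirical maximin row.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_payoff(A, i, j, noise=0.1):
    """Bandit feedback: one noisy observation of the unknown payoff A[i, j]."""
    return A[i, j] + noise * rng.standard_normal()

def etc_zero_sum(A, m):
    """Explore-Then-Commit for a zero-sum matrix game (hypothetical interface).

    Explore: sample every action pair m times and build an empirical matrix.
    Commit: play the empirical maximin row, i.e. the row player's best pure
    strategy against a worst-case column, for all remaining rounds.
    """
    n_rows, n_cols = A.shape
    A_hat = np.empty((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            A_hat[i, j] = np.mean([sample_payoff(A, i, j) for _ in range(m)])
    i_star = int(np.argmax(A_hat.min(axis=1)))  # empirical maximin row
    return i_star, A_hat

# Usage: a 3x3 game whose pure-strategy saddle point is (row 0, col 1).
A = np.array([[0.6, 0.5, 0.9],
              [0.4, 0.3, 0.8],
              [0.7, 0.2, 0.1]])
i_star, A_hat = etc_zero_sum(A, m=200)
print("committed row:", i_star)
```

The $O(\Delta + \sqrt{T})$ bound reflects the usual ETC trade-off: longer exploration sharpens the estimate of the saddle point but accumulates regret at a $\sqrt{T}$ rate before the commit phase begins.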

Key Contributions

This paper studies two-player zero-sum games with bandit feedback and proposes three algorithms based on the Explore-Then-Commit (ETC) framework. A key contribution is the derivation of instance-dependent upper bounds on the expected regret of these algorithms, a form of analysis that has received limited attention for zero-sum games. The work demonstrates the applicability of ETC in this setting, focusing on learning pure-strategy Nash Equilibria.
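The adaptive elimination idea can likewise be sketched. The snippet below is a hedged illustration under assumptions (halving confidence widths in place of a Hoeffding-style radius, Gaussian noise, existence of a pure saddle point), not the paper's algorithm: in each phase, surviving action pairs are resampled, and any pair whose confidence intervals certify it cannot be a saddle point is eliminated. Intuitively, once the confidence width drops below the gap $\Delta$, every suboptimal pair is removed, which is what drives a $\log(T\Delta^2)/\Delta$-type bound.

```python
import numpy as np

rng = np.random.default_rng(1)

def adaptive_elimination(A, samples_per_phase=200, noise=0.1, max_phases=12):
    """Hedged sketch of confidence-based elimination toward a pure-strategy NE.

    Assumes A has a pure saddle point. Each phase resamples all surviving
    action pairs, shrinks the confidence width, and removes pairs that the
    intervals certify cannot be a saddle point: either some surviving row
    clearly beats them in their column, or some surviving column clearly
    undercuts them in their row.
    """
    n_rows, n_cols = A.shape
    active = {(i, j) for i in range(n_rows) for j in range(n_cols)}
    means = np.zeros((n_rows, n_cols))
    counts = np.zeros((n_rows, n_cols))
    width = 1.0
    for _ in range(max_phases):
        for (i, j) in active:
            obs = A[i, j] + noise * rng.standard_normal(samples_per_phase)
            counts[i, j] += samples_per_phase
            # Incremental running-mean update over all samples of (i, j).
            means[i, j] += (obs.sum() - samples_per_phase * means[i, j]) / counts[i, j]
        width /= 2  # stand-in for a Hoeffding-style confidence radius
        active = {
            (i, j) for (i, j) in active
            # Keep (i, j) only if no surviving pair certifies it is dominated.
            if not any(means[i2, j] - means[i, j] > 2 * width
                       for (i2, j2) in active if j2 == j and i2 != i)
            and not any(means[i, j] - means[i, j2] > 2 * width
                        for (i2, j2) in active if i2 == i and j2 != j)
        }
        if len(active) <= 1:
            break
    return active, means

# Usage on the same 3x3 game; the surviving pair should be the saddle (0, 1).
A = np.array([[0.6, 0.5, 0.9],
              [0.4, 0.3, 0.8],
              [0.7, 0.2, 0.1]])
print(adaptive_elimination(A)[0])
```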

Business Value

Improved algorithms for strategic decision-making under uncertainty can be applied in competitive markets, auction design, and resource allocation where agents have conflicting interests.