Abstract: When data are collected adaptively, as in bandit algorithms, classical
statistical approaches such as ordinary least squares and $M$-estimation
often fail to achieve asymptotic normality. Although recent lines of work have
modified the classical approaches to ensure valid inference on adaptively
collected data, most of these works assume that the model is correctly
specified. We propose a method that provides valid inference for $M$-estimators
that use adaptively collected bandit data with a (possibly) misspecified
working model. A key ingredient in our approach is the use of flexible machine
learning approaches to stabilize the variance induced by adaptive data
collection. A major novelty is that our procedure enables the construction of
valid confidence sets even in settings where treatment policies are unstable
and non-converging, such as when there is no unique optimal arm and standard
bandit algorithms are used. Empirical results on semi-synthetic datasets
constructed from the Osteoarthritis Initiative demonstrate that the method
maintains type I error control, while existing methods for inference in
adaptive settings fail to achieve nominal coverage when the model is misspecified.
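The abstract's starting point is that classical estimators can lose asymptotic normality under adaptive sampling, notably when there is no unique optimal arm. A small simulation can make this setting concrete. The sketch below is NOT the paper's procedure. It assumes a correctly specified model (a two-armed Gaussian bandit with equal means and known noise sd sigma = 1) and an epsilon-greedy policy. It then compares the classical interval for arm 0's mean with a square-root inverse-propensity weighted interval, a variance-stabilizing device of the kind used in prior work on adaptive inference. It does not implement the misspecification-robust, machine-learning-based stabilization the abstract describes. All function names (`run_epsilon_greedy`, `naive_ci`, `weighted_ci`, `coverage`) and parameter values are hypothetical.

```python
import math
import random

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile
SIGMA = 1.0  # noise sd; assumed known to keep this hypothetical sketch short


def run_epsilon_greedy(T, eps, rng):
    """Two-armed epsilon-greedy with equal arm means (no unique optimal arm).

    Returns chosen arms, rewards, and P(A_t = 0 | history) at each step.
    """
    sums, counts = [0.0, 0.0], [0, 0]
    arms, rewards, p0s = [], [], []
    for _ in range(T):
        if 0 in counts:  # treat an unexplored arm as greedy
            greedy = counts.index(0)
        else:
            greedy = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
        # Greedy arm gets 1 - eps/2, the other arm gets eps/2 (clipped away from 0).
        p0 = 1 - eps / 2 if greedy == 0 else eps / 2
        a = 0 if rng.random() < p0 else 1
        y = rng.gauss(0.0, SIGMA)  # true mean of both arms is 0
        sums[a] += y
        counts[a] += 1
        arms.append(a)
        rewards.append(y)
        p0s.append(p0)
    return arms, rewards, p0s


def naive_ci(arms, rewards):
    """Classical i.i.d. 95% interval for arm 0's mean (ignores adaptivity)."""
    ys = [y for a, y in zip(arms, rewards) if a == 0]
    if not ys:
        return None
    mean = sum(ys) / len(ys)
    half = Z95 * SIGMA / math.sqrt(len(ys))
    return mean - half, mean + half


def weighted_ci(arms, rewards, p0s):
    """Square-root inverse-propensity weighted 95% interval for arm 0's mean.

    With w_t = 1 / sqrt(P(A_t = 0)), each term w_t^2 * 1{A_t = 0} has
    conditional mean 1, so the martingale's summed conditional variance
    is deterministic (T * SIGMA^2) and a martingale CLT applies.
    """
    num = den = sq = 0.0
    for a, y, p in zip(arms, rewards, p0s):
        if a == 0:
            w = 1.0 / math.sqrt(p)
            num += w * y
            den += w
            sq += w * w
    if den == 0.0:
        return None
    est = num / den
    half = Z95 * SIGMA * math.sqrt(sq) / den
    return est - half, est + half


def covers(ci, truth=0.0):
    """True if the interval exists and contains the true mean."""
    return ci is not None and ci[0] <= truth <= ci[1]


def coverage(T=400, eps=0.2, reps=500, seed=0):
    """Monte Carlo coverage of the true mean 0 for both intervals."""
    rng = random.Random(seed)
    naive = weighted = 0
    for _ in range(reps):
        arms, rewards, p0s = run_epsilon_greedy(T, eps, rng)
        naive += covers(naive_ci(arms, rewards))
        weighted += covers(weighted_ci(arms, rewards, p0s))
    return naive / reps, weighted / reps


if __name__ == "__main__":
    print(coverage())
```

Calling `coverage()` returns the two empirical coverage rates. The weighted rate should sit near the nominal 0.95. How far the naive rate falls short depends on the policy and the horizon, so treat the output as illustrative rather than as a reproduction of the paper's Osteoarthritis Initiative experiments.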