Abstract
We consider a data-driven formulation of the classical discrete-time
stochastic control problem. Our approach exploits the natural structure of many
such problems, in which significant portions of the system are uncontrolled.
Employing the dynamic programming principle and the mean-field interpretation
of single-hidden-layer neural networks, we formulate the control problem as a
series of infinite-dimensional minimisation problems. When these problems are
carefully regularised, we provide practically verifiable assumptions under which
non-asymptotic bounds hold on the generalisation error achieved by their
minimisers, thus ensuring stability in overparametrised settings for controls
learned using finitely many observations. We explore connections to the traditional
noisy stochastic gradient descent algorithm, and subsequently show promising
numerical results for some classic control problems.
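
To make the connection to noisy stochastic gradient descent concrete, the following is a minimal sketch, not the method of the paper. It assumes a single-hidden-layer network written in mean-field form (the output is an average over N hidden units, or "particles"), a toy one-dimensional regression target in place of a control problem, a quadratic penalty as the regulariser, and Gaussian noise injected at every step. All names and values (`forward`, `noisy_sgd_step`, `train`, `lam`, `tau`, the step size, and the target `sin(x)`) are hypothetical choices made for illustration.

```python
import numpy as np


def forward(params, x):
    """Mean-field network: the average of N units a_i * tanh(w_i * x + b_i)."""
    a, w, b = params                      # each array has shape (N,)
    hidden = np.tanh(np.outer(x, w) + b)  # shape (batch, N)
    return hidden @ a / a.size, hidden


def loss(params, x, y):
    """Half mean squared error over the given sample."""
    pred, _ = forward(params, x)
    return 0.5 * np.mean((pred - y) ** 2)


def noisy_sgd_step(params, x, y, rng, lr=0.05, lam=1e-3, tau=1e-4):
    """One noisy SGD step on every particle (illustrative assumption).

    Gradients are rescaled by N (mean-field scaling), `lam` weights a
    quadratic penalty, and `tau` sets the variance of the injected noise.
    """
    a, w, b = params
    pred, hidden = forward(params, x)
    resid = pred - y                      # shape (batch,)
    dhidden = (1.0 - hidden ** 2) * a     # derivative through tanh, (batch, N)
    grads = (
        resid @ hidden / x.size,                    # w.r.t. a
        resid @ (dhidden * x[:, None]) / x.size,    # w.r.t. w
        resid @ dhidden / x.size,                   # w.r.t. b
    )
    noise_scale = np.sqrt(2.0 * lr * tau)
    return tuple(
        p - lr * (g + lam * p) + noise_scale * rng.standard_normal(p.size)
        for p, g in zip(params, grads)
    )


def train(seed=0, n_particles=200, n_obs=256, batch=32, steps=2000):
    """Fit the toy target from finitely many observations.

    Returns the full-sample loss before and after training.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n_obs)
    y = np.sin(x)                         # hypothetical regression target
    params = tuple(rng.standard_normal(n_particles) for _ in range(3))
    initial = loss(params, x, y)
    for _ in range(steps):
        idx = rng.integers(0, n_obs, size=batch)    # minibatch indices
        params = noisy_sgd_step(params, x[idx], y[idx], rng)
    return initial, loss(params, x, y)


if __name__ == "__main__":
    before, after = train()
    print(f"loss before: {before:.4f}, after: {after:.4f}")
```

In this sketch the number of particles can be increased without changing the step size, since each particle's gradient is rescaled by N; that is the sense in which the update is written at the mean-field scale.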