Anytime exponential concentration of contractive stochastic approximation: Additive and multiplicative noise

Day: 2022-09-02
Time: 10:30 (UTC-03:00)
Place: Zoom

Speaker: Martín Zubeldía (ISyE, University of Minnesota, USA)

In this talk, we study stochastic approximation (SA) algorithms under an operator that is contractive with respect to an arbitrary norm. We consider two settings in which the iterates are potentially unbounded: additive sub-Gaussian noise, and bounded multiplicative noise. We obtain concentration bounds on the convergence errors and show that these errors have sub-Gaussian tails. Moreover, our bounds hold anytime, in the sense that the entire sample path lies within a tube of decaying radius with high probability. To establish these results, we first bound the moment generating function of the generalized Moreau envelope of the error, which serves as a Lyapunov function. We then construct an exponential supermartingale and use Ville's maximal inequality to obtain anytime exponential concentration bounds. To overcome the challenge posed by the multiplicative noise, we develop a bootstrapping argument that iteratively improves an initially loose concentration bound into a much tighter one.
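For readers less familiar with the setting, here is a minimal sketch of the recursion and of what an anytime bound asserts; the notation is illustrative, and the precise assumptions are those stated in the talk. The iterates follow

  $x_{k+1} = x_k + \alpha_k \big( F(x_k) - x_k + w_k \big),$

where $F$ is a contraction with respect to an arbitrary norm $\|\cdot\|_c$, i.e., $\|F(x) - F(y)\|_c \le \gamma \|x - y\|_c$ for some $\gamma \in (0,1)$, $x^*$ is its unique fixed point, $\alpha_k$ are step sizes, and $w_k$ is either additive sub-Gaussian noise or bounded multiplicative noise. An anytime concentration bound is then a statement of the form

  $\mathbb{P}\big( \|x_k - x^*\|_c \le R_k(\delta) \ \text{for all } k \ge 0 \big) \ge 1 - \delta,$

with a radius $R_k(\delta)$ that decays in $k$; controlling the whole sample path simultaneously is what Ville's maximal inequality, applied to the exponential supermartingale built from the Moreau-envelope Lyapunov function, delivers.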


Our results enable us to provide anytime high probability bounds for a large class of reinforcement learning algorithms. Since linear SA with random matrices that are bounded and Hurwitz in expectation, but not necessarily Hurwitz almost surely, is a special case of contractive SA with multiplicative noise, we obtain high probability bounds for various TD-learning algorithms (such as on-policy TD with linear function approximation and off-policy TD) in one shot. To the best of our knowledge, exponential concentration bounds for off-policy TD-learning have not been established in the literature, due to the challenge of handling such multiplicative noise. We also provide anytime high probability bounds for the popular Q-learning algorithm.
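To make the linear SA connection concrete, consider TD(0) with linear function approximation, a standard example written here for illustration (the talk's exact setting may be broader). With feature map $\phi$, discount factor $\gamma$, step sizes $\alpha_k$, and observed transitions $(s_k, r_k, s_{k+1})$, the update

  $\theta_{k+1} = \theta_k + \alpha_k \big( r_k + \gamma\, \phi(s_{k+1})^\top \theta_k - \phi(s_k)^\top \theta_k \big)\, \phi(s_k) = \theta_k + \alpha_k \big( A_k \theta_k + b_k \big)$

has $A_k = \phi(s_k)\big( \gamma\, \phi(s_{k+1}) - \phi(s_k) \big)^\top$ and $b_k = r_k\, \phi(s_k)$. Under standard on-policy sampling, the random matrices $A_k$ are bounded and Hurwitz in expectation, yet individual realizations need not be Hurwitz, which is exactly the multiplicative-noise regime described above.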

This is joint work with Zaiwei Chen (Caltech) and Siva Theja Maguluri (Georgia Tech).