Stochastic Variational Inequality Methods for Policy Evaluation in Reinforcement Learning

Department of Mathematics,
University of California San Diego

****************************

Math 278C - Optimization and Data Science

Guanghui Lan

Georgia Institute of Technology

Stochastic Variational Inequality Methods for Policy Evaluation in Reinforcement Learning

Abstract:

In this talk, we discuss a few simple and optimal methods for solving stochastic variational inequalities (VIs). A prominent application of these algorithmic developments is the stochastic policy evaluation problem in reinforcement learning. Prior work in the literature analyzed temporal difference (TD) learning through nonsmooth finite-time analysis motivated by stochastic subgradient descent, which leads to certain limitations: the need to analyze a modified TD algorithm that projects onto an a priori defined Euclidean ball, a non-optimal convergence rate, and no clear way of deriving the benefits of parallel implementation. Our approach remedies these shortcomings in the broader context of stochastic VIs, and in particular for stochastic policy evaluation. We develop a variety of simple TD-learning-type algorithms that are motivated by the original TD algorithm and retain its simplicity, while offering distinct advantages from a non-asymptotic analysis point of view. We first provide an improved analysis of the standard TD algorithm that can benefit from parallel implementation. We then present versions of a conditional TD (CTD) algorithm that involve periodic updates of the stochastic iterates, which reduce the bias and therefore exhibit improved iteration complexity. This brings us to the fast TD (FTD) algorithm, which combines elements of CTD with our newly developed stochastic operator extrapolation method. With a novel index-resetting stepsize policy, FTD exhibits the best known convergence rate. We also devise a robust version of the algorithm that is particularly suitable for discount factors close to 1.
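For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of the textbook tabular TD(0) update for policy evaluation, run along a single trajectory. It is not the CTD or FTD algorithm from the talk. The two-state chain, the rewards, the stepsize, and all function and variable names are hypothetical choices made only for illustration.

```python
import random

def td0_policy_evaluation(P, r, gamma, stepsize, num_steps, seed=0):
    """Textbook tabular TD(0) along one trajectory of a Markov reward process.

    P[s] is the row of transition probabilities out of state s under the
    fixed policy, r[s] is the reward collected in state s.
    """
    rng = random.Random(seed)
    n = len(r)
    theta = [0.0] * n          # current value estimates, one per state
    s = 0                      # arbitrary starting state
    for _ in range(num_steps):
        s_next = rng.choices(range(n), weights=P[s])[0]
        # Temporal difference error: one-sample estimate of the Bellman residual.
        td_error = r[s] + gamma * theta[s_next] - theta[s]
        theta[s] += stepsize * td_error
        s = s_next
    return theta

def exact_values_two_state(P, r, gamma):
    """Solve (I - gamma * P) V = r for a two-state chain by Cramer's rule."""
    a, b = 1 - gamma * P[0][0], -gamma * P[0][1]
    c, d = -gamma * P[1][0], 1 - gamma * P[1][1]
    det = a * d - b * c
    return [(d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det]

# Hypothetical two-state example (not taken from the talk).
P = [[0.9, 0.1], [0.2, 0.8]]
R = [1.0, 0.0]
GAMMA = 0.9

td_estimate = td0_policy_evaluation(P, R, GAMMA, stepsize=0.002, num_steps=300_000)
exact = exact_values_two_state(P, R, GAMMA)
print("TD(0) estimate:", td_estimate)
print("exact values:  ", exact)
```

In this sketch the exact values solve the linear system (I - gamma P) V = r, which is the fixed-point equation that policy evaluation targets. With a small constant stepsize, the TD(0) iterates settle into a noisy neighborhood of that solution rather than converging exactly.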

Host: Jiawang Nie

December 2, 2020

2:00 PM

Zoom Meeting ID: 998 9823 3654 Password: 278CFA20

****************************
