Department of Mathematics,
University of California San Diego

****************************

Math 278C: Optimization and Data Science

Prof. Yuhua Zhu

UCSD

Reinforcement learning in the optimization formulation

Abstract:

There are two types of algorithms in Reinforcement Learning (RL): value-based and policy-based. As nonlinear function approximations, such as Deep Neural Networks, become popular in RL, algorithmic instability is often observed in practice for both types of algorithms. One reason is that most algorithms rely on the contraction property of the Bellman operator, which may no longer hold under nonlinear approximation. In this talk, we will introduce two algorithms based on the Bellman residual whose performance does not depend on the contraction property of the Bellman operator. In both algorithms, we formulate RL as an unconstrained optimization problem. The first algorithm is value-based, where we assume the underlying dynamics are smooth. We propose an algorithm called Borrowing From the Future (BFF) and prove that it has an exponentially fast convergence rate in model-free control. The second algorithm is policy-based. We propose an algorithm called variational actor-critic with flipping gradients and prove that it is guaranteed to converge to the optimal policy when the state space is finite.
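For context, a minimal sketch of the kind of Bellman-residual objective the abstract alludes to (a standard formulation, not taken from the talk itself): for a parametrized value function $V_\theta$, reward $r$, discount factor $\gamma$, and state distribution $\mu$,

\[
\min_{\theta} \; L(\theta) \;=\; \mathbb{E}_{s \sim \mu}\Big[ \big( V_\theta(s) - \mathbb{E}\big[\, r(s,a) + \gamma\, V_\theta(s') \,\big|\, s \,\big] \big)^2 \Big],
\]

i.e., the mean-squared Bellman residual is minimized directly as an unconstrained optimization problem rather than through repeated application of the Bellman operator. The symbols $\mu$, $r$, $\gamma$ here are generic placeholders; the specific objectives used in BFF and in the variational actor-critic may differ.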

Host: Jiawang Nie

November 9, 2022

3:00 PM

https://ucsd.zoom.us/j/94199223268?pwd=aTI4c3VDNjl4ZjlJak93YzdZYWNzdz09

Meeting ID: 941 9922 3268

Password: 278CF22

****************************