##### Department of Mathematics,

University of California San Diego

****************************

### Math 296 - Graduate Student Colloquium

## Prof. Yuhua Zhu

#### UC San Diego

## A PDE-based Bellman Equation for Continuous-Time Reinforcement Learning

##### Abstract:

In this talk, we address the problem of continuous-time reinforcement learning in scenarios where the dynamics follow a stochastic differential equation. When the underlying dynamics remain unknown and we have access only to discrete-time information, how can we effectively conduct policy evaluation? We first demonstrate that the solution to the commonly used Bellman equation is a first-order approximation to the true value function. We then introduce a higher-order PDE-based Bellman equation called PhiBE. We show that the solution to the i-th order PhiBE is an i-th order approximation to the true value function. Additionally, even the first-order PhiBE outperforms the Bellman equation in approximating the true value function when the system dynamics change slowly. We develop a numerical algorithm based on the Galerkin method to solve PhiBE when only discrete-time trajectory data are available. Numerical experiments are provided to validate the proposed theoretical guarantees.
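As a rough illustration of the first-order claims above (a toy sketch, not taken from the talk), consider an assumed deterministic special case: dynamics ds/dt = λs with no diffusion, reward r(s) = s, and discount rate β > λ, so that the true value function is V(s) = s / (β − λ). The sketch further assumes that the discrete-time Bellman equation takes the form V(s) = r(s)Δt + e^{−βΔt} V(s_Δt), and that the first-order PhiBE replaces the unknown drift with the finite-difference estimate (s_Δt − s) / Δt. All function names and parameter values are hypothetical.

```python
import math

# Toy setting (an assumption for illustration): ds/dt = lam * s, r(s) = s,
# discount rate beta > lam. Every value function below has the form c * s,
# so we compare only the coefficient c.

def true_coeff(beta, lam):
    # Continuous-time value: integral of exp(-beta*t) * exp(lam*t) dt = 1/(beta - lam)
    return 1.0 / (beta - lam)

def bellman_coeff(beta, lam, dt):
    # Assumed discrete Bellman equation: c = dt + exp(-beta*dt) * exp(lam*dt) * c
    return dt / (1.0 - math.exp(-(beta - lam) * dt))

def phibe1_coeff(beta, lam, dt):
    # Assumed first-order PhiBE: beta * V = r + mu_hat * V', where
    # mu_hat(s) = (s_dt - s) / dt = lam_hat * s is built from discrete data
    lam_hat = (math.exp(lam * dt) - 1.0) / dt
    return 1.0 / (beta - lam_hat)

if __name__ == "__main__":
    beta, lam = 1.0, 0.1  # slowly changing dynamics (small lam)
    for dt in (0.1, 0.05, 0.025):
        exact = true_coeff(beta, lam)
        err_be = abs(bellman_coeff(beta, lam, dt) - exact)
        err_phibe = abs(phibe1_coeff(beta, lam, dt) - exact)
        print(f"dt={dt}: Bellman error={err_be:.2e}, PhiBE error={err_phibe:.2e}")
```

In this toy case both errors shrink roughly in proportion to Δt, but the Bellman error behaves like Δt/2 regardless of λ, while the PhiBE error scales with λ²Δt, which is consistent with the abstract's remark about slowly changing dynamics.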

Host: Jon Novak

### February 14, 2024

### 3:00 PM

Remote Access via Zoom

https://ucsd.zoom.us/j/

****************************