Department of Mathematics,
University of California San Diego


Math 278C: Optimization and Data Science

Prof. Johannes Müller

IMPRS MiS, Leipzig

Geometry of Markov decision processes


We study Markov decision processes (MDPs) in the space of state-action distributions using algebraic and differential geometric methods. We provide an explicit description of the set of feasible state-action distributions of partially observable problems with memoryless stochastic policies through polynomial constraints. In particular, this yields a formulation of the reward optimization problem as a linear objective program with polynomial constraints. This allows us to study the combinatorial and algebraic complexity of the problem, and we obtain explicit upper bounds on the number of critical points over every boundary component of the feasible set for a large class of problems. We demonstrate that the polynomial programming formulation of reward optimization can be solved using tools from constrained optimization and applied algebraic geometry, which exhibit improved stability and provide globally optimal solutions. Further, we study the convergence of several natural policy gradient (NPG) methods with regular policy parametrization. For a variety of NPGs we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for the unregularized and regularized NPG flows proposed by Kakade and by Morimura and co-authors, by observing that these arise from the Hessian geometries of conditional entropy and entropy, respectively.
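To make the optimization-over-state-action-distributions viewpoint concrete, here is a minimal sketch for the fully observable case, where the feasible set of (discounted) state-action distributions is a polytope cut out by linear flow constraints and reward maximization is a linear program. (In the partially observable setting of the abstract, these constraints become polynomial.) The 2-state, 2-action MDP below is a hypothetical toy example, not one from the talk:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy MDP: 2 states, 2 actions, discount factor gamma.
nS, nA = 2, 2
gamma = 0.9
mu = np.array([0.5, 0.5])            # initial state distribution
# P[s_next, s, a] = probability of moving to s_next from s under action a
P = np.zeros((nS, nS, nA))
P[0, 0, 0] = 1.0; P[1, 0, 1] = 1.0   # from state 0: a0 stays, a1 moves
P[1, 1, 0] = 1.0; P[0, 1, 1] = 1.0   # from state 1: a0 stays, a1 moves
r = np.array([[0.0, 1.0],            # r[s, a] = instantaneous reward
              [2.0, 0.0]])

# Occupancy-measure LP over d(s, a) >= 0:
#   maximize  sum_{s,a} r(s,a) d(s,a)
#   subject to, for every state s,
#   sum_a d(s,a) - gamma * sum_{s',a'} P(s|s',a') d(s',a') = (1-gamma) mu(s)
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for s2 in range(nS):
        for a in range(nA):
            A_eq[s, s2 * nA + a] = float(s == s2) - gamma * P[s, s2, a]
b_eq = (1 - gamma) * mu

res = linprog(-r.flatten(), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (nS * nA))
d = res.x.reshape(nS, nA)
print("optimal state-action distribution:\n", d)
print("optimal normalized discounted reward:", r.flatten() @ res.x)
```

Summing the flow constraints over all states shows that any feasible d is automatically a probability distribution, and an optimal vertex of this polytope recovers a deterministic optimal policy via conditioning, pi(a|s) proportional to d(s, a).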

Host: Jiawang Nie

February 1, 2023

3:00 PM
Meeting ID: 940 3029 8286
Password: 278CWN23