##### Department of Mathematics,

University of California San Diego

****************************

## Purnamrita Sarkar

#### UT Austin

## Some new results for streaming principal component analysis

##### Abstract:

While streaming PCA (also known as Oja’s algorithm) was proposed about four decades ago and has roots going back to 1949, theoretical resolution in terms of obtaining optimal convergence rates has been obtained only in the last decade. However, we are not aware of any available distributional guarantees, which can help provide confidence intervals on the quality of the solution. In this talk, I will present the problem of quantifying uncertainty for the estimation error of the leading eigenvector using Oja's algorithm for streaming PCA, where the data are generated IID from some unknown distribution. Combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and concentration of matrix products, we establish a distributional approximation result for the error between the population eigenvector and the output of Oja's algorithm. We also propose an online multiplier bootstrap algorithm and establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability. While there are optimal rates for the streaming PCA problem, they typically apply to the IID setting, whereas in many applications like distributed optimization, the data is generated from a Markov chain and the goal is to infer parameters of the limiting stationary distribution. If time permits, I will also present our near-optimal finite sample guarantees which remove the logarithmic dependence on the sample size in previous work, where Markovian data is downsampled to get a nearly independent data stream.

*Note:* The speaker will also give an overview of her latest research.

### October 2, 2023

### 2:00 PM

HDSI Building, room 123

****************************