Department of Mathematics,
University of California San Diego

****************************

Special Colloquium

Pragya Sur

Stanford University

A modern maximum-likelihood theory for high-dimensional logistic regression

Abstract:

Logistic regression is arguably the most widely used and studied non-linear model in statistics. It has found widespread applicability in varied domains such as genetics, health care, and e-commerce. Classical maximum-likelihood theory for this model hinges on three fundamental results: (1) the maximum-likelihood estimate (MLE) is asymptotically unbiased, (2) its variability can be quantified via the inverse Fisher information, and (3) the likelihood-ratio test (LRT) statistic is asymptotically distributed as a chi-square. These results are universally used for statistical inference. Our findings reveal, however, that when the number of features p and the sample size n both diverge, with the ratio p/n converging to a positive constant, the classical results are far from accurate. For a certain class of logistic models, we observe that (1) the MLE is biased, (2) its variability is much higher than classically estimated, and (3) the LRT is not distributed as a chi-square. We develop a new theory that quantifies the asymptotic bias and variance of the MLE and characterizes the asymptotic distribution of the LRT under certain assumptions on the covariate distribution. Empirical findings demonstrate that our results provide extremely accurate inference in finite samples. These novel results depend on the underlying regression coefficients through a single scalar, the overall signal strength, and we discuss a procedure to estimate this parameter accurately. This is based on joint work with Emmanuel Candes and Yuxin Chen.
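
For readers who want the setup in symbols, the classical statements referred to in the abstract can be sketched as follows. The notation (x_i, y_i, \beta, \hat\beta, I(\beta), \Lambda, k, \kappa) is assumed here for illustration and is not taken from the announcement. The display records only the standard fixed-p results and the high-dimensional regime; it does not state the new theory presented in the talk.

```latex
% Logistic model for independent observations (x_i, y_i), i = 1, ..., n
\[
  \Pr(y_i = 1 \mid x_i) = \frac{1}{1 + e^{-x_i^\top \beta}},
  \qquad \beta \in \mathbb{R}^p .
\]

% Classical asymptotics: p fixed, n -> infinity.
% I(\beta) is the per-observation Fisher information.
\[
  \sqrt{n}\,\bigl(\hat\beta - \beta\bigr)
  \xrightarrow{\;d\;} \mathcal{N}\bigl(0,\; I(\beta)^{-1}\bigr),
  \qquad
  I(\beta) = \mathbb{E}\!\left[
     \frac{e^{x^\top \beta}}{\bigl(1 + e^{x^\top \beta}\bigr)^{2}}\; x x^\top
  \right].
\]

% Likelihood-ratio test of a null hypothesis imposing k constraints on \beta
% (\Lambda is the log-likelihood ratio statistic).
\[
  2\Lambda \xrightarrow{\;d\;} \chi^2_k
  \quad \text{under the null hypothesis.}
\]

% Regime considered in the talk: both n and p diverge at a comparable rate.
\[
  n \to \infty, \quad p \to \infty, \quad \frac{p}{n} \to \kappa > 0 .
\]
```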

Host: Jelena Bradic

December 3, 2018

3:00 PM

AP&M 6402

****************************