Department of Mathematics,
University of California San Diego

****************************

Statistics Seminar

Xin Tong

Marshall School of Business, University of Southern California

Neyman-Pearson classification

Abstract:

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice: it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been widely recognized or implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than alpha do not satisfy the type I error control objective, because the resulting classifiers are still likely to have type I errors much larger than alpha. This talk introduces the speaker's work on NP classification algorithms and their applications, and discusses current challenges under the NP paradigm.
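To make the abstract's point concrete, here is a toy sketch (not the speaker's algorithm) of why naive empirical control fails and how an order-statistic rule can help. Given n classifier scores on class 0 observations, the naive approach sets the decision threshold at the empirical (1 - alpha) quantile; this violates the type I error bound roughly half the time. A more conservative rule picks the k-th smallest score, where k is chosen so that a binomial tail bound on the violation probability falls below a tolerance delta. All names and the generic "classifier score" setup below are illustrative assumptions.

```python
from math import comb

def violation_bound(n: int, k: int, alpha: float) -> float:
    """Upper bound on P(type I error > alpha) when the threshold is set
    at the k-th smallest of n class 0 scores (classify as class 1 when
    score > threshold). Equals P(Binomial(n, 1 - alpha) >= k)."""
    return sum(comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
               for j in range(k, n + 1))

def np_threshold_index(n: int, alpha: float, delta: float) -> int:
    """Smallest 1-based order-statistic index k whose violation bound
    is at most delta. Larger k means a more conservative threshold."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    raise ValueError("n too small to guarantee the bound at this delta")

if __name__ == "__main__":
    n, alpha, delta = 100, 0.05, 0.05
    k = np_threshold_index(n, alpha, delta)
    # The naive empirical rule corresponds roughly to index
    # ceil((1 - alpha) * n) = 95; the conservative rule picks a
    # strictly larger index, trading type II error for a
    # high-probability guarantee on type I error.
    print(f"conservative index k = {k}, "
          f"violation bound = {violation_bound(n, k, alpha):.4f}")
```

The key trade-off this sketch exhibits is the one the abstract describes: guaranteeing the type I error bound with high probability forces a higher threshold than the naive empirical quantile, at the cost of a larger type II error.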

Host: Jelena Bradic

October 5, 2018

10:00 AM

AP&M 7321

****************************