Department of Mathematics,
University of California San Diego

****************************

Special Seminar

Richard Olshen

Stanford University

Successive normalization/standardization of rectangular arrays

Abstract:

When each subject in a study provides a vector of numbers/features for analysis and one wants to standardize, one may, for each feature (coordinate) of the resulting rectangular array, subtract the mean over subjects and divide by the standard deviation over subjects. Each feature then has mean 0 and standard deviation 1. Data from expression arrays and protein arrays often come as such rectangular arrays, where typically one coordinate (the column) denotes "subject" and the other some measure of "gene". When analyzing these data, one may ask that subjects and genes "be on the same footing". Thus there may be a need to standardize across both the rows and the columns of the matrix. We investigate the convergence of a successive approach to standardization, which we learned from our colleague Bradley Efron. Limit matrices exist on a Borel set of full measure; these limits have row and column means 0 and row and column standard deviations 1. We study the implementation on simulated data and on data that arose in cardiology. It can be shown that the procedure does not work if rows and columns are standardized simultaneously rather than successively. The results make contact with previous work on large deviations of Lipschitz functions of Gaussian vectors and with von Neumann's algorithm for the distance between two closed, convex subsets of a Hilbert space. New insights regarding inference are enabled. This is joint work with colleague Bala Rajaratnam and has been helped by conversations with many others.
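A minimal sketch of the successive standardization the abstract describes, written only to make the idea concrete. Several choices here are assumptions, not details taken from the talk. One sweep standardizes every row and then every column. Standard deviations use denominator n (population SD). Iteration stops when the largest entry-wise change between sweeps falls below a tolerance. The names `successive_normalize` and `_standardize` are hypothetical.

```python
import math
import random


def _standardize(values):
    """Center a vector to mean 0 and scale it to population SD 1 (denominator n, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0.0:
        raise ValueError("cannot standardize a constant vector")
    return [(v - mean) / sd for v in values]


def successive_normalize(matrix, tol=1e-10, max_iter=10_000):
    """Alternately standardize rows and columns until the matrix stops changing.

    Hypothetical sketch: one sweep = standardize all rows, then all columns.
    Returns (normalized matrix as a list of lists, number of sweeps used).
    """
    x = [[float(v) for v in row] for row in matrix]
    for sweep in range(1, max_iter + 1):
        prev = x
        # Row step: every row gets mean 0 and SD 1.
        x = [_standardize(row) for row in x]
        # Column step: transpose, standardize each column, transpose back.
        cols = [_standardize(list(col)) for col in zip(*x)]
        x = [list(row) for row in zip(*cols)]
        # Largest absolute change since the previous sweep (assumed stopping rule).
        change = max(abs(a - b) for r1, r2 in zip(x, prev) for a, b in zip(r1, r2))
        if change < tol:
            return x, sweep
    raise RuntimeError("did not converge within max_iter sweeps")


if __name__ == "__main__":
    random.seed(0)
    # Simulated 5 x 4 array of Gaussian noise (illustrative data only).
    data = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
    z, sweeps = successive_normalize(data)
    print("sweeps used:", sweeps)
```

After the loop stops, the columns are exactly standardized (the column step runs last), and the rows are standardized up to roughly the tolerance. This matches the abstract's description of limit matrices whose row and column means are 0 and standard deviations are 1. The sketch covers only the successive scheme. It does not attempt the simultaneous variant, which the abstract says does not work.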

May 2, 2014

1:00 PM

Leichtag 205

****************************