Training shallow ReLU networks on noisy data using hinge loss: when do we overfit and is it benign?

Department of Mathematics,
University of California San Diego

278B - Mathematics of Information, Data, and Signals

Hedrick Assistant Adjunct Prof. Michael Murray

UCLA

Training shallow ReLU networks on noisy data using hinge loss: when do we overfit and is it benign?

Abstract:

In this talk I’ll discuss recent work studying benign overfitting in two-layer ReLU networks trained using gradient descent and hinge loss on noisy data for binary classification. Unlike for logistic or exponentially tailed losses, the implicit bias in this setting is poorly understood, and therefore our results and techniques are distinct from those of other recent and concurrent works on this topic. In particular, we consider linearly separable data in which a relatively small proportion of labels are corrupted, and we identify conditions on the margin of the clean data that give rise to three distinct training outcomes: benign overfitting, in which zero loss is achieved and, with high probability, test data is classified correctly; overfitting, in which zero loss is achieved but test data is misclassified with probability lower bounded by a constant; and non-overfitting, in which clean points, but not corrupt points, achieve zero loss and, again with high probability, test data is classified correctly. Our analysis provides a fine-grained description of the dynamics of neurons throughout training and reveals two distinct phases: in the first phase, clean points achieve close to zero loss; in the second phase, clean points oscillate on the boundary of zero loss while corrupt points either converge towards zero loss or are eventually zeroed by the network. We prove these results using a combinatorial approach that involves bounding the number of clean versus corrupt updates across these phases of training.
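For readers unfamiliar with the setting, the following is a minimal sketch of one full-batch (sub)gradient descent step on the mean hinge loss for a two-layer ReLU network. The abstract does not specify the initialization, the step size, or which layers are trained, so everything here is an illustrative assumption: the function names (`forward`, `hinge`, `gd_step`) are hypothetical, the outer weights `a` are held fixed, and only the inner weights `W` are updated. It is not the speaker's exact setup.

```python
def relu(z):
    return max(0.0, z)

def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def forward(W, a, x):
    # Two-layer network: f(x) = sum_j a_j * relu(<w_j, x>)
    return sum(a_j * relu(dot(w, x)) for w, a_j in zip(W, a))

def hinge(margin):
    # Hinge loss as a function of the margin y * f(x)
    return max(0.0, 1.0 - margin)

def gd_step(W, a, X, y, lr):
    # One full-batch subgradient step on the mean hinge loss.
    # Assumption: only the inner weights W are trained; a is fixed.
    n = len(X)
    grads = [[0.0] * len(w) for w in W]
    for x, label in zip(X, y):
        # Points with margin >= 1 have zero loss and contribute no update.
        if label * forward(W, a, x) < 1.0:
            for j, w in enumerate(W):
                if dot(w, x) > 0.0:  # neuron j is active on x
                    for i, x_i in enumerate(x):
                        grads[j][i] += -label * a[j] * x_i / n
    return [[w_i - lr * g_i for w_i, g_i in zip(w, g)]
            for w, g in zip(W, grads)]
```

In this sketch only points with nonzero hinge loss change the weights, so each step can be attributed to a specific set of clean or corrupt points, which is the kind of bookkeeping that a count of clean versus corrupt updates relies on.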

October 26, 2023

11:30 AM

APM 2402
