Printable PDF
Department of Mathematics,
University of California San Diego

****************************

Math 278B: Mathematics of Information, Data, and Signals

Sanjoy Dasgupta

UCSD

Recent progress on interpretable clustering

Abstract:

The widely-used k-means procedure returns k clusters that have arbitrary convex shapes. In high dimension, such a clustering might not be easy to understand. A more interpretable alternative is to constraint the clusters to be the leaves of a decision tree with axis-parallel splits; then each cluster is a hyperrectangle given by a small number of features.

Is it always possible to find clusterings that are intepretable in this sense and yet have k-means cost that is close to the unconstrained optimum? A recent line of work has answered this in the affirmative and moreover shown that these interpretable clusterings are easy to construct.

I will give a survey of these results: algorithms, methods of analysis, and open problems.

January 24, 2025

11:00 AM

APM 2402

Research Areas

Mathematics of Information, Data, and Signals

****************************