Printable PDF
Department of Mathematics,
University of California San Diego

****************************

Math 278B - Mathematics of Information, Data, and Signals Seminar

Anna Little

University of Utah

Clustering High-dimensional Data with Path Metrics: A Balance of Density and Geometry

Abstract:

This talk discusses multiple methods for clustering high-dimensional data, and explores the delicate balance between utilizing data density and data geometry. I will first present path-based spectral clustering, a novel approach which combines a density-based metric with graph-based clustering. This density-based path metric allows for fast algorithms and strong theoretical guarantees when clusters concentrate around low-dimensional sets. However, the method suffers from a loss of geometric information, information which is preserved by simple linear dimension reduction methods such as classic multidimensional scaling (CMDS). The second part of the talk will explore when CMDS followed by a simple clustering algorithm can exactly recover all cluster labels with high probability. However, scaling conditions become increasingly restrictive as the ambient dimension increases, and the method will fail for irregularly shaped clusters. Finally, I will discuss how a more general family of path metrics, combined with MDS, give low-dimensional embeddings which respect both data density and data geometry. This new method exhibits promising performance on single cell RNA sequence data and can be computed efficiently by restriction to a sparse graph.

Host: Rayan Saab

December 10, 2020

10:30 AM

zoom link: https://msu.zoom.us/j/96421373881 (passcode: the first prime larger than 100)

****************************