Printable PDF
Department of Mathematics,
University of California San Diego

****************************

Math 288: Probability & Statistics

Haixiao Wang

University of Wisconsin

Spectral Embeddings via Random Geometric Graphs for Noisy, High-Dimensional, and Nonlinear Datasets with Applications

Abstract:

Clustering is one of the fundamental problems in statistics and machine learning. Classical generative models such as the Stochastic Block Model (SBM) and Gaussian Mixture Model (GMM) are widely used for synthetic data generation and theoretical evaluation, but much of the literature assumes linearly separable clusters---an assumption that can fail in the presence of nonlinear geometry. We study a nonlinear multi-manifold model in which disjoint manifolds represent different clusters and the observations are corrupted by high-dimensional noise. We propose a kernel-based spectral embedding algorithm, based on the Random Geometric Graph (RGG) constructed from the data. Following the framework established by Ding and Ma (2023), we show that the embedding converges to its noiseless counterpart when the signal-to-noise ratio is sufficiently large. For downstream tasks, the embedding can be used for community detection problems. When different manifolds are sufficiently separated, the procedure recovers the community structure with vanishing error. Based on joint work with Xiucai Ding.

April 23, 2026

11:00 AM

APM 6402

Research Areas

Probability Theory

****************************