Department of Mathematics,
University of California San Diego

****************************

Math 278B: Mathematics of Information, Data, and Signals

Tianhao Wang

UCSD

Adaptive Optimizers: From Structured Preconditioners to Adaptive Geometry

Abstract:

Adaptive optimizers such as Adam and Shampoo are workhorses of modern machine learning, enabling efficient training of large-scale models across architectures and domains. In this talk, we will present a unified framework for adaptive optimizers with structured preconditioners, encompassing a variety of existing methods and introducing new ones. Our analysis reveals the fundamental interplay between preconditioner structures and loss geometries, highlighting in particular that more adaptivity is not always helpful. Furthermore, the dominance of adaptive methods has recently been challenged by the surprising effectiveness of simpler normalized steepest descent (NSD)-type methods such as Muon, even as a consensus has emerged that both families of methods succeed by exploiting the non-Euclidean geometry of the loss landscape. Building on the proposed framework, we show that the convergence of adaptive optimizers is governed by a notion of adaptive smoothness, which contrasts with the standard smoothness assumption leveraged by NSD. In addition, although adaptive smoothness is a stronger condition, it enables acceleration via Nesterov momentum, which cannot be achieved under the standard smoothness assumption in non-Euclidean settings. Finally, we develop a notion of adaptive gradient variance that parallels adaptive smoothness and yields qualitatively improved guarantees compared to those based on standard gradient variance.
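For background (a standard textbook formulation in generic notation, not necessarily the exact framework of the talk), adaptive preconditioned methods take steps of the form

x_{t+1} = x_t - \eta_t P_t^{-1} g_t,

where g_t is the (possibly momentum-averaged) gradient and P_t is a preconditioner accumulated from past gradients. Adam corresponds to a diagonal preconditioner P_t = \mathrm{diag}(\sqrt{v_t}) + \epsilon I built from coordinate-wise second moments v_t, while Shampoo uses a Kronecker-factored preconditioner built from left and right factor matrices L_t^{1/4} and R_t^{1/4} for matrix-shaped parameters. The structure imposed on P_t (diagonal, Kronecker-factored, full-matrix) is the axis along which such methods differ.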

Host: Alex Cloninger

November 21, 2025

11:00 AM

APM 6402

Research Areas

Mathematics of Information, Data, and Signals

****************************