Atmosphere Ocean Science Colloquium

Multiscale Geometry of High-Dimensional Data, with Applications to Machine Learning and Dynamical Systems

Speaker: Mauro Maggioni, Duke University

Location: Warren Weaver Hall 1302

Date: Wednesday, January 29, 2014, 3:30 p.m.


High-dimensional data appears in a wide variety of applications, from signal processing (e.g. sounds, images), to the study of dynamical systems with high-dimensional state spaces, to the study of corpora of text documents, to medical, biological and financial data. A basic model is to think of a data point as a sample from a high-dimensional probability distribution. Traditional statistical methods fail in high dimensions due to the curse of dimensionality, and new hypotheses on the structure of data are needed. In particular, the assumption that data, while presented in high dimensions, has complex geometric structures that are intrinsically low-dimensional has been verified in many data sets, and has been useful in deriving new methods in statistics and machine learning. We will discuss techniques that analyze the geometry of data in a robust multiscale fashion (both with respect to sample size and high-dimensional noise) to estimate the intrinsic dimension, to efficiently construct representations of data and dictionaries for it, and to estimate the probability distribution generating the data. We present applications to the detection of anomalies in hyperspectral images, to regression and classification problems, and to the study of dynamical systems, both low- and high-dimensional (e.g. molecular dynamics).