My research focuses on developing and analyzing novel methods for finding structure in noisy, high-dimensional data, using tools from probability, random matrix theory, graph theory, linear algebra, harmonic analysis, and machine learning. Modern data sets often have an enormous number of features, with each observation taking values in a high-dimensional Euclidean space, and yet the data have an underlying structure that is low-dimensional. This low-dimensional structure may arise because all of the sample points lie on or near a low-dimensional subspace or manifold, or because the data are well separated into distinct clusters under some metric. My research involves representing this low-dimensional structure with an appropriate data model and then constructing algorithms that correctly extract it with high probability. This process requires careful analysis of noise, sampling, and the effects of the data dimension, in order to quantify the regimes in which the low-dimensional structure can be successfully recovered. To improve the state of the art in data analysis, these algorithms must be computationally efficient as well as accurate. Thus, an important component of my research is developing fast numerical implementations that minimize dependence on the ambient dimension and scale log-linearly in the sample size. Although I am a mathematician by training, I also pursue interdisciplinary collaborations in which I apply tools from machine learning to domain-specific problems in areas including cybersecurity and molecular biology.
Estimation of Intrinsic Dimensionality of Samples from Noisy Low-dimensional Manifolds in High Dimensions with Multiscale SVD
J Lee, A Little, Y Jung, M Maggioni. 15th IEEE Workshop on Statistical Signal Processing (SSP), Cardiff, 2009.