Research Interests

My research uses tools from harmonic, functional, and numerical analysis to develop new methods for understanding and accomplishing data analysis tasks. I am interested in both the theory and the algorithmic aspects of finding low-dimensional structure within high-dimensional data. Conversely, I am also interested in dimensionality reduction methods which embed high-dimensional data in low dimensions faithfully and interpretably. These interests have led me to work with collaborators across many disciplines on problems like Subspace Clustering, Graph Sketching, and Low-rank Matrix Approximations. I am also interested in signal processing, including the study of function spaces that model classes of signals, and have worked extensively on the interaction between high-dimensional Sampling Theory and Radial Basis Function approximation schemes. This page gives high-level overviews of some specific ongoing projects; the theme tying them together is the aim of eventually being able to learn, in a data-driven way, unions of manifolds appearing within large-scale data. For a full list of my publications, see my publication page.

For an explanation of my research which is accessible to a general audience, see the recent UofA spotlight.

Subspace Clustering

The Subspace Clustering Problem is an unsupervised Machine Learning problem in which data in a high-dimensional Euclidean space is assumed to come from a union of low-dimensional affine subspaces, and the goal is to partition the data so that each cluster corresponds to the data from a single subspace. Subsequently, one can find a basis for each subspace. Many applications produce data which naturally fits this framework, including motion segmentation, particle picking in cryo-electron microscopy, and object recognition under different illumination patterns.
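The setup can be illustrated with a toy sketch (this is only an illustration of the problem, not the CUR-based algorithm described below; the data, threshold, and angle-based affinity are choices made for the example): points drawn from two lines through the origin in R^3 are clustered by thresholding the cosine of the angle between them and taking connected components.

```python
# Toy illustration of the subspace clustering setup: points from two
# 1-dimensional subspaces (lines through the origin) in R^3, clustered
# by the angle between them.
import math

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Samples from two lines: span{(1,0,0)} and span{(0,1,1)}.
points = [[2, 0, 0], [-1, 0, 0], [3, 0, 0],
          [0, 1, 1], [0, -2, -2], [0, 0.5, 0.5]]

def affinity(p, q):
    # |cos(angle)| equals 1 exactly when p and q lie on the same line.
    return abs(sum(a * b for a, b in zip(unit(p), unit(q))))

def cluster(pts, tau=0.9):
    # Connected components of the thresholded affinity graph.
    labels = [-1] * len(pts)
    cur = 0
    for i in range(len(pts)):
        if labels[i] == -1:
            stack = [i]
            labels[i] = cur
            while stack:
                j = stack.pop()
                for k in range(len(pts)):
                    if labels[k] == -1 and affinity(pts[j], pts[k]) > tau:
                        labels[k] = cur
                        stack.append(k)
            cur += 1
    return labels

print(cluster(points))  # → [0, 0, 0, 1, 1, 1]
```

Real subspace clustering methods must cope with noise, intersecting subspaces, and higher subspace dimensions, which is where the spectral and CUR-based machinery comes in.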

With Akram Aldroubi (Math, Vanderbilt University), Ali Sekmen (Computer Science, Tennessee State University), and Bugra Koku (Mechanical Engineering, Middle East Technical University), I proved that CUR decompositions lead to a solution of the Subspace Clustering problem, and we gave an algorithm that is robust to noise and achieved the best performance to date on the Hopkins155 motion segmentation dataset.

CUR Decompositions

CUR decompositions are relatively little-known matrix factorizations obtained by choosing column and row submatrices of a given matrix. Longxiu Huang (Math, UCLA) and I have characterized CUR decompositions and carried out a thorough perturbation analysis for variants of this factorization. Additionally, we have given sampling guarantees which show when randomly sampled column and row submatrices yield a CUR decomposition of a low-rank matrix with high probability.
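A minimal sketch of the idea, under the simplest assumptions (an exactly rank-2 matrix and hand-picked rows and columns whose intersection W is invertible; the specific matrix is made up for the example): in that case A = C W^{-1} R holds exactly.

```python
# Minimal CUR decomposition sketch: for a rank-2 matrix A, choosing 2 rows
# and 2 columns whose intersection W is invertible gives A = C W^{-1} R.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# A rank-2 matrix built from two outer products.
u1, v1 = [1, 2, 3, 4], [1, 0, 1, 0]
u2, v2 = [0, 1, 0, 1], [0, 1, 1, 2]
A = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(4)] for i in range(4)]

rows, cols = [0, 1], [0, 1]                        # selected indices
C = [[A[i][j] for j in cols] for i in range(4)]    # column submatrix
R = [[A[i][j] for j in range(4)] for i in rows]    # row submatrix
W = [[A[i][j] for j in cols] for i in rows]        # their intersection

# Invert the 2x2 intersection W by the cofactor formula.
det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
W_inv = [[W[1][1] / det, -W[0][1] / det],
         [-W[1][0] / det, W[0][0] / det]]

A_rec = matmul(matmul(C, W_inv), R)
print(A_rec == A)  # → True
```

In general one replaces W^{-1} by a pseudoinverse, and the perturbation and sampling questions above concern what happens when A is only approximately low-rank and the rows and columns are drawn at random.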

Graph Sketches

Graph sketches are subgraphs of a large graph which approximately preserve some property of the initial graph, such as pairwise distances between vertices. In this case, the sketch is called a graph spanner. With Stephen Kobourov’s group at the University of Arizona along with Greg Bodwin, I have written a survey of graph spanners.
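The classical greedy construction of a t-spanner (due to Althöfer et al.) gives a feel for the object: scan edges by increasing weight and keep an edge only if the spanner built so far does not already connect its endpoints within t times that edge's weight. A small sketch (the example graph is made up):

```python
# Greedy t-spanner sketch: keep an edge (u, v, w) only if the current
# spanner distance between u and v exceeds t * w.
import heapq

def dijkstra(adj, src, dst):
    # Shortest-path distance from src to dst in the current spanner.
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def greedy_spanner(edges, t):
    adj, spanner = {}, []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if dijkstra(adj, u, v) > t * w:
            spanner.append((u, v, w))
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return spanner

# A 4-cycle plus a heavier diagonal: for t = 2 the diagonal is dropped,
# since the path around the cycle already meets the stretch bound.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1), (0, 2, 1.5)]
print(greedy_spanner(edges, 2))  # → [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
```

Every pairwise distance in the output subgraph is at most t times the distance in the original graph, which is the defining property of a spanner.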

We have also introduced the subject of multi-level graph spanners, and considered several algorithmic solutions to the problem. Many Manifold Learning algorithms approximate geodesic distances via a sparse neighborhood graph; we hope to apply spanner techniques in this arena in the near future.

Quasi Shift-invariant Spaces

Sampling theory asks the question: given a class of functions, if we have access to a set of discrete measurements of a given function f, is it possible to reconstruct f from these measurements? If so, how can this reconstruction be done in a computationally efficient manner?

Modern sampling theory has attempted to go as far beyond Paley–Wiener (or bandlimited) signals as possible, and my research has been on interpolation of smooth functions (Sobolev or Triebel–Lizorkin classes) by approximants in quasi shift-invariant spaces, which are relatively unstructured generalizations of the space of bandlimited signals.
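A small illustration of reconstruction in a shift-invariant space (a deliberately simple case, not the Sobolev/Triebel–Lizorkin setting of the research above): take the linear B-spline, the "hat" function, as generator. Then the series f(x) = Σ_k c_k φ(x − k) with c_k = f(k) interpolates the integer samples exactly; quasi shift-invariant spaces relax this by allowing nonuniform shift nodes x_k.

```python
# Reconstruction in the shift-invariant space generated by the linear
# B-spline: the interpolant reproduces the samples at the integers and
# is piecewise linear in between.
import math

def hat(x):
    # Linear B-spline: equals 1 at 0, supported on (-1, 1).
    return max(0.0, 1.0 - abs(x))

def reconstruct(samples, x):
    # samples[k] = f(k) for k = 0..n-1; evaluate sum_k c_k * hat(x - k).
    return sum(c * hat(x - k) for k, c in enumerate(samples))

def f(x):
    return math.sin(0.5 * x)

samples = [f(k) for k in range(8)]

print(abs(reconstruct(samples, 3) - f(3)) < 1e-12)   # exact at the nodes
print(reconstruct(samples, 2.5))                     # average of f(2), f(3)
```

Bandlimited reconstruction is the same picture with the sinc function in place of the hat; the research above studies how far such schemes extend when the generator and the sample locations are far less structured.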